List: hadoop-commits
Subject: [Hadoop Wiki] Update of "ServerNotAvailable" by SteveLo
From: Apache Wiki <wikidiffs () apache ! org>
Date: 2011-06-30 10:50:40
Message-ID: 20110630105040.86796.69755 () eos ! apache ! org
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "ServerNotAvailable" page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/ServerNotAvailable
Comment:
new page on understanding the server not available error
New page:
= Server Not Available Yet =
This message can appear in the logs of a DataNode:
{{{
2011-06-30 11:30:40,403 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 0 time(s).
2011-06-30 11:30:41,404 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 1 time(s).
2011-06-30 11:30:42,404 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 2 time(s).
2011-06-30 11:30:43,405 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 3 time(s).
2011-06-30 11:30:44,405 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 4 time(s).
2011-06-30 11:30:45,406 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 5 time(s).
2011-06-30 11:30:46,407 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 6 time(s).
2011-06-30 11:30:47,407 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 7 time(s).
2011-06-30 11:30:48,408 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 8 time(s).
2011-06-30 11:30:49,409 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 9 time(s).
2011-06-30 11:30:49,410 INFO org.apache.hadoop.ipc.RPC: Server at namenode/10.8.1.2:54310 not available yet, Zzzzz...
2011-06-30 11:30:51,411 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 0 time(s).
2011-06-30 11:30:52,412 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 1 time(s).
2011-06-30 11:30:53,412 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 2 time(s).
2011-06-30 11:30:54,413 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 3 time(s).
2011-06-30 11:30:55,414 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 4 time(s).
2011-06-30 11:30:56,414 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 5 time(s).
2011-06-30 11:30:57,415 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 6 time(s).
2011-06-30 11:30:58,416 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 7 time(s).
2011-06-30 11:30:59,416 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 8 time(s).
2011-06-30 11:31:00,417 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: namenode/10.8.1.2:54310. Already tried 9 time(s).
2011-06-30 11:31:00,418 INFO org.apache.hadoop.ipc.RPC: Server at namenode/10.8.1.2:54310 not available yet, Zzzzz...
}}}
What is happening here is that the DataNode cannot connect to the NameNode. Rather than fail, it assumes that the NameNode is temporarily offline: it hasn't started yet or is being restarted. The DataNode will wait for the NameNode to come back up and, as soon as it does, report in. After ten connection attempts at one-second intervals, the client backs off for a couple of seconds, then tries again.
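The retry-then-back-off pattern visible in the log can be sketched roughly as follows. This is an illustration only, not Hadoop's actual client code: the function name `wait_for_server` and all of its parameters are hypothetical, the ten attempts and one-second delay are read off the log timestamps above, and the two-second back-off is an assumption taken from "a couple of seconds".

```python
import socket
import time


def wait_for_server(host, port, attempts_per_round=10, retry_delay=1.0,
                    backoff=2.0, max_rounds=None, connect=None,
                    sleep=time.sleep):
    """Block until a TCP connection to host:port succeeds.

    Returns the number of completed back-off rounds before success.
    Raises ConnectionError if max_rounds is set and exhausted.
    All names and defaults here are illustrative assumptions.
    """
    if connect is None:
        def connect():
            # Plain TCP connect; a real RPC client would also handshake.
            socket.create_connection((host, port), timeout=5).close()

    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for attempt in range(attempts_per_round):
            try:
                connect()
                return rounds
            except OSError:
                print(f"Retrying connect to server: {host}:{port}. "
                      f"Already tried {attempt} time(s).")
                sleep(retry_delay)
        # Every attempt in this round failed: pause, then start over at 0.
        print(f"Server at {host}:{port} not available yet, Zzzzz...")
        sleep(backoff)
        rounds += 1
    raise ConnectionError(f"gave up on {host}:{port} after {rounds} round(s)")
```

With `max_rounds=None` the loop waits indefinitely, which mirrors the behaviour described above: the caller never fails, it just keeps logging until the server answers.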
This process of retrying and backing off is a key part of how an HDFS cluster handles the temporary outage of a NameNode. It works well provided the network is set up and running correctly. The same messages can also be triggered by other cluster setup problems, which anyone setting up a Hadoop cluster is likely to encounter:
 1. The NameNode hasn't been started yet. Fix: start the NameNode.
 2. The `fs.default.name` property in `core-site.xml` doesn't point to the correct hostname for the NameNode, so the DataNodes are trying to connect to the wrong server. Look at the server name in the log and verify that it is valid.
 3. The port in the `fs.default.name` property is wrong. Verify that the NameNode is listening on that port; if not, correct the site settings.
 4. The client can't resolve the hostname, or it is resolving to the wrong address. Verify the IP address in the logs.
 5. Connection problems. Look at the network connectivity options in the TroubleShooting page.
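Several of the checks above (hostname resolution, the resolved address, and whether anything is listening on the port) can be run from the DataNode host with a few lines of Python. This is a minimal sketch, not a Hadoop tool: `check_namenode` is a hypothetical helper, it only handles IPv4, and a successful TCP connect shows that something is listening on the port, not that it is a healthy NameNode.

```python
import socket


def check_namenode(host, port, timeout=5.0):
    """Return (resolved_ip, reachable) for host:port.

    resolved_ip is None when the hostname cannot be resolved.
    reachable is True only if a TCP connection could be opened.
    """
    try:
        # IPv4 lookup; compare the result with the address in the log.
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        socket.create_connection((ip, port), timeout=timeout).close()
        return ip, True
    except OSError:
        # Refused, timed out, or unroutable: the port is not reachable.
        return ip, False
```

Call it with the hostname and port taken from `fs.default.name`, for example `check_namenode("namenode", 54310)`, and compare the returned address with the one printed in the DataNode log.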