[prev in list] [next in list] [prev in thread] [next in thread]
List: hadoop-user
Subject: Re: Datanode cannot start
From: Gurmukh Singh <gurmukh.dhillon () yahoo ! com ! INVALID>
Date: 2020-04-25 4:24:27
Message-ID: c7a8267c-5216-5bfa-27b6-4480fed7a56d () yahoo ! com
[Download RAW message or body]
Hi Wenqi,
Which version of hadoop are you using? (My initial guess is 3.X, as )
Can you see anything in the datanode logs on the lines of
"org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"
Although, you mention that it is new cluster, make sure the ClusterID
matches. --> "attempt to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"
On 22/4/20 9:50 pm, Wenqi Ma wrote:
> Dear Hadoop community,
>
> A strange problem happened in our fresh installed hdfs cluster. The
> datanode cannot start because two threads tried to load storage, the
> latter would fail and lead a FATAL error.
> As the following log shown, the blue text
> means DataStorage::addStorageLocations is called, the red text are I
> added to find the thread details.
> I compared the log with normal datanode's in another cluster, and
> there is only one line (blue text).
>
> Any suggestion would be appreciated, thanks!
>
> 2020-04-22 19:32:25,666 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request
> received for nameservices: nameservice1
> 2020-04-22 19:32:25,718 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
> BPOfferServices for nameservices: nameservice1
> 2020-04-22 19:32:25,726 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
> <registering> (Datanode Uuid unassigned) service to
> node238/172.16.2.238:8020 <http://172.16.2.238:8020> starting to offer
> service
> 2020-04-22 19:32:25,727 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
> <registering> (Datanode Uuid unassigned) service to
> node239/172.16.2.239:8020 <http://172.16.2.239:8020> starting to offer
> service
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 50020: starting
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2020-04-22 19:32:25,885 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 30
> 2020-04-22 19:32:25,885 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread:
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]
> heartbeating to node239/172.16.2.239:8020
> <http://172.16.2.239:8020>,5,main]
> 2020-04-22 19:32:25,885 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to
> upgrade data directories
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,890 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Lock on
> /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename 35@node238
> 2020-04-22 19:32:25,918 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,918 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,920 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
> nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;ci \
> d=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
> 2020-04-22 19:32:25,931 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 29
> 2020-04-22 19:32:25,931 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread:
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]
> heartbeating to node238/172.16.2.238:8020
> <http://172.16.2.238:8020>,5,main]
> 2020-04-22 19:32:25,931 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to
> upgrade data directories
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,932 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Storage directory
> [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been used.
> 2020-04-22 19:32:25,950 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,950 WARN
> org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze
> storage directories for block pool
> BP-1558632922-172.16.2.238-1587535931027
> java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead:
> attempt to load an used block storage:
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> at
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408) \
>
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)
>
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)
>
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
>
> at java.lang.Thread.run(Thread.java:748)
>
> --
> Best Regards!
> Wenqi
>
[Attachment #3 (text/html)]
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Wenqi,</p>
<p>Which version of hadoop are you using? (My initial guess is 3.X,
as )<br>
</p>
<p>Can you see anything in the datanode logs on the lines of
"org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"</p>
<p>Although, you mention that it is new cluster, make sure the
ClusterID matches. --> "attempt to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"<br>
</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 22/4/20 9:50 pm, Wenqi Ma wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAEny681VkD6VydPEp7a-cXxvUNEHTtw-kvNt_xDDEuFEhfMiOQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">Dear Hadoop community,<br clear="all">
<div><br>
</div>
<div>A strange problem happened in our fresh installed hdfs
cluster. The datanode cannot start because two threads tried
to load storage, the latter would fail and lead a FATAL error.</div>
<div>As the following log shown, the blue text
means DataStorage::addStorageLocations is called, the red text
are I added to find the thread details.</div>
<div>I compared the log with normal datanode's in another
cluster, and there is only one line (blue text). </div>
<div><br>
</div>
<div>Any suggestion would be appreciated, thanks!</div>
<div><br>
</div>
<div>2020-04-22 19:32:25,666 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh
request received for nameservices: nameservice1<br>
2020-04-22 19:32:25,718 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
BPOfferServices for nameservices: nameservice1<br>
2020-04-22 19:32:25,726 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
<registering> (Datanode Uuid unassigned) service to
node238/<a href="http://172.16.2.238:8020"
moz-do-not-send="true">172.16.2.238:8020</a> starting to
offer service<br>
2020-04-22 19:32:25,727 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
<registering> (Datanode Uuid unassigned) service to
node239/<a href="http://172.16.2.239:8020"
moz-do-not-send="true">172.16.2.239:8020</a> starting to
offer service<br>
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC
Server listener on 50020: starting <br>
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC
Server Responder: starting <br>
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk current
thread ID: 30<br>
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk <font
color="#ff0000">current thread: Thread[DataNode:
[[[DISK]<a class="moz-txt-link-freetext" \
href="file:/vdir/mnt/disk2/hadoop/data/">file:/vdir/mnt/disk2/hadoop/data/</a>]] \
heartbeating to node239/<a href="http://172.16.2.239:8020"
moz-do-not-send="true">172.16.2.239:8020</a>,5,main] </font>
<br>
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: <font
color="#0000ff">Using 1 threads to upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1,
dataDirs=1)</font><br>
2020-04-22 19:32:25,890 INFO
org.apache.hadoop.hdfs.server.common.Storage: Lock on
/vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename
35@node238<br>
2020-04-22 19:32:25,918 INFO
org.apache.hadoop.hdfs.server.common.Storage: Analyzing
storage directories for bpid
BP-1558632922-172.16.2.238-1587535931027<br>
2020-04-22 19:32:25,918 INFO
org.apache.hadoop.hdfs.server.common.Storage: Locking is
disabled for
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027<br>
2020-04-22 19:32:25,920 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up
storage:
nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid= \
hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe<br>
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk current
thread ID: 29<br>
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: kkk <font
color="#ff0000">current thread: Thread[DataNode:
[[[DISK]<a class="moz-txt-link-freetext" \
href="file:/vdir/mnt/disk2/hadoop/data/">file:/vdir/mnt/disk2/hadoop/data/</a>]] \
heartbeating to node238/<a href="http://172.16.2.238:8020"
moz-do-not-send="true">172.16.2.238:8020</a>,5,main]</font>
<br>
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: <font
color="#0000ff">Using 1 threads to upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1,
dataDirs=1)</font><br>
2020-04-22 19:32:25,932 INFO
org.apache.hadoop.hdfs.server.common.Storage: Storage
directory [DISK]<a class="moz-txt-link-freetext" \
href="file:/vdir/mnt/disk2/hadoop/data/">file:/vdir/mnt/disk2/hadoop/data/</a> has \
already been used. <br>
2020-04-22 19:32:25,950 INFO
org.apache.hadoop.hdfs.server.common.Storage: Analyzing
storage directories for bpid
BP-1558632922-172.16.2.238-1587535931027<br>
2020-04-22 19:32:25,950 WARN
org.apache.hadoop.hdfs.server.common.Storage: Failed to
analyze storage directories for block pool
BP-1558632922-172.16.2.238-1587535931027<br>
java.io.IOException:
BlockPoolSliceStorage.recoverTransitionRead: attempt to load
an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027<br>
at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)<br>
at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)<br>
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)<br>
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)
<br>
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)<br>
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)
<br>
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)
<br>
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)<br>
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)<br>
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
<br>
at java.lang.Thread.run(Thread.java:748)<br>
</div>
<div><br>
</div>
-- <br>
<div dir="ltr" class="gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div>Best Regards!</div>
<div>Wenqi</div>
<div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</body>
</html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic