[prev in list] [next in list] [prev in thread] [next in thread] 

List:       hadoop-user
Subject:    Re: Datanode cannot start
From:       Gurmukh Singh <gurmukh.dhillon () yahoo ! com ! INVALID>
Date:       2020-04-25 4:24:27
Message-ID: c7a8267c-5216-5bfa-27b6-4480fed7a56d () yahoo ! com
[Download RAW message or body]

Hi Wenqi,

Which version of hadoop are you using? (My initial guess is 3.X, as )

Can you see anything in the datanode logs on the lines of 
"org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"

Although, you mention that it is new cluster, make sure the ClusterID 
matches. --> "attempt to load an used block storage: 
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"


On 22/4/20 9:50 pm, Wenqi Ma wrote:
> Dear Hadoop community,
> 
> A strange problem happened in our fresh installed hdfs cluster. The 
> datanode cannot start because two threads tried to load storage, the 
> latter would fail and lead a FATAL error.
> As the following log shown, the blue text 
> means DataStorage::addStorageLocations is called, the red text are I 
> added to find the thread details.
> I compared the log with normal datanode's in another cluster, and 
> there is only one line (blue text).
> 
> Any suggestion would be appreciated, thanks!
> 
> 2020-04-22 19:32:25,666 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request 
> received for nameservices: nameservice1
> 2020-04-22 19:32:25,718 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting 
> BPOfferServices for nameservices: nameservice1
> 2020-04-22 19:32:25,726 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool 
> <registering> (Datanode Uuid unassigned) service to 
> node238/172.16.2.238:8020 <http://172.16.2.238:8020> starting to offer 
> service
> 2020-04-22 19:32:25,727 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool 
> <registering> (Datanode Uuid unassigned) service to 
> node239/172.16.2.239:8020 <http://172.16.2.239:8020> starting to offer 
> service
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 50020: starting
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2020-04-22 19:32:25,885 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 30
> 2020-04-22 19:32:25,885 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread: 
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]] 
> heartbeating to node239/172.16.2.239:8020 
> <http://172.16.2.239:8020>,5,main]
> 2020-04-22 19:32:25,885 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to 
> upgrade data directories 
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,890 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Lock on 
> /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename 35@node238
> 2020-04-22 19:32:25,918 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage 
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,918 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for 
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,920 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: 
> nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;ci \
> d=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
>  2020-04-22 19:32:25,931 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 29
> 2020-04-22 19:32:25,931 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread: 
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]] 
> heartbeating to node238/172.16.2.238:8020 
> <http://172.16.2.238:8020>,5,main]
> 2020-04-22 19:32:25,931 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to 
> upgrade data directories 
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,932 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Storage directory 
> [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been used.
> 2020-04-22 19:32:25,950 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage 
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,950 WARN 
> org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze 
> storage directories for block pool 
> BP-1558632922-172.16.2.238-1587535931027
> java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: 
> attempt to load an used block storage: 
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408) \
>  
> at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397) 
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362) 
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) 
> 
> at java.lang.Thread.run(Thread.java:748)
> 
> -- 
> Best Regards!
> Wenqi
> 


[Attachment #3 (text/html)]

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hi Wenqi,</p>
    <p>Which version of hadoop are you using? (My initial guess is 3.X,
      as )<br>
    </p>
    <p>Can you see anything in the datanode logs on the lines of 
      "org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"</p>
    <p>Although, you mention that it is new cluster, make sure the
      ClusterID matches. --&gt; "attempt to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"<br>
    </p>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 22/4/20 9:50 pm, Wenqi Ma wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAEny681VkD6VydPEp7a-cXxvUNEHTtw-kvNt_xDDEuFEhfMiOQ@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">Dear Hadoop community,<br clear="all">
        <div><br>
        </div>
        <div>A strange problem happened in our fresh installed hdfs
          cluster. The datanode cannot start because two threads tried
          to load storage, the latter would fail and lead a FATAL error.</div>
        <div>As the following log shown, the blue text
          means DataStorage::addStorageLocations is called, the red text
          are I added to find the thread details.</div>
        <div>I compared the log with normal datanode's in another
          cluster, and there is only one line (blue text). </div>
        <div><br>
        </div>
        <div>Any suggestion would be appreciated, thanks!</div>
        <div><br>
        </div>
        <div>2020-04-22 19:32:25,666 INFO
          org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh
          request received for nameservices: nameservice1<br>
          2020-04-22 19:32:25,718 INFO
          org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
          BPOfferServices for nameservices: nameservice1<br>
          2020-04-22 19:32:25,726 INFO
          org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
          &lt;registering&gt; (Datanode Uuid unassigned) service to
          node238/<a href="http://172.16.2.238:8020"
            moz-do-not-send="true">172.16.2.238:8020</a> starting to
          offer service<br>
          2020-04-22 19:32:25,727 INFO
          org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
          &lt;registering&gt; (Datanode Uuid unassigned) service to
          node239/<a href="http://172.16.2.239:8020"
            moz-do-not-send="true">172.16.2.239:8020</a> starting to
          offer service<br>
          2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC
          Server listener on 50020: starting   <br>
          2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC
          Server Responder: starting           <br>
          2020-04-22 19:32:25,885 INFO
          org.apache.hadoop.hdfs.server.common.Storage: kkk current
          thread ID: 30<br>
          2020-04-22 19:32:25,885 INFO
          org.apache.hadoop.hdfs.server.common.Storage: kkk <font
            color="#ff0000">current thread: Thread[DataNode:
            [[[DISK]<a class="moz-txt-link-freetext" \
href="file:/vdir/mnt/disk2/hadoop/data/">file:/vdir/mnt/disk2/hadoop/data/</a>]]  \
heartbeating to  node239/<a href="http://172.16.2.239:8020"
              moz-do-not-send="true">172.16.2.239:8020</a>,5,main] </font>
                                                                   <br>
          2020-04-22 19:32:25,885 INFO
          org.apache.hadoop.hdfs.server.common.Storage: <font
            color="#0000ff">Using 1 threads to upgrade data directories
            (dfs.datanode.parallel.volumes.load.threads.num=1,
            dataDirs=1)</font><br>
          2020-04-22 19:32:25,890 INFO
          org.apache.hadoop.hdfs.server.common.Storage: Lock on
          /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename
          35@node238<br>
          2020-04-22 19:32:25,918 INFO
          org.apache.hadoop.hdfs.server.common.Storage: Analyzing
          storage directories for bpid
          BP-1558632922-172.16.2.238-1587535931027<br>
          2020-04-22 19:32:25,918 INFO
          org.apache.hadoop.hdfs.server.common.Storage: Locking is
          disabled for
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027<br>
          2020-04-22 19:32:25,920 INFO
          org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up
          storage:
nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid= \
hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe<br>
  2020-04-22 19:32:25,931 INFO
          org.apache.hadoop.hdfs.server.common.Storage: kkk current
          thread ID: 29<br>
          2020-04-22 19:32:25,931 INFO
          org.apache.hadoop.hdfs.server.common.Storage: kkk <font
            color="#ff0000">current thread: Thread[DataNode:
            [[[DISK]<a class="moz-txt-link-freetext" \
href="file:/vdir/mnt/disk2/hadoop/data/">file:/vdir/mnt/disk2/hadoop/data/</a>]]  \
heartbeating to  node238/<a href="http://172.16.2.238:8020"
              moz-do-not-send="true">172.16.2.238:8020</a>,5,main]</font> 
                                                       <br>
          2020-04-22 19:32:25,931 INFO
          org.apache.hadoop.hdfs.server.common.Storage: <font
            color="#0000ff">Using 1 threads to upgrade data directories
            (dfs.datanode.parallel.volumes.load.threads.num=1,
            dataDirs=1)</font><br>
          2020-04-22 19:32:25,932 INFO
          org.apache.hadoop.hdfs.server.common.Storage: Storage
          directory [DISK]<a class="moz-txt-link-freetext" \
href="file:/vdir/mnt/disk2/hadoop/data/">file:/vdir/mnt/disk2/hadoop/data/</a> has \
already  been used. <br>
          2020-04-22 19:32:25,950 INFO
          org.apache.hadoop.hdfs.server.common.Storage: Analyzing
          storage directories for bpid
          BP-1558632922-172.16.2.238-1587535931027<br>
          2020-04-22 19:32:25,950 WARN
          org.apache.hadoop.hdfs.server.common.Storage: Failed to
          analyze storage directories for block pool
          BP-1558632922-172.16.2.238-1587535931027<br>
          java.io.IOException:
          BlockPoolSliceStorage.recoverTransitionRead: attempt to load
          an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027<br>
              at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)<br>
  at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)<br>
  at
org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)<br>
  at
org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)
  <br>
              at
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)<br>
  at
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)
                       <br>
              at
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)
                     <br>
              at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)<br>
  at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)<br>
  at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
                    <br>
              at java.lang.Thread.run(Thread.java:748)<br>
        </div>
        <div><br>
        </div>
        -- <br>
        <div dir="ltr" class="gmail_signature"
          data-smartmail="gmail_signature">
          <div dir="ltr">
            <div>
              <div>Best Regards!</div>
              <div>Wenqi</div>
              <div>
                <div><br>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic