List:       solr-user
Subject:    Solr+HDFS
From:       Joseph Obernberger <joseph.obernberger@gmail.com>
Date:       2016-01-29 5:40:59
Message-ID: CADRon307A_MoJMduUzoG7a05Yh0Li+FNnVnMx9OJLO6k+VeksA@mail.gmail.com

Hi All - we're running Apache Solr Cloud 5.2.1 on an HDFS cluster that is
86% full, and some of the datanodes are much closer to capacity than
others.  The indexing process is reporting "Error adding log" messages,
which I **think** are related to individual datanodes filling up.
Is that the case, even though HDFS as a whole still has room available?
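
In case it's useful, here is a minimal sketch (Java, against the standard
Hadoop client API; the nameservice URI is the one from our logs) of dumping
per-datanode utilization to see which nodes are nearly full.  Running
hdfs dfsadmin -report from the command line shows the same numbers.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeUsage {
    public static void main(String[] args) throws Exception {
        // Our HA nameservice; substitute your own fs.defaultFS.
        FileSystem fs = FileSystem.get(
                URI.create("hdfs://nameservice1:8020"), new Configuration());
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        // getDataNodeStats() reports capacity and usage per live datanode.
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.printf("%-30s used=%5.1f%% remaining=%d bytes%n",
                    dn.getHostName(), dn.getDfsUsedPercent(),
                    dn.getRemaining());
        }
    }
}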

Upon restart of Solr Cloud, we see messages such as:

ERROR - 2016-01-28 22:15:27.594; [   UNCLASS] org.apache.solr.common.SolrException; Failure to open existing log file (non fatal) hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501:org.apache.solr.common.SolrException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501 for DFSClient_NONMAPREDUCE_1371064795_29 for client 172.16.100.218 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3075)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
        at org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:131)
        at org.apache.solr.update.HdfsUpdateLog.init(HdfsUpdateLog.java:193)
        at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:136)
        at org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:94)
        at org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:99)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:573)
        at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:635)
        at org.apache.solr.core.SolrCore.initUpdateHandler(SolrCore.java:928)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:786)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:637)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:381)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:375)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282501 for DFSClient_NONMAPREDUCE_1371064795_29 for client 172.16.100.218 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3075)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2905)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:3186)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:3149)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:611)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.append(AuthorizationProviderProxyClientProtocol.java:124)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:416)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1403)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy11.append(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:313)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy12.append(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1842)
        at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1878)
        at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1871)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:329)
        at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:325)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:325)
        at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1172)
        at org.apache.solr.update.HdfsTransactionLog.<init>(HdfsTransactionLog.java:94)
        ... 21 more

INFO  - 2016-01-28 22:15:27.639; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; Recovering lease on dfs file hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933
INFO  - 2016-01-28 22:15:27.698; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=0 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 58ms
INFO  - 2016-01-28 22:15:31.821; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=1 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 4181ms
INFO  - 2016-01-28 22:16:32.971; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=2 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 65331ms
INFO  - 2016-01-28 22:17:34.638; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=3 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 126998ms
INFO  - 2016-01-28 22:18:35.764; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=4 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 188124ms
INFO  - 2016-01-28 22:19:37.114; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=5 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 249474ms
INFO  - 2016-01-28 22:20:38.629; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=6 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 310989ms
INFO  - 2016-01-28 22:21:39.751; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=7 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 372111ms
INFO  - 2016-01-28 22:22:40.854; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=8 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 433214ms
INFO  - 2016-01-28 22:23:41.981; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=9 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 494341ms
INFO  - 2016-01-28 22:24:43.088; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=10 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 555448ms
INFO  - 2016-01-28 22:25:44.808; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=11 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 617168ms
INFO  - 2016-01-28 22:26:45.934; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=12 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 678294ms
INFO  - 2016-01-28 22:27:47.036; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=13 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 739396ms
INFO  - 2016-01-28 22:28:48.504; [   UNCLASS] org.apache.solr.util.FSHDFSUtils; recoverLease=false, attempt=14 on file=hdfs://nameservice1:8020/solr5.2/UNCLASS/core_node14/data/tlog/tlog.0000000000000282933 after 800864ms

Some shards in the cluster can take hours to come back up.  Solr appears to
wait up to 900 seconds on lease recovery for each tlog file, and with 60+
tlog files that alone works out to roughly 15 hours.  Any ideas?
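
If I'm reading org.apache.solr.util.FSHDFSUtils right, it just polls
DistributedFileSystem.recoverLease() on each file until that returns true
or roughly 900 seconds elapse.  Here is a rough sketch (in Java, purely
illustrative; the tlog path below is one of ours) of what forcing the lease
recovery up front, before starting Solr, might look like:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverTlogLeases {
    public static void main(String[] args) throws Exception {
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(
                URI.create("hdfs://nameservice1:8020"), new Configuration());
        // One core's tlog directory; in practice you would loop over
        // every core_nodeN under the collection.
        Path tlogDir = new Path("/solr5.2/UNCLASS/core_node14/data/tlog");
        for (FileStatus st : dfs.listStatus(tlogDir)) {
            // recoverLease() returns true once the NameNode has released
            // the previous client's lease; false means recovery is still
            // pending and the call would need to be retried after a pause.
            boolean done = dfs.recoverLease(st.getPath());
            System.out.println(st.getPath() + " recovered=" + done);
        }
    }
}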
Thank you!


