
List:       lustre-discuss
Subject:    [Lustre-discuss] help
From:       Colin_Faber () xyratex ! com (Colin Faber)
Date:       2011-09-30 14:46:48
Message-ID: CA68FDCAE785124C81F07EC64FA6302F01C67485 () XYUS-EX21 ! xyus ! xyratex ! com

Hi,

This looks like a connection timeout. It is likely temporary, since the client
appears to have reconnected and recovered without any problems.
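To confirm from a longer log that every target actually recovered, one option is to tally connection events per target. The sketch below is a hypothetical helper, not a Lustre utility. Its regular expressions are assumptions written against the client messages quoted below, and other Lustre versions may word these messages differently. On this log it would also surface the eviction by lustre-OST000b. Messages suppressed by console rate limiting (the "Skipped N previous similar messages" lines) are never seen, so the counts are lower bounds.

```python
import re
import sys
from collections import Counter, defaultdict

# Hypothetical patterns, written against the client console messages quoted
# in this thread; other Lustre versions may word these messages differently.
PATTERNS = {
    "timeout": re.compile(r"sent from (\S+?)-osc-\S+ to NID .* has timed out"),
    "lost": re.compile(r"Connection to service (\S+) via nid \S+ was lost"),
    "evicted": re.compile(r"evicted by ([^;\s]+)"),
    "restored": re.compile(r"Connection restored to service (\S+) using nid"),
}


def tally(lines):
    """Count connection events per target name (e.g. lustre-OST0006).

    Messages hidden behind "Skipped N previous similar messages" are not
    seen, so every count is a lower bound.
    """
    counts = defaultdict(Counter)
    for line in lines:
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[match.group(1)][event] += 1
    return counts


if __name__ == "__main__":
    # Each command-line argument is a log file, e.g. a copy of /var/log/messages.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for target, events in sorted(tally(log).items()):
                print(target, dict(events))
```

Run it as, for example, `python3 tally_lustre.py /var/log/messages` (the script name is arbitrary). A target with "lost" or "evicted" entries but no "restored" entry is worth a closer look.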

What other issue are you experiencing?

-cf


On 09/29/2011 10:39 PM, Ashok nulguda wrote:
> Dear All,
> 
> I am seeing the Lustre errors below on my HPC system. Can anyone help me
> resolve this problem?
> Thanks in advance.
> Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1 previous similar message
> Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre: lustre-OST0008-osc-ffff880b272cf800: Connection to service lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress operations using this service will wait for recovery to complete.
> Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
> Sep 30 08:40:24 service0 kernel: [343139.837263]   req@ffff880a5f800c00 x1380984193067288/t0 o3->lustre-OST0006_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0 to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0
> Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
> Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar message
> Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError: 9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError: 9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar message
> Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
> Sep 30 08:40:25 service0 kernel: [343140.837311]   req@ffff880a557c4400 x1380984193067299/t0 o3->lustre-OST0010_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0 to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0
> Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
> Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError: 30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError: 22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError: 30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar message
> Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre: 22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 14s ago has timed out (14s prior to deadline).
> Sep 30 08:40:33 service0 kernel: [343148.245686]   req@ffff8805c879e800 x1380984193067302/t0 o103->lustre-OST0004_UUID@10.148.0.106@o2ib:17/18 lens 296/384 e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0
> Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre: 22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
> Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError: 22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
> Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError: 22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
> Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError: 22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar message
> Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError: 11-0: an error occurred while communicating with 10.148.0.106@o2ib. The ost_connect operation failed with -16
> Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError: Skipped 1 previous similar message
> Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError: 167-0: This client was evicted by lustre-OST000b; in progress operations using this service will fail.
> Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError: 30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
> Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError: 8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  req@ffff88049528c400 x1380984193067406/t0 o3->lustre-OST000b_UUID@10.148.0.106@o2ib:6/4 lens 448/592 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
> Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre: lustre-OST0000-osc-ffff880b272cf800: Connection restored to service lustre-OST0000 using nid 10.148.0.106@o2ib.
> Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre: lustre-OST0006-osc-ffff880b272cf800: Connection restored to service lustre-OST0006 using nid 10.148.0.106@o2ib.
> Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre: lustre-OST0003-osc-ffff880b272cf800: Connection restored to service lustre-OST0003 using nid 10.148.0.106@o2ib.
> Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped 3 previous similar messages
> 
> 
> Thanks and Regards
> Ashok
> 
> -- 
> Ashok Nulguda
> TATA ELXSI LTD
> Mb : +91 9689945767
> Email : ashokn at tataelxsi.co.in <mailto:tshrikant at tataelxsi.co.in>
> 
> 
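The negative rc values in the log above are Linux errno codes with the sign flipped: -5 is EIO, -11 is EAGAIN and -16 is EBUSY. A minimal sketch for decoding them follows. The lookup table is deliberately hard-coded on the assumption that the log came from a Linux client, because errno numbers differ between platforms and Python's errno module on a non-Linux analysis host could report the wrong names. The function name `decode_rcs` and its regex are hypothetical.

```python
import re

# Linux errno values (asm-generic/errno-base.h) for the codes seen in this log.
# Hard-coded on purpose: the numbers are Linux-specific.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    11: ("EAGAIN", "Try again"),
    16: ("EBUSY", "Device or resource busy"),
}

# A minus sign followed by digits, not preceded by a word character, '/' or '-'.
# This skips names such as "lustre-OST0008" and message IDs such as "167-0".
RC_RE = re.compile(r"(?<![\w/-])-(\d+)\b")


def decode_rcs(line):
    """Return (rc, name, description) for each negative return code in a line."""
    decoded = []
    for match in RC_RE.finditer(line):
        code = int(match.group(1))
        name, desc = LINUX_ERRNO.get(code, ("?", "not in table"))
        decoded.append((-code, name, desc))
    return decoded


if __name__ == "__main__":
    line = "The ost_connect operation failed with -16"
    print(decode_rcs(line))
```

Extending `LINUX_ERRNO` from the kernel headers is all that is needed to cover other codes.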


