List:       lustre-discuss
Subject:    Re: [lustre-discuss] Unable to mount client with 56 MDSes and beyond
From:       Andreas Dilger <adilger () whamcloud ! com>
Date:       2019-05-22 8:02:59
Message-ID: DB52DDCD-7FD8-459E-A1CF-612EEA573F75 () whamcloud ! com

Scott, if you haven't already done so, it is probably best to file a ticket
in Jira with the details.  Please include the client syslog/dmesg as well as
a Lustre debug log ("lctl dk /tmp/debug") so that the problem can be isolated.

During DNE development we tested with up to 128 MDTs in AWS, but we haven't
tested that many MDTs in some time.
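
As a reading aid for the quoted dmesg output below, here is a minimal Python
sketch that decodes the class_setup() failure line. It relies on two facts:
the number in a Lustre target name such as MDT0037 is a zero-based
hexadecimal index, and the return code is a negated Linux errno. The regex
and the helper name decode_setup_failure are illustrative assumptions, not
part of Lustre, and this only interprets the message; it does not diagnose
the cause.

```python
import errno
import re

# Hypothetical pattern for the "setup <device> failed (<rc>)" message.
# Lustre target names carry a 4-digit hexadecimal index (e.g. MDT0037).
SETUP_FAILED = re.compile(
    r"setup\s+\S+-MDT([0-9a-fA-F]{4})-mdc-\S+\s+failed\s+\((-?\d+)\)"
)

def decode_setup_failure(line):
    """Return (mdt_index, rc, errno_name) or None if the line does not match."""
    m = SETUP_FAILED.search(line)
    if m is None:
        return None
    mdt_index = int(m.group(1), 16)   # hex index in the target name
    rc = int(m.group(2))              # kernel-style negative errno
    name = errno.errorcode.get(abs(rc), "unknown")
    return mdt_index, rc, name

line = ("LustreError: 28880:0:(obd_config.c:559:class_setup()) setup "
        "lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)")
print(decode_setup_failure(line))  # (55, -16, 'EBUSY')
```

MDT0037 is index 55, which is the 56th MDT if indices were assigned
sequentially from 0 (an assumption about the test setup), and -16 is -EBUSY.
That matches the report that 55 MDSes mount and the 56th does not.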

Cheers, Andreas

On May 8, 2019, at 12:28, White, Scott F <sfpwhite@lanl.gov> wrote:
> 
> We've been testing DNE Phase II and tried scaling the number of MDSes
> (one MDT each for all of our tests) very high, but when we did that, we
> couldn't mount the filesystem on a client.  After trial and error, we
> discovered that we were unable to mount the filesystem when there were
> 56 MDSes. 55 MDSes mounted without issue, and it appears any number
> below that will mount. This failure at 56 MDSes was reproducible across
> different nodes being used for the MDSes, all of which were tested with
> working configurations, so it doesn't seem to be a bad server.
> Here's the error info we saw in dmesg on the client:
> 
> LustreError: 28880:0:(obd_config.c:559:class_setup()) setup lustre-MDT0037-mdc-ffff95923d31b000 failed (-16)
> LustreError: 28880:0:(obd_config.c:1836:class_config_llog_handler()) MGCx.x.x.x@o2ib: cfg command failed: rc = -16
> Lustre:    cmd=cf003 0:lustre-MDT0037-mdc  1:lustre-MDT0037_UUID  2:x.x.x.x@o2ib
> LustreError: 15c-8: MGCx.x.x.x@o2ib: The configuration from log 'lustre-client' failed (-16). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 28858:0:(obd_config.c:610:class_cleanup()) Device 58 not setup
> Lustre: Unmounted lustre-client
> LustreError: 28858:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-16)
> 
> OS: CentOS 7.6.1810 
> Kernel: 3.10.0-957.5.1.el7.x86_64
> Lustre: 2.12.1
> Network card: Qlogic InfiniPath_QLE7340
> 
> Other things to note for completeness' sake: this happened with both
> ldiskfs and zfs backfstypes, and these tests were using files in memory
> as the backing devices. Is there something I'm missing as to why 56 or
> more MDSes won't mount?
> 
> Thanks,
> Scott White
> Scientist, HPC 
> Los Alamos National Laboratory
> 
> _______________________________________________
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
Andreas Dilger
Principal Lustre Architect
Whamcloud
