List: lustre-discuss
Subject: [lustre-discuss] lustre-discuss Digest, Vol 112, Issue 30
From: Kurt Strosahl <strosahl@jlab.org>
Date: 2015-07-29 16:03:05
Message-ID: <1151653879.3107832.1438185785151.JavaMail.zimbra@jlab.org>
Hi Massimo,
This sounds exactly like the issue I encountered over a month ago on my Lustre 2.5.3
system. The quick fix I found was to set qos_threshold_rr to 100% (i.e. flat round
robin, not weighted). However, that caused a new problem: some OSTs would go over 90%
full while others were still under 50%. The hack I eventually came up with was to
create a pool (called "production") containing all the OSTs except the unusable ones,
and then put every directory in that pool. Once that was done I was able to turn the
QOS round robin back on.
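A minimal sketch of that setup, using your cmswork filesystem names as an example
(the directory path is a placeholder, and pool_new/pool_add have to be run on the MGS):

  # On the MGS: create the pool and add the usable OSTs to it
  lctl pool_new cmswork.production
  lctl pool_add cmswork.production cmswork-OST[0000-0001]
  lctl pool_add cmswork.production cmswork-OST0005
  # On a client: verify the pool, then point a directory's default layout at it
  lctl pool_list cmswork.production
  lfs setstripe --pool production /lustre/cmswork/some/dir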
A problem with this is that, in 2.5.3, pools are not properly inherited
(https://jira.hpdd.intel.com/browse/LU-5916). That means new directories wouldn't pick
up the pool information, and so would only land on the OSTs above the bad ones. We
solved this using the changelog, which records when directories are created: we wrote
some code that assigns every new directory to the production pool. So far it seems to
be working.
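Roughly, the changelog-driven piece looks like the sketch below (the MDT name, mount
point, and pool name are examples for your filesystem, and the changelog record layout
can differ between versions, so treat this as a starting point rather than exactly
what we run):

  # On the MDS: register a changelog consumer (prints an id such as cl1)
  lctl changelog_register
  # On a client: pull MKDIR records, resolve each new directory's FID to a
  # path, and assign its default layout to the pool
  lfs changelog cmswork-MDT0000 | awk '
      $2 ~ /MKDIR/ {
          for (i = 1; i <= NF; i++)
              if ($i ~ /^t=/) { sub(/^t=\[?/, "", $i); sub(/\]$/, "", $i); print $i }
      }' |
  while read fid; do
      dir=$(lfs fid2path /lustre/cmswork "$fid") || continue
      lfs setstripe --pool production "$dir"
  done
  # Afterwards, clear the processed records, e.g.
  #   lfs changelog_clear cmswork-MDT0000 cl1 <endrec>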
Another issue I've since discovered: files created before the production pool existed
don't have a pool in their layout, so using lfs_migrate (which uses the file's
striping, not the directory's) caused those files to be written to the OSTs above the
bad ones.
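If you hit the same thing, something like this shows whether a file's layout actually
references a pool, and forces the pool while migrating (this assumes an lfs new enough
to pass setstripe layout options to migrate; on older releases lfs_migrate may not
carry the pool through):

  # Show the pool (if any) recorded in the file's own layout
  lfs getstripe --pool /lustre/cmswork/path/to/file
  # Rewrite the file with an explicit pool so it lands on the intended OSTs
  lfs migrate --pool production /lustre/cmswork/path/to/file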
w/r,
Kurt
Message: 3
Date: Wed, 29 Jul 2015 17:31:25 +0200
From: Massimo Sgaravatto <massimo.sgaravatto at pd.infn.it>
To: lustre-discuss at lists.lustre.org
Subject: [lustre-discuss] Lustre doesn't use new OST
Message-ID: <55B8F1CD.5080509 at pd.infn.it>
Content-Type: text/plain; charset="utf-8"; Format="flowed"
Hi
We had a Lustre filesystem composed of 5 OSTs.
Because of a problem with 3 OSTs (the problem is described in the thread
"Problems moving an OSS from an old Lustre installation to a new one"),
we disabled them.
Now we want to reformat (mkfs.lustre --reformat ...) these 3 OSTs and bring them
back on-line.
For the time being we have done this for just one OST (using a new index number).
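The reformat with a new index was along these lines (the device and MGS NID below are
just placeholders, not our actual values):

  # On the OSS: reformat the target with a previously unused index
  mkfs.lustre --ost --reformat --fsname=cmswork --index=5 \
      --mgsnode=<mgs-nid>@tcp /dev/<ost-device>
  mount -t lustre /dev/<ost-device> /mnt/ost5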
The current scenario is the following (OST0005 is the reformatted OST):
lfs df -h /lustre/cmswork/
UUID                       bytes        Used   Available Use% Mounted on
cmswork-MDT0000_UUID      374.9G        3.5G      346.4G   1% /lustre/cmswork[MDT:0]
cmswork-OST0000_UUID       18.1T       14.5T        2.7T  84% /lustre/cmswork[OST:0]
cmswork-OST0001_UUID       18.1T       14.2T        3.0T  83% /lustre/cmswork[OST:1]
OST0002             : inactive device
OST0003             : inactive device
OST0004             : inactive device
cmswork-OST0005_UUID       13.6T      415.1M       12.9T   0% /lustre/cmswork[OST:5]

filesystem summary:        49.7T       28.7T       18.5T  61% /lustre/cmswork
The problem is that the "Lustre scheduler" never selects OST0005 for new files.
Only if I use "lfs setstripe --index 5" do I see files written to this OST; otherwise
only OST0000 and OST0001 are used.
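For what it's worth, this is roughly how I'm checking where new files end up (the test
path is just an example):

  # Force a file onto the new OST, then look at the obdidx column
  lfs setstripe --index 5 /lustre/cmswork/tmp/testfile
  lfs getstripe /lustre/cmswork/tmp/testfile
  # Files created without forcing the index only ever show obdidx 0 or 1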
We didn't change the values of qos_threshold_rr and qos_prio_free, so they are still
at their defaults (17% and 91%).
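This is how I'm reading those parameters on the MDS (the exact parameter path may
differ between versions; on newer releases it can live under lod.* instead of lov.*):

  # On the MDS: show the current allocator thresholds
  lctl get_param lov.*.qos_threshold_rr lov.*.qos_prio_free
  # Setting qos_threshold_rr to 100 would force plain round-robin allocation:
  #   lctl set_param lov.*.qos_threshold_rr=100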
I can't find anything useful in the log files.
Any ideas?
Thanks, Massimo