List: ocfs2-devel
Subject: Re: [Ocfs2-devel] [RFC] Online File(system) check
From: Joseph Qi <joseph.qi () huawei ! com>
Date: 2015-04-30 2:29:10
Message-ID: 55419376.70606 () huawei ! com
On 2015/4/29 10:37, Gang He wrote:
> Hi Joseph,
>
> Thanks for your detailed description.
> See my question inline.
>
>
> > > >
> > Hi Goldwyn,
> >
> > Thanks for the good proposal.
> >
> > On 2015/4/28 20:21, Goldwyn Rodrigues wrote:
> > > Hi Gang,
> > >
> > > On 04/27/2015 10:00 PM, Gang He wrote:
> > > > Hi Goldwyn,
> > > >
> > > > Very nice proposal.
> > > > So far, there are some comments from me.
> > > > 1) which task will we do in check/fix a file, we need to define the detailed
> > > > requirements further, since we just do a light-level file check/fix according
> > > > to inode number, we need to know which items can be done by online check,
> > > > which items can be done by offline fsck.
> > >
> > > For the first phase (regular files), these are all the reasons the disk
> > > validate function would fail. Some examples are ocfs2_validate_inode_block,
> > > ocfs2_validate_extent_block etc.
> > > As we take up system inodes (phase 2), we will add more functionality.
> > >
> > Can we classify all corrupted cases and their corresponding fix ways? Maybe
> > we can get some hints from fsck.
> > And I don't think errors=continue can fit for all cases.
> > For some cases we shouldn't let it continue with errors to prevent more
> > damages.
> >
> > > > 2) can we keep check and fix two option, check option is to check if a file
> > > > is good or bad, but not modify anything, fix option is to check and fix a
> > > > file if the file is corrupted.
> > >
> > > Yes, there are two options: CHECK only checks, whereas FIX fixes the errors.
> > > As a precautionary measure, a CHECK command should be provided before a FIX
> > > is issued. IOW, a file should be checked for errors before actually fixing
> > > it.
> > >
> > A convenient way to know which to be checked should also be taken into
> > consideration.
> >
> > > > 3) when users execute the command "echo CHECK <inode> >
> > > > /sys/fs/ocfs2/filecheck" to check a file, how to give the feedback
> > > > information besides printing the messages to syslog?
> > >
> > > The output should be when you cat /sys/fs/ocfs2/filecheck. It would provide
> > > the results of the last (N) files checked. I don't want to flood the kernel
> > > log with this. Thanks for bringing this up, I will put it on the doc.
> > > Something like:
> > >
> > > Inode  Status    Description
> > > 1234   ERROR     Metadata incorrect
> > > 2352   FIXED     Valid flag not set
> > > 9382   CHECKING  -
> > > 8926   GOOD      -
> > > 7230   CANT-FIX  Please execute fsck.ocfs2 after taking filesystem offline.
> > >
> > > So, for the current scenario, only 1234 can be fixed. An echo should err
> > > with EINVAL if any other inode number is provided with FIX.
> > >
> > >
> > > > 4) we should support a list to accept the "check/fix" requests from
> > > > user-space and queue them, then handle them one by one, right? what is the
> > > > behavior for the request user which execute "echo check ..." from the user
> > > > space? the user post a request to the kernel space, then the command will end
> > > > or wait for the file check end?
> > > >
> > >
> > > I would not suggest that, at least for now. This is to improve availability.
> > > However, if the filesystem is very bad, we should suggest an offline check.
> > > However, the user can provide multiple CHECK requests.
> My question is whether users can execute "echo check > .." to check/fix files
> simultaneously, since users can trigger this command from different terminals.
I think we have to restrict it, since offline fsck is not supposed to
allow such a case either.
If we do have to support it, maybe user dlm can take care of this.
> Second, when users send a command to kernel space, the kernel has to cache these
> commands in a list/array, since the kernel cannot finish a check request immediately.
> Otherwise, how does the kernel accept a new request while it is handling
> the current request?
I think the operations should be done one by one.
IMO, the kernel finds the corruption and reports it to user space.
In user space we maintain a list of corruptions.
Then the user checks/fixes them one by one.
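
To make the idea concrete, here is a rough user-space sketch of such a
list, not a proposed implementation. The class name, the PENDING state
and the check_fn/fix_fn callbacks are all hypothetical; the callbacks
stand in for whatever the CHECK/FIX interface ends up being. The status
names come from Goldwyn's example table, and the EINVAL rule mirrors his
point that FIX is only valid after a CHECK has reported ERROR.

```python
import errno
from collections import deque


class CorruptionList:
    """Hypothetical user-space list of inodes the kernel reported as bad."""

    def __init__(self):
        self.pending = deque()  # reported inodes, oldest first
        self.status = {}        # inode -> (status, description)

    def report(self, inode):
        """Queue an inode reported by the kernel; duplicates are ignored."""
        if inode not in self.status:
            # PENDING is an assumed state, not part of the proposal.
            self.status[inode] = ("PENDING", "-")
            self.pending.append(inode)

    def fix(self, inode, fix_fn):
        """Allow FIX only if the last CHECK of this inode reported ERROR."""
        state = self.status.get(inode, (None, None))[0]
        if state != "ERROR":
            raise OSError(errno.EINVAL, "FIX needs a prior CHECK with ERROR")
        self.status[inode] = fix_fn(inode)

    def run(self, check_fn, fix_fn):
        """Handle the queued inodes strictly one at a time."""
        while self.pending:
            inode = self.pending.popleft()
            self.status[inode] = check_fn(inode)
            if self.status[inode][0] == "ERROR":
                self.fix(inode, fix_fn)
        return self.status
```

Because run() takes one inode off the queue at a time, no two check/fix
operations overlap within one process. Keeping several such processes,
or several nodes, from running at once would still need the restriction
discussed above.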
>
> Thanks
> Gang
>
> > >
>
>
> .
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel