'Re: [Ocfs2-devel] [RFC] Online File(system) check'


List:       ocfs2-devel
Subject:    Re: [Ocfs2-devel] [RFC] Online File(system) check
From:       Joseph Qi <joseph.qi () huawei ! com>
Date:       2015-04-30 2:29:10
Message-ID: 55419376.70606 () huawei ! com

On 2015/4/29 10:37, Gang He wrote:
> Hi Joseph,
> 
> Thanks for your detailed description.
> See my question inline.
> 
> 
> > > > 
> > Hi Goldwyn,
> > 
> > Thanks for the good proposal.
> > 
> > On 2015/4/28 20:21, Goldwyn Rodrigues wrote:
> > > Hi Gang,
> > > 
> > > On 04/27/2015 10:00 PM, Gang He wrote:
> > > > Hi Goldwyn,
> > > > 
> > > > Very nice proposal.
> > > > So far, there are some comments from me.
> > > > 1) Which tasks will we do when checking/fixing a file? We need to define the
> > > > detailed requirements further. Since we only do a light-level file check/fix
> > > > by inode number, we need to know which items can be done by online check
> > > > and which items can only be done by offline fsck.
> > > 
> > > For the first phase (regular files), these are all the reasons the disk 
> > validate function would fail. Some examples are ocfs2_validate_inode_block, 
> > ocfs2_validate_extent_block etc.
> > > As we take up system inodes (phase 2), we will add more functionality.
> > > 
> > Can we classify all corruption cases and their corresponding fixes? Maybe
> > we can get some hints from fsck.
> > Also, I don't think errors=continue fits all cases.
> > For some cases we shouldn't let it continue with errors, to prevent further
> > damage.
> > 
> > > > 2) Can we keep check and fix as two options? The check option checks whether
> > > > a file is good or bad but does not modify anything; the fix option checks
> > > > the file and fixes it if it is corrupted.
> > > 
> > > Yes, there are two options: CHECK only checks, whereas FIX fixes the errors.
> > > As a precautionary measure, a CHECK command should be provided before a FIX
> > > is issued. IOW, a file should be checked for errors before actually fixing
> > > it.
> > > 
> > A convenient way to know which files need to be checked should also be
> > taken into consideration.
> > 
> > > > 3) when users execute the command "echo CHECK <inode> > 
> > /sys/fs/ocfs2/filecheck" to check a file, how to give the feedback 
> > information besides printing the messages to syslog?
> > > 
> > > The output should be when you cat /sys/fs/ocfs2/filecheck. It would provide 
> > the results of the last (N) files checked. I don't want to flood the kernel 
> > log with this. Thanks for bringing this up, I will put it on the doc. 
> > Something like:
> > > 
> > > Inode Status Description
> > > 1234   ERROR Metadata incorrect
> > > 2352   FIXED Valid flag not set
> > > 9382   CHECKING -
> > > 8926   GOOD -
> > > 7230   CANT-FIX Please execute fsck.ocfs2 after taking filesystem offline.
> > > 
> > > So, for the current scenario, only 1234 can be fixed. An echo should err 
> > with EINVAL if any other inode number is provided with FIX.
> > > 
> > > 
> > > > 4) We should support a list that accepts the "check/fix" requests from
> > > > user space and queues them, then handles them one by one, right? What is
> > > > the behavior for a user who executes "echo check ..." from user space?
> > > > The user posts a request to kernel space; does the command then return,
> > > > or does it wait for the file check to finish?
> > > > 
> > > 
> > > I would not suggest that, at least for now. This is to improve availability.
> > > However, if the filesystem is very bad, we should suggest an offline check.
> > > However, the user can provide multiple CHECK requests.
> My question is whether users can execute "echo check > .." to check/fix files
> simultaneously, since users can trigger this command from different terminals.
I think we have to restrict it, since offline fsck is not supposed
to allow such a case either.
If we do have to support it, maybe user dlm can take care of this.
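
To illustrate the kind of restriction I mean, here is a minimal user-space
sketch in Python. It is only an assumption of how a wrapper tool could refuse
a second concurrent request: it uses a plain advisory flock() on a lock file
(the helper name and lock path are hypothetical), which only serializes
requests on one node. Cluster-wide exclusion would need user dlm and is not
shown here.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_request(lock_path):
    """Hold an exclusive advisory lock while one check/fix request runs.

    Hypothetical helper: node-local only, not a substitute for user dlm.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Non-blocking: raises BlockingIOError if another request holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield fd
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
```

A second terminal that tries to enter exclusive_request() on the same lock
file while a request is running gets BlockingIOError and can report "busy"
instead of submitting a simultaneous check.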

> Second, when users send a command to kernel space, the kernel has to cache these
> commands in a list/array, since the kernel cannot finish a check request immediately.
> Otherwise, how does the kernel accept a new request while it is handling
> the current request?
I think the operations should be done one by one.
IMO, the kernel finds the corruption and reports it to user space.
In user space we maintain a list of corruptions.
Then the user checks/fixes them one by one.
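
A rough Python sketch of that user-space side, assuming the status format
Goldwyn proposed above (Inode / Status / Description columns) and a
"FIX <inode>" command string. The function names are hypothetical, and the
submit callback stands in for a write to the proposed
/sys/fs/ocfs2/filecheck file, which is still only a proposal:

```python
from collections import deque

# Sample report in the format proposed earlier in this thread.
SAMPLE = """\
Inode Status Description
1234   ERROR Metadata incorrect
2352   FIXED Valid flag not set
9382   CHECKING -
8926   GOOD -
7230   CANT-FIX Please execute fsck.ocfs2 after taking filesystem offline.
"""


def parse_status(text):
    """Parse the report into (inode, status, description) tuples."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # ignore blank or malformed lines
        desc = parts[2] if len(parts) > 2 else ""
        entries.append((int(parts[0]), parts[1], desc))
    return entries


def fixable(entries):
    """Queue only inodes reported as ERROR; other states are not fixable."""
    return deque(inode for inode, status, _ in entries if status == "ERROR")


def process(queue, submit):
    """Submit FIX requests strictly one at a time, in order."""
    while queue:
        submit("FIX %d" % queue.popleft())
```

With the sample report, only inode 1234 ends up in the queue, which matches
the rule that FIX is valid only for an inode already found in error.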

> 
> Thanks
> Gang
> 
> > > 
> 
> 
> .
> 

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
