
List:       ocfs2-users
Subject:    Re: [Ocfs2-users] OCFS2 KVM Crashes Yet Again !
From:       "Gang He" <ghe () suse ! com>
Date:       2017-09-29 6:46:26
Message-ID: 59CE5CC2020000F90008FCCD () prv-mh ! provo ! novell ! com

Hello netbsd,

Could you work out a way to trigger this crash in a normal ocfs2 cluster?
For example, reproduction steps or a shell script.
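
Such a script could be as simple as the sketch below. It is only an
illustration, not a confirmed reproducer: the function names, file counts
and directory layout are assumptions, and only the rsync options and the
/mnt/s1 and /mnt/s2 mount points come from the report quoted below.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; function names and counts are
# illustrative assumptions, not part of any ocfs2 tooling.

# Create COUNT small files under DIR, spread over ten subdirectories,
# to imitate the "a lot of small files" workload from the report.
make_small_files() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        sub="$dir/d$((i % 10))"
        mkdir -p "$sub"
        # Each file holds one short, unique line of text.
        printf 'file %d\n' "$i" > "$sub/f$i.txt"
        i=$((i + 1))
    done
}

# Run the rsync invocation quoted in the report ROUNDS times in a row.
# Returns non-zero as soon as one rsync run fails.
sync_rounds() {
    src=$1
    dst=$2
    rounds=$3
    n=0
    while [ "$n" -lt "$rounds" ]; do
        rsync -av --numeric-ids --delete "$src/" "$dst/" > /dev/null || return 1
        n=$((n + 1))
    done
}

# Example usage (paths from the report; the counts are arbitrary):
#   make_small_files /mnt/s1/repro 100000
#   sync_rounds /mnt/s1/repro /mnt/s2/repro 5
```

Knowing the file count and number of rounds at which the crash shows up
on your side would already help us a lot.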

Thanks
Gang


> > > 
> Hello,
> 
> Find the full log below:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_25625787_&d=DwIFAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=5ZRqjhlhVphYeGDUyONVUUBrtPi8rLz88ZN7_wbNlNQ&s=CGsTC_h47c4MXFb4l_7fmVPQ9Ru96AAupsNcqdb76Lk&e=
> 
> The VM was restarted at 9:27 and there has been no problem since then.
> We are rsyncing about 2 TB of data (a lot of small files) between two
> OCFS2 shares on the same VM:
> 
> 
> /dev/vdc                      4.8T  2.8T  2.1T  58% /mnt/s1
> /dev/vdf                      4.8T  985G  3.9T  21% /mnt/s2
> 
> rsync -av --numeric-ids --delete /mnt/s1/ /mnt/s2/
> 
> 
> On 2017-09-27 10:53, Gang He wrote:
> > Hello netbsd,
> > 
> > The ocfs2 project is still being developed by us (from SUSE, Huawei,
> > Oracle, H3C, etc.).
> > If you encounter a problem, please send mail to the ocfs2-devel
> > mailing list; we usually watch that list for ocfs2 kernel-related issues.
> > 
> > 
> > 
> > 
> > > > > 
> > > Hello All,
> > > 
> > > I wrote earlier about our OCFS2 crash issue in KVM due to a bug in
> > > the SMP code.
> > > 
> > > For this we came up with a solution:
> > > 
> > > Instead of using multiple vcpus:
> > > <vcpu placement='static'>8</vcpu>
> > > 
> > > we use a single one with multiple cores:
> > > <topology sockets='8' cores='8' threads='1'/>
> > > 
> > > We also applied these key tuning options in sysctl.conf:
> > > 
> > > vm.min_free_kbytes=131072
> > > vm.zone_reclaim_mode=1
> > > 
> > > This seemed to help: the fs did not crash right away when we were
> > > hammering it with Apache benchmarks with 10000 requests. However, last
> > > night I started a large rsync operation from a 5TB OCFS2 FS mounted in
> > > the VM to another OCFS2 FS mounted in the same VM and ended up with:
> > > 
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__ibb.co_gFeGg5&d=DwICAg&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=QxGl6UoyzTJm_1fAz5ZR9izvWJhWcqbtYn-0afBpa7A&m=cYprGRHz-oQmhnx4HIke8sTdCG_tf8Jb-rF6sHnYLnk&s=ajWfQIlUZOpElFWxoKcmvTIk7J3PpuCJITcnXfJQHrc&e=
> > > 
> > Judging from the kernel crash backtrace, the problem appears to be
> > that taking too long to acquire a spin_lock triggers an NMI.
> > Could you give detailed reproduction steps? We want to reproduce
> > this issue locally and then try to fix it.
> > 
> > 
> > Thanks
> > Gang
> > 
> > > 
> > > After trying a lot of different kernels starting from the 3.x series,
> > > we are now using the latest kernel, 4.13.2, with the default
> > > configuration, but these issues are still present. Is the OCFS2
> > > project still being developed?
> > > With this crashing and unreliability it cannot be used in production
> > > unless you put in place a bunch of safeguards to reset the whole
> > > virtual machine when it crashes.
> > > 
> > > Thanks
> > > 
> > > _______________________________________________
> > > Ocfs2-users mailing list
> > > Ocfs2-users@oss.oracle.com 
> > > https://oss.oracle.com/mailman/listinfo/ocfs2-users



