List:       linux-aacraid-devel
Subject:    File corruption in 2.2.20
From:       ard () waikato ! ac ! nz
Date:       2001-11-12 18:08:00

I have a wee program that creates large files of sequentially increasing
long words.  I use it to thrash large filesystems.  With the error
checking removed, its guts are:

 #define BUFFSIZE (1024*1024/sizeof(long))   /* longs per megabyte */
 long buff[BUFFSIZE];
 for (j = 0; j < megs; j++) {
    /* fill one megabyte with the next run of sequential values */
    for (i = 0; i < BUFFSIZE; i++) buff[i] = (j*BUFFSIZE) + i;
    write(fileno(stdout), buff, BUFFSIZE * sizeof(long));
 }

I run that inside a script that backgrounds three of those creating
1900MB files on an 8G filesystem, and another 30 creating 1900MB files on
a 68G filesystem:

  for f in `seq 3`; do 
    nice bigfile 1900 > /home/THRASH$f 2>/dev/null & 
  done
  for f in `seq -f %02g 30`; do 
    nice bigfile 1900 > /usr/local/THRASH$f 2>/dev/null & 
  done
  wait

This process takes about an hour on two identically configured 3/Di
2500s.

All 30 files in /usr/local are fine, but I'm left with corrupt files in
/home.  They are all 1,992,294,400 bytes long, but 'cmp' and 'sum' say
they're different.

In all six cases, the section of corruption begins on an odd 512-byte
boundary (i.e. not a 1024-byte boundary), and lasts for 7 * 512 bytes.
The files are increasing longwords and it's easy to see that the bad
section of each file is also increasing longwords, starting from an odd
512-byte boundary earlier in the file (or in another THRASH file).  The
bad sections start anywhere between 200MB and 1600MB from the start, and
the "copied in" sections have come from roughly two-thirds of the bad
section's offset.
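
For anyone wanting to reproduce the analysis, here is a minimal sketch of
a checker for this pattern.  It is not the tool used above; the names
scan_thrash, bad_run and on_odd_sector are made up for illustration.  It
assumes the file holds longs counting up from zero, as bigfile writes
them, and that it is built with the same sizeof(long) as bigfile.  For
each run of wrong words it reports where the run starts, how long it is,
and where its first word would normally live in the file.

```c
#include <stdio.h>
#include <stddef.h>

#define SECTOR 512

/* One contiguous run of words that do not match the expected sequence. */
struct bad_run {
    long long start;   /* byte offset where the run begins */
    long long length;  /* length of the run in bytes */
    long long source;  /* byte offset where the run's first word belongs;
                          only meaningful if the bad data is itself part
                          of the 0, 1, 2, ... sequence */
};

/* True if off is a multiple of 512 but not of 1024. */
int on_odd_sector(long long off)
{
    return off % SECTOR == 0 && (off / SECTOR) % 2 == 1;
}

/*
 * Scan fp, which should contain the longs 0, 1, 2, ...  Details of the
 * first max bad runs are stored in runs[].  Returns the total number of
 * bad runs seen, or -1 on a read error.
 */
long scan_thrash(FILE *fp, struct bad_run *runs, long max)
{
    const long long width = (long long)sizeof(long);
    long buf[4096];
    long expected = 0;
    long found = 0;
    int in_run = 0;
    size_t n, i;

    while ((n = fread(buf, sizeof(long), 4096, fp)) > 0) {
        for (i = 0; i < n; i++, expected++) {
            if (buf[i] == expected) {
                in_run = 0;
                continue;
            }
            if (!in_run) {              /* first wrong word of a new run */
                in_run = 1;
                found++;
                if (found <= max) {
                    runs[found - 1].start = (long long)expected * width;
                    runs[found - 1].length = 0;
                    runs[found - 1].source = (long long)buf[i] * width;
                }
            }
            if (found <= max)
                runs[found - 1].length += width;
        }
    }
    return ferror(fp) ? -1 : found;
}
```

Called on each THRASH file from a small main(), a run matching the
description above would show a start for which on_odd_sector() is true,
a length of 7 * 512 = 3584 bytes, and a source at a lower offset.  A
source that lands outside the file, or on no sector boundary at all,
would suggest the bad data came from somewhere other than the same
sequence.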

The corrupted files are on the / partition.  All 30 files in /usr/local
are fine.  When I create only three THRASH files in parallel (without the
30 in /usr/local), all three are fine.

I'm running the tests again for a better sample, and will try 2.4.14
tomorrow.  However, I'm forced by Real to use 2.2 in production.

In the meantime, if you have any suggestions, please let me know.
Hopefully this is a FAQ and you'll tell me to RTFM better.  I'll even
thank you for that :-)

Thanks in advance.


----------------------------------------------------------------------
ext2 on 2.2.20 with linux-2.2.19-aacraid-20010720.patch, Slackware 8.0.

Controller Information
----------------------
         Remote Computer: .
             Device Name: AFA0
         Controller Type: PERC 3/Di
             Access Mode: READ-WRITE
Controller Serial Number: Last Six Digits = 9438D2
         Number of Buses: 2
         Devices per Bus: 15
          Controller CPU: i960 R series
    Controller CPU Speed: 100 Mhz
       Controller Memory: 126 Mbytes
           Battery State: Ok

Component Revisions
-------------------
                CLI: 3.0-0 (Build #4880)
                API: 3.0-0 (Build #4880)
    Miniport Driver: 2.1-5 (Build #3911)
Controller Software: 2.5-0 (Build #2991)
    Controller BIOS: 2.5-0 (Build #2991)
Controller Firmware: (Build #2991)

Executing: container list
Num          Total  Oth Chunk          Scsi   Partition
Label Type   Size   Ctr Size   Usage   B:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
 0    Mirror 16.9GB            Open    0:00:0 64.0KB:16.9GB 
 /dev/sda             system           1:03:0 64.0KB:16.9GB 

 1    Mirror 68.3GB            Open    1:04:0 64.0KB:68.3GB 
 /dev/sdb             data             0:01:0 64.0KB:68.3GB 

        
-- 
_________________________________________________________________________
Andrew Donkin                  Waikato University, Hamilton,  New Zealand
