List:       smartmontools-support
Subject:    Re: [smartmontools-support]Re: concern about two disk drives
From:       Bruce Allen <ballen () gravity ! phys ! uwm ! edu>
Date:       2004-05-14 17:46:09
Message-ID: Pine.GSO.4.21.0405141158280.23865-100000 () dirac ! phys ! uwm ! edu

José,

PLEASE continue to copy the list.  Note that if you searched the archives,
you'd find answers to your questions there.

> But the servers are online always since the same days....

Were the disks new when they were put into the servers?

> But as I understood, the disk relocates the bad sector automatically, so I
> don't need to do the steps in BadBlockHowTo?

The disk will reallocate the bad sector automatically, IF IT CAN READ THE
BAD SECTOR.  If it can't read the bad sector, it will go onto the 'Pending
Sector List' and not reallocate automatically.
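
As a rough illustration (a sketch only, not part of smartmontools), the
Python fragment below picks out the three attributes that track this:
5 (reallocated), 197 (pending) and 198 (offline uncorrectable).  It reads
text laid out like the attribute table that 'smartctl -A' prints.  The
sample table and the function name are invented for the example; on a real
system you would feed in the actual smartctl output instead.

```python
# Sketch: extract reallocation-related SMART attributes from text laid out
# like the attribute table printed by 'smartctl -A'.
# SAMPLE is invented for illustration; it is not output from a real drive.

SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   063   059   006    Pre-fail  Always       -       9127344
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       16
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

# Attribute IDs of interest: reallocated, pending, offline uncorrectable.
WATCHED = {5, 197, 198}


def sector_counts(text):
    """Return {attribute_id: (name, raw_value)} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this also skips the header row, whose first field is "ID#".
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            # The raw value starts in the 10th whitespace-separated column.
            found[attr_id] = (fields[1], int(fields[9]))
    return found


if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(sector_counts(SAMPLE).items()):
        print(f"{attr_id:3d} {name:24s} raw={raw}")
```

A non-zero raw value for 197 means that sectors are waiting on the pending
list; they normally stay there until they are either read successfully or
overwritten.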

> Ok, but if you have RAID 1, then how does the operating system know that
> even after the error the disk is okay, as long as it writes the damaged
> block again?

It depends upon the RAID implementation and controller.

> Or does it know and do it automatically?

Again, it depends upon the RAID implementation and controller.  (For
example, I am trying to learn what happens with a 3ware raid controller if
it finds an unreadable sector on a mirrored disk.  What it *should* do is
immediately WRITE that sector with identical data taken from another disk.  
But I don't know if the hardware is 'smart' enough to do this.)
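
To make the idea concrete, here is a toy model in Python of what such a
'read repair' would look like on a two-way mirror.  It is purely
hypothetical: the data layout and the function name are made up, and it
says nothing about how 3ware or any other real controller behaves.

```python
# Toy model of "read repair" on a mirror set.  Each disk is a list of
# sectors; None stands for a sector that the drive cannot read.
# Illustration only: this does not describe any real RAID controller.

def mirrored_read(disks, sector):
    """Read one sector from a mirror set, rewriting unreadable copies."""
    good = None
    for disk in disks:
        if disk[sector] is not None:
            good = disk[sector]
            break
    if good is None:
        # No mirror holds a readable copy, so the data really is lost.
        raise IOError(f"sector {sector} unreadable on every mirror")
    # Write the good data over every unreadable copy.  On a real drive it
    # is this write that gives the firmware the chance to reallocate.
    for disk in disks:
        if disk[sector] is None:
            disk[sector] = good
    return good


if __name__ == "__main__":
    disk_a = [b"aa", None, b"cc"]   # sector 1 is unreadable on this disk
    disk_b = [b"aa", b"bb", b"cc"]
    print(mirrored_read([disk_a, disk_b], 1))
    print(disk_a[1])
```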

> When you have a very large file and a block is corrupted, you would only need
> to write the block again and all the data should be okay, I guess...
> But the operating system blocks only the damaged sector, right?

The OS typically reads in blocks of some fixed size.  The read call that
fails will be the one for the block that contains the damaged sector.
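
A small sketch of the arithmetic, in Python.  It assumes 512-byte sectors
and a filesystem whose blocks are counted from the start of the partition;
the function names and the numbers are made up for the example.

```python
# Sketch: which filesystem block contains a given disk sector?
# Assumes 512-byte sectors and blocks counted from the partition start.

SECTOR_SIZE = 512  # bytes; an assumption, typical for drives of this era


def fs_block_for_sector(lba, partition_start, block_size):
    """Return the filesystem block number that holds sector 'lba'.

    lba             -- absolute sector number of the damaged sector
    partition_start -- first sector of the partition
    block_size      -- filesystem block size in bytes
    """
    offset_bytes = (lba - partition_start) * SECTOR_SIZE
    return offset_bytes // block_size


def sectors_in_block(block, partition_start, block_size):
    """Return the range of absolute sectors covered by one block."""
    per_block = block_size // SECTOR_SIZE
    first = partition_start + block * per_block
    return range(first, first + per_block)


if __name__ == "__main__":
    # Invented numbers: bad sector 1000, partition starting at sector 63,
    # and 4096-byte filesystem blocks.
    block = fs_block_for_sector(1000, 63, 4096)
    print(block, list(sectors_in_block(block, 63, 4096)))
```

With 4096-byte blocks, one damaged sector makes reads of all eight sectors
in that block fail at the filesystem level, even though the other seven may
be fine on the disk.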

> I'm getting a bit confused, as I have woken up to a problem that is very
> complex, much more than anticipated.

If you do some Google searching of newsgroups, and search the
smartmontools-support mail archives, you can learn more.

Cheers,
	Bruce



> ----- Original Message ----- 
> From: "Bruce Allen" <ballen@gravity.phys.uwm.edu>
> To: "José Alexandre Antunes Faria" <jose.faria@vianw.pt>
> Cc: <smartmontools-support-request@lists.sourceforge.net>
> Sent: Friday, May 14, 2004 2:55 PM
> Subject: Re: [smartmontools-support]Re: concern about two disk drives
> 
> 
> > > But so the values for 1, 7 and 195 are normal?
> >
> > Yes.  The normalized values are well above the failure thresholds.
> >
> > > But in the one I had errors, the filesystem was damaged beyond repair
> > > is that a consequence of the problem or was it caused by something
> > > else?
> >
> > It was probably caused when the disk couldn't read certain sectors. This
> > > is (unfortunately) a common occurrence on modern disks.  Often the disk can
> > (after many retries) read the sector, but sometimes it can't.  At this
> > point, unless you use RAID or backups, data is lost. See
> > http://smartmontools.sourceforge.net/BadBlockHowTo.txt for additional
> > info.
> >
> > > (I have read errors.) Do those problems that you think are normal
> > > promote data corruption?
> >
> > They can lead to data loss (more than data corruption).
> >
> > > > The different values of the power-on hours, what do they mean?
> >
> > The disks have been in use for different numbers of hours.
> >
> > > > (data corruption) But is it possible that when the disk has problems
> > > > it returns corrupted data, when it should report a read error?
> >
> > This is of course possible.  But the more common scenario is that when the
> > disk has a read error at a particular sector, the associated file or
> > directory becomes unreadable.
> >
> > Cheers,
> > Bruce
> >
> > > ----- Original Message ----- 
> > > From: "Bruce Allen" <ballen@gravity.phys.uwm.edu>
> > > To: "José Alexandre Antunes Faria" <jose.faria@vianw.pt>
> > > Cc: "Smartmontools Mailing List"
> > > <smartmontools-support@lists.sourceforge.net>
> > > Sent: Friday, May 14, 2004 12:50 PM
> > > Subject: [smartmontools-support]Re: concern about two disk drives
> > >
> > >
> > > > Hi José,
> > > >
> > > > > I'm sorry to reply only now, but I had a very busy today (out of the
> > > > > office), but here is the output...
> > > >
> > > > Please continue to copy the support list -- others may find this
> > > > useful.
> > > >
> > > > Looking at the smartctl -a output of the two drives, I don't see any
> > > > signs of trouble on either one. I do note that drive 3KA1RESZ has a
> > > > number of entries in its SMART error log.  This is because the drive
> > > > had 16 unreadable sectors, and successfully reallocated them.  This
> > > > can be seen from Attributes 5, 197 and 198. This is not a sign of
> > > > trouble -- it is normal.  Modern hard drives frequently have sectors
> > > > that are bad, or that go bad, and need to be reallocated.  Your best
> > > > protection for this is to run frequent long self-test read scans to
> > > > be sure that there are no unreadable sectors on the disk.
> > > >
> > > > If the disk continues to find and reallocate larger and larger
> > > > numbers of unreadable sectors, i.e. hundreds or thousands, and other
> > > > identical drives do not, then you might consider replacing it.
> > > >
> > > > I do note that the drives had maximum temperatures of 50 and 53
> > > > Celsius -- it would be wise to keep them from ever getting to such
> > > > temperatures if possible.
> > > >
> > > > > PS: About data corruption, what do you recommend to avoid reading
> > > > > corrupted data?
> > > >
> > > > I'm not an expert on this subject: my only specialized computer
> > > > expertise is in the SMART functionality of disk drives. I didn't
> > > > respond to your earlier email because I think that there really is
> > > > no 'magic solution'.  You can take steps to avoid data corruption but
> > > > there are a number of things that make it almost unavoidable, if you
> > > > have enough data and read/write the data often enough.  Data
> > > > corruption can be caused by cosmic rays interacting with electronics
> > > > in an undetectable way, for example.
> > > >
> > > > Another example is TCP data transmission (supposedly reliable) using
> > > > commodity ethernet and switching hardware.  Studies have shown about
> > > > one undetected single-bit error per 4 TB of transmitted data.  This
> > > > is simply because the TCP checksums have a finite length and don't
> > > > always detect errors.
> > > >
> > > > In the physics project that I work on, where we have several hundred
> > > > Terabytes of data, we *try* to protect ourselves by keeping a couple
> > > > of different md5 checksums of the data. One set of checksums is kept
> > > > internally in the data itself, and another set of checksums is kept
> > > > externally in separate files.  This seems to work well, but again, is
> > > > no guarantee.
> > > >
> > > > Cheers,
> > > > Bruce
> > > >
> > > >
> > > > > ----- Original Message ----- 
> > > > > From: "Bruce Allen" <ballen@gravity.phys.uwm.edu>
> > > > > To: "José Alexandre Antunes Faria" <jose.faria@vianw.pt>
> > > > > Cc: "Smartmontools Mailing List"
> > > > > <smartmontools-support@lists.sourceforge.net>
> > > > > Sent: Thursday, May 13, 2004 8:11 AM
> > > > > Subject: concern about two disk drives
> > > > >
> > > > >
> > > > > > Hi José,
> > > > > >
> > > > > > The correct mailing list to use is smartmontools-support.  The
> > > > > > drives are already in the database.
> > > > > >
> > > > > > > As you don't help out either...You can at least see how fun it
> > > > > > > is being joked around in seagate technical support.
> > > > > > >
> > > > > > > I'm not sure of what to do... As I have no time to buy more
> > > > > > > drives just to test. I'm not sure of what to do, I'm leaning
> > > > > > > forward to replace them with maxtors, as WD don't fit there.
> > > > > >
> > > > > > Could you please send the output of smartctl -a for the two
> > > > > > drives, as separate .txt attachments to your email?  The output
> > > > > > you sent in your latest email is too badly messed up from line
> > > > > > and mailer wrapping to be easily readable.
> > > > > >
> > > > > > Cheers,
> > > > > > Bruce
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > > -------------------------------------------------------
> > > > This SF.Net email is sponsored by: SourceForge.net Broadband
> > > > Sign-up now for SourceForge Broadband and get the fastest
> > > > 6.0/768 connection for only $19.95/mo for the first 3 months!
> > > > http://ads.osdn.com/?ad_id%62&alloc_ida84&opÌk
> > > > _______________________________________________
> > > > Smartmontools-support mailing list
> > > > Smartmontools-support@lists.sourceforge.net
> > > > https://lists.sourceforge.net/lists/listinfo/smartmontools-support
> > > >
> > >
> > >
> >
> >
> 
> 



