List:       smartmontools-support
Subject:    [smartmontools-support] SMART monitoring of OCZ vertex Indilinx
From:       Richard Wall <richard () the-moon ! net>
Date:       2010-09-22 13:33:07
Message-ID: AANLkTimaZgBGMmuird5eW1DEssxU1swBwbSdCjCt9B1v () mail ! gmail ! com

Hello,

First of all, thanks to all the developers of smartmontools, and
apologies for the length of the following message. I have noticed two
problems, which I have outlined below, followed by various information
and logs from the system.


== Unable to run offline self-tests on SATA OCZ Vertex SSD ==

I have compiled the latest smartctl from SVN trunk in order to test a
possibly failing OCZ-VERTEX SSD.

I've read lots of previous messages on this mailing list and on the
OCZ forum which explain that the meaning of SMART attributes on SSDs
is still in flux, but I am having difficulty even running the
self-tests.

I tried the following commands to initiate background self-tests:
 * smartctl -t short /dev/sdb
 * smartctl -t long /dev/sdb

Neither of these works. When I subsequently run...
 * smartctl -l selftest /dev/sdb

...to monitor the progress of the test, it always reports "Aborted by host".

If instead I run in "captive" mode, the tests run in the foreground,
and when they complete I can view the results using -l selftest.
The short test results tell me that errors have been encountered at
various LBA offsets.
The long test results tell me that no errors were found.

{{{
# smartctl -C -t short /dev/sdb
smartctl 5.40 2010-09-18 r3156 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in captive mode".
Drive command "Execute SMART Short self-test routine immediately in captive mode" successful.
Testing has begun.
}}}

{{{
# smartctl -l selftest /dev/sdb
smartctl 5.40 2010-09-18 r3156 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%        66         4566016
# 2  Short captive       Completed: read failure       90%        65         9728640
# 3  Extended captive    Completed without error       00%        65         -
# 4  Short captive       Completed: read failure       90%        65         22633856
# 5  Short captive       Completed: read failure       90%        64         4545664
# 6  Short offline       Aborted by host               90%        64         -
# 7  Short captive       Completed: read failure       90%        64         7128128
}}}
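When comparing several runs, the entries in a log like this can be tallied mechanically. The following is a minimal Python sketch, not part of smartmontools: it assumes each entry sits on a single line in the column layout smartctl 5.40 prints, with `-` when no failing LBA was recorded. `parse_selftest_log` and the sample `LOG` string are illustrative names only.

```python
import re
from collections import Counter

# One self-test log entry as printed by smartctl 5.40, assuming the entry
# is on a single line (the mailing-list copy wrapped some of them).
ENTRY = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>\S+ \S+)\s+"          # e.g. "Short captive", "Extended offline"
    r"(?P<status>.+?)\s+"            # e.g. "Completed: read failure"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"            # "-" means no failing LBA was recorded
)


def parse_selftest_log(text):
    """Return one dict per log entry; the header and other lines are ignored."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m is None:
            continue
        lba = m.group("lba")
        entries.append({
            "num": int(m.group("num")),
            "test": m.group("test"),
            "status": m.group("status"),
            "remaining_pct": int(m.group("remaining")),
            "hours": int(m.group("hours")),
            "lba": None if lba == "-" else int(lba),
        })
    return entries


LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%        66         4566016
# 2  Short captive       Completed: read failure       90%        65         9728640
# 3  Extended captive    Completed without error       00%        65         -
# 4  Short captive       Completed: read failure       90%        65         22633856
# 5  Short captive       Completed: read failure       90%        64         4545664
# 6  Short offline       Aborted by host               90%        64         -
# 7  Short captive       Completed: read failure       90%        64         7128128
"""

entries = parse_selftest_log(LOG)
print(Counter(e["status"] for e in entries))
print("failing LBAs:", [e["lba"] for e in entries if e["lba"] is not None])
```

On the entries above it counts five read failures, each reported at a different LBA, alongside one clean extended test and one host abort.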

This device does seem to be faulty. It is mounted in Linux with a
Reiser filesystem and used as a cache storage partition for a busy
Squid proxy server. It operates in a cluster with two other identically
specified devices, neither of which displays any problems.
The visible symptoms are that after a period of time, Squid reports
corruptions in the files it reads from the partition on that SSD;
shortly afterwards, ReiserFS reports corruptions and remounts the
partition read-only.

So my questions are:
 * can I trust the self-test results from this device?
 * is there any further information I can provide which will help
improve the smartmontools support for this device? e.g.
{{{
Warning: device does not support Error Logging
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged
}}}
 * which SSD devices are best supported by smartmontools?



== Kernel loses contact with first of two SATA SSD devices after smartctl offline scan ==

 * I also encounter a different problem when I attempt a captive test
of another SSD installed in this system: the kernel driver appears to
lose contact with the device until the system is rebooted.
Has anyone got any ideas what might be the cause of this?
{{{
# smartctl -C -t short /dev/sdc
smartctl 5.40 2010-09-18 r3156 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in captive mode".
Drive command "Execute SMART Short self-test routine immediately in captive mode" successful.
Testing has begun.
}}}

No test results are available for this device:
{{{
# smartctl -l selftest /dev/sdc
smartctl 5.40 2010-09-18 r3156 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device does not support Self Test logging
}}}

The following errors appear in kernel logs after running the test.
{{{
[  266.103805] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  266.103813] ata3.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 0
[  266.103815]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  266.103818] ata3.00: status: { DRDY }
[  266.103824] ata3: hard resetting link
[  276.076664] ata3: softreset failed (device not ready)
[  276.076668] ata3: hard resetting link
[  286.049521] ata3: softreset failed (device not ready)
[  286.049525] ata3: hard resetting link
[  296.580639] ata3: link is slow to respond, please be patient (ready=0)
[  320.976461] ata3: softreset failed (device not ready)
[  320.976465] ata3: limiting SATA link speed to 1.5 Gbps
[  320.976468] ata3: hard resetting link
[  326.156315] ata3: softreset failed (device not ready)
[  326.156318] ata3: reset failed, giving up
[  326.156321] ata3.00: disabled
[  326.156332] ata3: EH complete
[  337.449622] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[  337.449634] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[  337.449643] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
[  337.449649] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO

}}}
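The `cmd` field of the libata message above can be decoded to see which ATA command timed out. A minimal Python sketch, assuming the field order COMMAND/FEATURE:NSECT:LBAL:LBAM:LBAH/... used by libata's error report and covering only the few SMART codes needed here: 0xB0 is SMART, feature 0xD4 is EXECUTE OFF-LINE IMMEDIATE, and the subcommand carried in LBA low (0x81) selects a short self-test in captive mode, i.e. the test `smartctl -C -t short` requested. The `res 40/...` status 0x40 is DRDY with no error bit set, consistent with the `Emask 0x4 (timeout)` tag. `decode_libata_cmd` is a hypothetical helper.

```python
import re

# Assumed libata field order: cmd COMMAND/FEATURE:NSECT:LBAL:LBAM:LBAH/<HOB fields>/DEVICE
CMD_RE = re.compile(
    r"cmd ([0-9a-f]{2})/([0-9a-f]{2}):([0-9a-f]{2}):"
    r"([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})/"
)

# A small subset of the SMART (0xB0) feature codes.
SMART_FEATURES = {
    0xD0: "SMART READ DATA",
    0xD4: "SMART EXECUTE OFF-LINE IMMEDIATE",
    0xD5: "SMART READ LOG",
    0xDA: "SMART RETURN STATUS",
}

# Subcommands carried in LBA low for EXECUTE OFF-LINE IMMEDIATE (subset).
SELFTEST_SUBCMDS = {
    0x01: "short self-test (off-line mode)",
    0x02: "extended self-test (off-line mode)",
    0x7F: "abort off-line mode self-test",
    0x81: "short self-test (captive mode)",
    0x82: "extended self-test (captive mode)",
}


def decode_libata_cmd(line):
    """Decode the taskfile registers of a libata 'cmd ...' report line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' field in line")
    command, feature, _nsect, lbal, lbam, lbah = (int(g, 16) for g in m.groups())
    if command != 0xB0:
        return {"command": hex(command)}
    result = {
        "command": "SMART",
        "feature": SMART_FEATURES.get(feature, hex(feature)),
        # SMART commands carry the signature 4Fh/C2h in LBA mid/high.
        "smart_signature_ok": (lbam, lbah) == (0x4F, 0xC2),
    }
    if feature == 0xD4:
        result["subcommand"] = SELFTEST_SUBCMDS.get(lbal, hex(lbal))
    return result


line = "[  266.103813] ata3.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 0"
print(decode_libata_cmd(line))
```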

-RichardW.


== System Information ==
{{{
# uname -r
2.6.31.13
}}}

{{{
# sfdisk -g
/dev/sda: 127 cylinders, 255 heads, 63 sectors/track
/dev/sdb: 127 cylinders, 255 heads, 63 sectors/track
/dev/sdc: 3892 cylinders, 255 heads, 63 sectors/track
/dev/sdd: 3892 cylinders, 255 heads, 63 sectors/track
/dev/sde: 121601 cylinders, 255 heads, 63 sectors/track
}}}

SSD device settings.
{{{
# hdparm -V
hdparm v9.15

# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
	Model Number:       OCZ-VERTEX
	Serial Number:      27125E9K32B4884Z8804
	Firmware Revision:  1.6
Standards:
	Supported: 8 7 6 5
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:   62533296
	LBA48  user addressable sectors:   62533296
	Logical  Sector size:                   512 bytes
	Physical Sector size:                   512 bytes
	device size with M = 1024*1024:       30533 MBytes
	device size with M = 1000*1000:       32017 MBytes (32 GB)
	cache/buffer size  = unknown
	Nominal Media Rotation Rate: Solid State Device
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 1	Current = 1
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	DOWNLOAD_MICROCODE
	    	SET_MAX security extension
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Gen2 signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	    	DMA Setup Auto-Activate optimization
	   *	Software settings preservation
	   *	Data Set Management indeterminate TRIM supported
Security:
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
	not	supported: enhanced erase
Checksum: correct

}}}
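As a cross-check on the hdparm figures above, the LBA48 sector count converts to the same size smartctl reports as User Capacity (32,017,047,552 bytes). A short Python sketch, assuming the 512-byte logical sector size hdparm lists:

```python
SECTOR_BYTES = 512          # "Logical Sector size: 512 bytes" from hdparm -I

lba48_sectors = 62533296    # "LBA48 user addressable sectors"
size_bytes = lba48_sectors * SECTOR_BYTES

print(f"{size_bytes:,} bytes")                 # 32,017,047,552 bytes
print(size_bytes // (1024 * 1024), "MiB")      # 30533 ("M = 1024*1024")
print(size_bytes // (1000 * 1000), "MB")       # 32017 ("M = 1000*1000")

# CHS limit: 16383 cylinders x 16 heads x 63 sectors/track.
chs_sectors = 16383 * 16 * 63
print(chs_sectors)                             # 16514064
```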


The SSD was not mounted during the tests:
{{{
# mount | grep /dev/sdb
}}}


Smartmontools information
{{{
# smartctl -V
smartctl 5.40 2010-09-18 r3156 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

smartctl comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License Version 2.
See http://www.gnu.org for further details.

smartmontools release 5.40 dated 2009-12-09 at 21:00:32 UTC
smartmontools SVN rev 3156 dated 2010-09-18 at 19:30:39
smartmontools build host: i686-pc-linux-gnu
smartmontools build configured: 2010-09-21 16:46:55 UTC
smartctl compile dated Sep 21 2010 at 17:48:17
smartmontools configure arguments:  '--prefix=/usr'


# smartctl --all /dev/sdb
smartctl 5.40 2010-09-18 r3156 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Indilinx Barefoot based SSDs
Device Model:     OCZ-VERTEX
Serial Number:    27125E9K32B4884Z8804
Firmware Version: 1.6
User Capacity:    32,017,047,552 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Sep 22 07:22:46 2010 BOT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline
data collection: 		 (   0) seconds.
Offline data collection
capabilities: 			 (0x1d) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Abort Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x00)	Error logging NOT supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   0) minutes.
Extended self-test routine
recommended polling time: 	 (   0) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0000   ---   ---   ---    Old_age   Offline      -       7
  9 Power_On_Hours          0x0000   ---   ---   ---    Old_age   Offline      -       65
 12 Power_Cycle_Count       0x0000   ---   ---   ---    Old_age   Offline      -       8
184 Initial_Bad_Block_Count 0x0000   ---   ---   ---    Old_age   Offline      -       220
195 Program_Failure_Blk_Ct  0x0000   ---   ---   ---    Old_age   Offline      -       0
196 Erase_Failure_Blk_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       0
197 Read_Failure_Blk_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       0
198 Read_Sectors_Tot_Ct     0x0000   ---   ---   ---    Old_age   Offline      -       305154513
199 Write_Sectors_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       90716535
200 Read_Commands_Tot_Ct    0x0000   ---   ---   ---    Old_age   Offline      -       1033701
201 Write_Commands_Tot_Ct   0x0000   ---   ---   ---    Old_age   Offline      -       349229
202 Error_Bits_Flash_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       72821
203 Corr_Read_Errors_Tot_Ct 0x0000   ---   ---   ---    Old_age   Offline      -       71091
204 Bad_Block_Full_Flag     0x0000   ---   ---   ---    Old_age   Offline      -       0
205 Max_PE_Count_Spec       0x0000   ---   ---   ---    Old_age   Offline      -       10000
206 Min_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       1
207 Max_Erase_Count         0x0000   ---   ---   ---    Old_age   Offline      -       106
208 Average_Erase_Count     0x0000   ---   ---   ---    Old_age   Offline      -       7
209 Remaining_Lifetime_Perc 0x0000   ---   ---   ---    Old_age   Offline      -       100
211 SATA_Error_Ct_CRC       0x0000   ---   ---   ---    Old_age   Offline      -       0
212 SATA_Error_Ct_Handshake 0x0000   ---   ---   ---    Old_age   Offline      -       0
213 Indilinx_Internal       0x0000   ---   ---   ---    Old_age   Offline      -       0

Warning: device does not support Error Logging
Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Completed: read failure       90%        65         9728640
# 2  Extended captive    Completed without error       00%        65         -
# 3  Short captive       Completed: read failure       90%        65         22633856
# 4  Short captive       Completed: read failure       90%        64         4545664
# 5  Short offline       Aborted by host               90%        64         -
# 6  Short captive       Completed: read failure       90%        64         7128128
# 7  Extended offline    Aborted by host               90%        64         -
# 8  Short captive       Completed without error       00%        48         -
# 9  Short captive       Completed without error       00%        47         -
#10  Short offline       Aborted by host               90%        47         -
#11  Short offline       Aborted by host               90%        47         -
#12  Short captive       Completed: read failure       90%        47         7077888
#13  Short offline       Aborted by host               90%        47         -
#14  Extended offline    Aborted by host               90%        46         -
#15  Short offline       Aborted by host               90%        46         -

Device does not support Selective Self Tests/Logging

}}}
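The `Self-test execution status: ( 121)` line packs two fields into one byte: in the ATA-8 SMART data structure, bits 7:4 give the result of the most recent self-test and bits 3:0 the part of the test still remaining, in units of 10%. A minimal Python sketch of that split, with the result texts paraphrased from the standard rather than copied from smartctl; `decode_selftest_status` is a hypothetical helper.

```python
# Bits 7:4 of the self-test execution status byte (paraphrased from ATA-8);
# values 9-14 are reserved.
SELFTEST_RESULT = {
    0x0: "completed without error (or no test run)",
    0x1: "aborted by host",
    0x2: "interrupted by host with a hard or soft reset",
    0x3: "fatal or unknown error, test not completed",
    0x4: "completed, unknown test element failed",
    0x5: "completed, electrical element failed",
    0x6: "completed, servo/seek element failed",
    0x7: "completed, read element failed",
    0x8: "completed, suspected handling damage",
    0xF: "self-test in progress",
}


def decode_selftest_status(value):
    """Split the status byte into (result text, percent of test remaining)."""
    result = (value >> 4) & 0xF
    remaining_pct = (value & 0xF) * 10
    return SELFTEST_RESULT.get(result, "reserved"), remaining_pct


print(decode_selftest_status(121))   # 121 == 0x79
```

For this drive, 121 (0x79) decodes to a failed read element with 90% of the test remaining, which matches the "Completed: read failure ... 90%" entries in the self-test log.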

Motherboard details
{{{
# cat /sys/class/dmi/id/board_name
P5BV-M
# cat /sys/class/dmi/id/board_vendor
ASUSTeK Computer INC.
}}}

Memory
{{{
# cat /proc/meminfo
MemTotal:        3634680 kB
MemFree:          105752 kB
Buffers:          123848 kB
Cached:          3012868 kB
SwapCached:            0 kB
}}}

CPU
{{{
# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz
stepping	: 10
cpu MHz		: 3165.713
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 6331.42
clflush size	: 64
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz
stepping	: 10
cpu MHz		: 3165.713
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority
bogomips	: 6332.51
clflush size	: 64
power management:
}}}



Squid error log
{{{
2010/09/20 11:59:46| storeSwapMetaUnpack: bad type (0)!
2010/09/20 11:59:46| assertion failed: store_client.c:412: "t->length == SQUID_MD5_DIGEST_LENGTH"
2010/09/20 11:59:51| Starting Squid Cache version 2.7.STABLE9 for i686-pc-linux-gnu...
}}}

Kernel ReiserFS errors
{{{
Sep 20 21:12:59 SVG2 kernel: [21051.912099] REISERFS warning: reiserfs-5082 is_leaf: free space seems wrong: level=1, nr_items=50, free_space=384 rdkey
Sep 20 21:12:59 SVG2 kernel: [21051.912102] REISERFS error (device sdb1): vs-5150 search_by_key: invalid format found in block 6782980. Fsck?
Sep 20 21:12:59 SVG2 kernel: [21051.912104] REISERFS (device sdb1): Remounting filesystem read-only
Sep 20 21:12:59 SVG2 kernel: [21051.912107] REISERFS error (device sdb1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [20 6702 0x0 SD]
Sep 20 21:12:59 SVG2 kernel: [21051.912114] REISERFS warning: reiserfs-5082 is_leaf: free space seems wrong: level=1, nr_items=50, free_space=384 rdkey
Sep 20 21:12:59 SVG2 kernel: [21051.912116] REISERFS error (device sdb1): vs-5150 search_by_key: invalid format found in block 6782980. Fsck?
Sep 20 21:12:59 SVG2 kernel: [21051.912119] REISERFS error (device sdb1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [20 6704 0x0 SD]
Sep 20 21:12:59 SVG2 squid[3144]: Write failure -- check your disk space and cache.log
Sep 20 21:13:01 SVG2 squid[3142]: Squid Parent: child process 3144 exited due to signal 6
Sep 20 21:13:04 SVG2 squid[3142]: Squid Parent: child process 23110 started
Sep 20 21:13:05 SVG2 squid[23110]: storeCossDirOpenSwapLog: Failed to open swap log.
}}}

_______________________________________________
Smartmontools-support mailing list
Smartmontools-support@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/smartmontools-support