Gmirror device and S.M.A.R.T. Current_Pending_Sector

I woke up this morning to find smartd reporting an error, “smartd[1209]: Device: /dev/ad3, 1 Currently unreadable (pending) sectors”, for one of the physical disks in my gmirror mirror. Googling turned up this explanation:

Current count of unstable sectors (waiting for remapping). The raw value of
this attribute indicates the total number of sectors waiting for remapping. 
Later, when some of these sectors are read successfully, the value is 
decreased. If errors still occur when reading some sector, the hard drive will 
try to restore the data, transfer it to the reserved disk area (spare area) and 
mark this sector as remapped. If this attribute value remains at zero, it 
indicates that the quality of the corresponding surface area is low.

smartctl -a /dev/ad3 showed the following:

...
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
...

So the drive had to be forced to write new data to the affected area, which would get the sector either remapped or cleared of the error. Since the disk was part of a mirror and the other disk was working properly (and I have backups in case of disaster), I decided I could take it out of the mirror with gmirror remove. I then ran a long self-test with smartctl -t long /dev/ad3, which reported no errors. After that I inserted the drive back into the mirror, and once it had resynchronized, the S.M.A.R.T. Current_Pending_Sector value had dropped back to zero:

...
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
...

Reallocated_Sector_Ct did not increase, so the drive cleared the pending sector by rewriting it in place rather than remapping it to the spare area.
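
For reference, the remove, test and re-insert sequence described above can be sketched roughly as follows. This is only an outline under assumptions: the mirror name gm0 is hypothetical (I have not given the real one here), and the script prints the commands instead of running them, because the long self-test runs in the background and has to finish before the disk goes back into the mirror.

```shell
#!/bin/sh
# Sketch only. MIRROR is a hypothetical mirror name; DISK is the
# component from this post. Adjust both before using any of this.
MIRROR="gm0"
DISK="ad3"

# Prints the steps in order rather than executing them:
#   1. detach the suspect disk from the mirror
#   2. start the extended (long) self-test, which runs in the background
#   3. check the self-test log until the test has completed
#   4. re-insert the disk, which makes gmirror resynchronize it
#   5. watch the rebuild progress
#   6. re-read the SMART attribute table afterwards
print_plan() {
    cat <<EOF
gmirror remove $MIRROR $DISK
smartctl -t long /dev/$DISK
smartctl -l selftest /dev/$DISK
gmirror insert $MIRROR $DISK
gmirror status $MIRROR
smartctl -A /dev/$DISK
EOF
}

print_plan
```

Run the printed commands one at a time as root. The resynchronization in step 4 is what rewrites the whole disk, including the pending sector.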

——

Update: Although I managed to clear the pending sector, the drive started to fail completely a couple of days after the first error. The tweaking and fixing did not help much, as the errors kept escalating, so in the end I decided to replace the drive.

Smartd status just before I removed the drive:
smartd[1209]: Device: /dev/ad3, 348 Currently unreadable (pending) sectors
smartd[1209]: Device: /dev/ad3, 257 Offline uncorrectable sectors

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0027   239   238   021    Pre-fail  Always       -       1050
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       54
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       12022
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       52
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       51
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2
194 Temperature_Celsius     0x0022   124   107   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       348
198 Offline_Uncorrectable   0x0030   197   197   000    Old_age   Offline      -       257
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   001   001   000    Old_age   Offline      -       24383
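
To keep an eye on just the attributes that mattered here (5, 196, 197 and 198), the table can be filtered with a small awk helper. This is a minimal sketch, not something I ran at the time: the function name smart_realloc_attrs is made up, and it assumes the ten-column layout shown above, where RAW_VALUE is the tenth field.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` style output on stdin and
# prints "ID NAME RAW_VALUE" for the reallocation-related attributes.
# Assumes the ten-column attribute table layout shown in this post.
smart_realloc_attrs() {
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $1, $2, $10 }'
}

# Demo using two rows copied from the table above, so it runs without a disk.
smart_realloc_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   196   196   000    Old_age   Always       -       348
EOF
```

On a live system the input would come from smartctl -A /dev/ad3 piped into the function. Feeding it the full smartctl -a output is less safe, since other sections could contain lines that start with one of these numbers.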