
Health warning acknowledge


Question

Posted

I have a drive in my pool that has had SMART errors in the past.

I can live with this considering the data stored in the pool and the nature of the errors.

But it leads to a big warning on the dashboard with the rather vague text "One or more of your drives is being reported as unhealthy." I can remove it by disabling SMART for the disk in TrueNAS. I would rather be able to somehow acknowledge the errors that have been experienced, so I can get rid of the warning, but be warned again if a new error occurs, or if another drive experiences an error.

The current behaviour risks masking errors that the user should handle immediately.

2 answers to this question

Posted (edited)
On 12/26/2024 at 9:39 AM, MathiasMM said:

I have a drive in my pool that has had SMART errors in the past.

I can live with this considering the data stored in the pool and the nature of the errors.

But it leads to a big warning on the dashboard with the rather vague text "One or more of your drives is being reported as unhealthy." I can remove it by disabling SMART for the disk in TrueNAS. I would rather be able to somehow acknowledge the errors that have been experienced, so I can get rid of the warning, but be warned again if a new error occurs, or if another drive experiences an error.

The current behaviour risks masking errors that the user should handle immediately.

I totally hear you on this. Alert Fatigue is a real thing AND a risk. If the dashboard is always red, you eventually stop looking at it, which is exactly when a second drive will fail.

That said, please avoid disabling SMART entirely. It’s better to have a "noisy" warning than to be completely blind to a total disk collapse. Here is my take on how to handle this:

  • The risky move: disabling SMART for the disk doesn't just hide the old errors, it also stops the system from telling you if the drive starts developing new problems.

  • The reality: HexOS/TrueNAS reports the drive as "Unhealthy" because the drive's own firmware has tripped a threshold. The OS can't "clear" a hardware flag that the disk itself is reporting.

  • The advice: instead of disabling the service, check the specific SMART attributes (like Reallocated_Sector_Ct). If that number stays static for a few weeks, you might be okay. But if it is climbing, the drive is a ticking time bomb regardless of how important the data is. Replace it ASAP.
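If you want to track "is that number climbing?" without eyeballing it, here is a rough Python sketch of the idea. It is not a HexOS or TrueNAS feature, just a way to compare two saved copies of the attribute table that `smartctl -A /dev/sdX` prints for ATA/SATA drives. The function names, the list of watched attributes, and the sample values are all made up for illustration, and NVMe drives report health in a different format that this does not handle.

```python
import re

# Attributes worth watching for growth. This list is an assumption for the
# sketch; adjust it to whatever your drive actually reports.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Map attribute name -> raw value from ATA-style `smartctl -A` output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG column.
        if len(fields) < 10 or not fields[0].isdigit() or not fields[2].startswith("0x"):
            continue
        match = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
        if match:
            attrs[fields[1]] = int(match.group())
    return attrs


def grown_since(baseline, current, watched=WATCHED):
    """Return {name: (old, new)} for watched attributes whose raw value rose."""
    return {
        name: (baseline.get(name, 0), current[name])
        for name in watched
        if name in current and current[name] > baseline.get(name, 0)
    }


# Made-up snapshots standing in for output saved a few weeks apart.
HEADER = "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE\n"
ACKNOWLEDGED = HEADER + (
    "  5 Reallocated_Sector_Ct  0x0033 100 100 010 Pre-fail Always - 8\n"
    "197 Current_Pending_Sector 0x0012 100 100 000 Old_age  Always - 0\n"
)
LATER = HEADER + (
    "  5 Reallocated_Sector_Ct  0x0033 100 100 010 Pre-fail Always - 12\n"
    "197 Current_Pending_Sector 0x0012 100 100 000 Old_age  Always - 0\n"
)

baseline = parse_attributes(ACKNOWLEDGED)
print(grown_since(baseline, parse_attributes(ACKNOWLEDGED)))  # nothing new
print(grown_since(baseline, parse_attributes(LATER)))  # reallocated count rose
```

The baseline plays the role of the "acknowledged" state from the original question: anything already in it stays quiet, and only growth beyond it gets reported.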

I would recommend running a long SMART test. If it passes and the error count doesn't increase, you can sometimes manually tune the alert thresholds in the Disk Settings of TrueNAS to silence that specific error while keeping the monitor active for new ones, though that goes against every admin bone in my body.

Stay safe with that data my friend.

Edited by TheGlitch
