Solaris: SDS: Both Metadevices of a mirror have "State: Needs maintenance"


Whenever you run into a broken SDS Mirror, beware of the following status:

 

 d1: Mirror
     Submirror 0: d11
       State: Needs maintenance
     Submirror 1: d10
       State: Needs maintenance
     Pass: 1
     Read option: roundrobin (default)
     Write option: parallel (default)
     Size: 12584484 blocks (6.0 GB)

 d11: Submirror of d1
     State: Needs maintenance
     Invoke: after replacing "Maintenance" components:
                 metareplace d1 c0t1d0s0 <new device>
     Size: 12584484 blocks (6.0 GB)
     Stripe 0:
         Device     Start Block  Dbase        State Reloc Hot Spare
         c0t1d0s0          0     No     Last Erred   Yes

 d10: Submirror of d1
     State: Needs maintenance
     Invoke: metasync d1
     Size: 12584484 blocks (6.0 GB)
     Stripe 0:
         Device     Start Block  Dbase        State Reloc Hot Spare
         c0t0d0s0          0     No           Okay   Yes

 

 

 

Whenever you see the above status, you must first fix the submirror that does not show Last Erred. The submirror with Last Erred should always be fixed last, to avoid corruption.
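The ordering rule above can be sketched as a small filter over metastat output. This is only an illustration, assuming the usual multi-line metastat layout (a "dNN: Submirror of dM" header at the start of a line, followed by component lines that begin with a cXtXdXsX device name). The function name repair_order is made up for this example, and on Solaris the awk program may need nawk or /usr/xpg4/bin/awk instead of the default awk.

```shell
#!/bin/sh
# Sketch: read metastat output on stdin and print one line per component,
# "<submirror> <device> <state>", with every "Last Erred" component
# moved to the end so that it is handled last.
# repair_order is a hypothetical helper name, not part of SDS.
repair_order() {
    awk '
        # Remember which submirror the following component lines belong to.
        /^d[0-9]+: Submirror of/ {
            sm = substr($1, 1, length($1) - 1)   # strip the trailing colon
            next
        }
        # Component lines start with a device name such as c0t1d0s0.
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            if ($4 == "Last" && $5 == "Erred")
                held[++n] = sm " " $1 " Last Erred"   # hold back until END
            else
                print sm, $1, $4
        }
        END { for (i = 1; i <= n; i++) print held[i] }
    '
}
```

Run as `metastat d1 | repair_order`. For the status shown above it would list `d10 c0t0d0s0 Okay` first and `d11 c0t1d0s0 Last Erred` last.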

 

 

So in the above status you will have to do the following:

 

  1. Check with df -h whether you are booted from the metadevices or from the physical devices
  2. Check /var/adm/messages and run format->analyze on the Last Erred disk
  3. Fix any reported blocks with format once you have identified them (if there are too many, see the NOTE below)
  4. Once the blocks are fixed, run metasync d1
  5. WAIT for the sync to finish successfully; when it does, the state changes to Okay. Check whether more errors were reported on the disk, and then decide whether it is safe to execute step 6.
  6. Then execute metareplace -e d1 c0t1d0s0
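As a reminder of the order, the list above can be written down as a dry-run helper that only prints the commands and executes nothing. This is a sketch: the function name print_repair_plan and its two arguments (the mirror and the Last Erred component) are placeholders invented for this example.

```shell
#!/bin/sh
# Dry run only: prints the commands from the list above, in order.
# Nothing is executed. Arguments are placeholders for this sketch:
#   $1 = mirror metadevice, $2 = component in "Last Erred" state
print_repair_plan() {
    mirror=$1
    erred=$2
    cat <<EOF
# 1. confirm what the system is booted from
df -h
# 2-3. inspect /var/adm/messages, then analyze and repair blocks in format (interactive)
# 4. resync the mirror
metasync $mirror
# 5. re-run until the resync has finished and the state is Okay
metastat $mirror
# 6. only then re-enable the Last Erred component
metareplace -e $mirror $erred
EOF
}
```

For the example in this post, `print_repair_plan d1 c0t1d0s0` prints the plan for mirror d1. The format steps stay as comments because format is interactive and the decision to continue has to be made by a person.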

 

The metasync d1 will not finish successfully if there is a problem with the Last Erred disk. Check /var/adm/messages first, then use format->disk->analyze to see whether the problem is limited to a single block. If it is, repair that block with format->disk->repair-><BLOCK with issues>. Once the block is fixed, proceed with steps 4 to 6 above.
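To judge whether the problem is a single block or many, it can help to list the distinct block numbers reported in the log. The sketch below assumes the disk driver logs lines containing "Error Block: <number>", which is the usual form of Solaris sd warnings but should be checked against your own /var/adm/messages. The function name distinct_error_blocks is invented for this example, and it does not filter by disk, so narrow the log to the Last Erred disk first if several disks report errors.

```shell
#!/bin/sh
# Sketch: print each distinct "Error Block" number found in a log file,
# one per line, sorted numerically.
# Assumption: errors are logged as "... Error Block: <number>".
#   $1 = log file to scan (for example /var/adm/messages)
distinct_error_blocks() {
    sed -n 's/.*Error Block: *\([0-9][0-9]*\).*/\1/p' "$1" | sort -n -u
}
```

`distinct_error_blocks /var/adm/messages | wc -l` then gives the number of distinct bad blocks: a count of one points to the single-block repair described above, a large count points to the NOTE below.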

 

NOTE: If reviewing /var/adm/messages and format shows a lot of damaged blocks, DO NOT execute the metasync. It is better to break the mirror forcibly (detach the good side of the mirror), then fsck that disk, mount it, disable SDS on it, and boot from it. Once the server has booted from that disk, remove all SDS devices and recreate the mirror from scratch. Once the mirror is recreated, replace the faulty disk, then recreate the rest of the submirrors and attach them to the OS.

If you run the metasync and it fails, you will no longer have a valid mirror, and if there is no way to fix the Last Erred disk, a restore of the OS disks will be needed to solve the issue.
