How to rescue an A3500 LUN
Well, today a striped volume on our A3500 died. This volume, along with another volume, makes up a 300 GB Veritas volume. One of the A3500 disks died, and it happened to be in one of these LUNs. During recovery, rm6 would not come up, so all I had was the command line. Goodie!
First healthcheck:
# /etc/raid/bin/healthck -a
Health Check Summary Information
a3500_upper: LUN - Hot Spare In Use at Drive [4,0]
a3500_lower: Dead LUN at Drive [5,11]
As you can see, we have a dead LUN and a hot spare in use. My worry is the dead LUN.
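With more than a couple of arrays, picking the dead LUN lines out by eye gets old. Here is a minimal sketch of a filter for that. `dead_luns` is a made-up helper name, and the "Dead LUN at Drive [c,i]" wording is taken only from the output above, so treat it as an assumption for other firmware levels.

```shell
# Read healthck output on stdin and print "module channel id" for every
# line that reports a dead LUN.
# ASSUMPTION: the line format matches the sample output in this post.
dead_luns() {
    sed -n 's/^\([^:]*\): Dead LUN at Drive \[\([0-9]*\),\([0-9]*\)].*/\1 \2 \3/p'
}

# Sample input copied from the run above.
dead_luns <<'EOF'
Health Check Summary Information
a3500_upper: LUN - Hot Spare In Use at Drive [4,0]
a3500_lower: Dead LUN at Drive [5,11]
EOF
# prints: a3500_lower 5 11
```

On a live box this would be fed as `/etc/raid/bin/healthck -a | dead_luns`.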
Now to find my LUNs
# /etc/raid/bin/raidutil -c c13t4d0 -i
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB
Vendor ID Symbios
ProductID StorEdgeA3500FCd
Product Revision 0301
Boot Level 03.01.04.00
Boot Level Date 04/05/01
Firmware Level 03.01.04.75
Firmware Date 04/11/02
Fibre Level 03.01.04.75
raidutil succeeded!
Now, LUN 5 is my striped LUN.
Now to look at my disks
# /etc/raid/bin/drivutil -I c13t4d0
Group Information for a3500_lower
Group       No. of   RAID    No. of   Total       Remaining
            LUNs     Level   Drives   Space(MB)   Space(MB)
Hot Spare   -        -       2        -           -
1           1        5       5        138771      0
2           1        5       5        138771      0
3           1        5       5        138771      0
4           1        5       5        138771      0
5           1        1       2        34692       0
6           1        0       4        138771      0
I have two hot spare disks, which could come in handy.
RAID group 6 is my striped group; it contains 4 disks.
# /etc/raid/bin/drivutil -d c13t4d0
Drives in Group for a3500_lower
Group Drive List [Channel,Id]
Hot Spare [4,8]; [5,8];
Group 1: [1,0]; [2,0]; [3,0]; [4,0]; [5,0];
Group 2: [1,1]; [2,1]; [3,1]; [4,1]; [5,1];
Group 3: [1,2]; [2,2]; [3,2]; [4,2]; [5,2];
Group 4: [1,3]; [2,3]; [3,3]; [4,3]; [5,3];
Group 5: [4,9]; [5,9];
Group 6: [4,10]; [5,10]; [4,11]; [5,11];
Group 6 has those 4 disks (including the dead one at [5,11]), and my two hot spare disks are at [4,8] and [5,8].
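To tie the [5,11] from healthck back to a drive group without eyeballing the list, something like the following would do. It is only a sketch: `group_of_drive` is a made-up helper name, and it assumes the `drivutil -d` output looks like the listing above.

```shell
# group_of_drive CHANNEL ID
# Read `drivutil -d` output on stdin and print the label ("Group 6",
# "Hot Spare", ...) of every line whose drive list contains [CHANNEL,ID].
group_of_drive() {
    awk -v want="[$1,$2];" '
        index($0, want) {
            label = $0
            sub(/:.*/, "", label)      # "Group 6: [4,10]; ..." -> "Group 6"
            sub(/ *\[.*/, "", label)   # "Hot Spare [4,8]; ..." -> "Hot Spare"
            print label
        }'
}

# Sample input copied from the listing above.
group_of_drive 5 11 <<'EOF'
Hot Spare [4,8]; [5,8];
Group 5: [4,9]; [5,9];
Group 6: [4,10]; [5,10]; [4,11]; [5,11];
EOF
# prints: Group 6
```

The trailing `;` in the search string keeps [5,1] from matching [5,10] or [5,11].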
First, get rid of a hot spare:
# /etc/raid/bin/raidutil -c c13t4d0 -H 48
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB
raidutil succeeded!
Now, delete the “bad” LUN 5:
# /etc/raid/bin/raidutil -c c13t4d0 -D 5
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB
Deleting LUN 5.
Press Control C to abort.
LUNs successfully deleted
Now remake my striped LUN, using the freed hot spare [4,8] in place of the bad disk [5,11]. Keeping the surviving disks in their original order could reduce data loss:
# /etc/raid/bin/raidutil -c c13t4d0 -n 5 -l 0 -s 138771 -g 410,510,411,48
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
Capacity available in drive group: 284204032 blocks (138771 MB).
Creating LUN 5
Registering new logical unit 5 with system.
Formatting logical unit 5 RAID 0 138771 MB
Formatting logical unit 5 RAID 0 138771 MB
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB
LUNs successfully created
raidutil succeeded!
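Two quick sanity checks on the numbers in that raidutil run, as a sketch. Both helper names are made up. The drive ID format (channel and id simply run together, so [4,10] becomes 410) is inferred from the `-H 48` and `-g 410,510,411,48` arguments in this post, and the 512-byte block size is an assumption that happens to agree with the "284204032 blocks (138771 MB)" line.

```shell
# drive_arg "[4,10]; [5,10]; ..."
# Turn a drivutil-style drive list into the comma-separated form passed
# to -g above.
# ASSUMPTION: a drive ID is just channel and id concatenated.
drive_arg() {
    printf '%s\n' "$1" | sed -e 's/[][ ,]//g' -e 's/;/,/g' -e 's/,$//'
}

# blocks_to_mb N
# Convert 512-byte blocks to whole MB (2048 blocks per MB, truncating).
# ASSUMPTION: raidutil counts 512-byte blocks.
blocks_to_mb() {
    echo $(( $1 / 2048 ))
}

drive_arg '[4,10]; [5,10]; [4,11]; [4,8];'   # prints: 410,510,411,48
blocks_to_mb 284204032                       # prints: 138771
```

The second result matches the `-s 138771` size used to recreate the LUN.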
Now for veritas
In vxdiskadm, remove the failed disk, then replace the failed disk.
Now, vxprint shows:
# vxprint -ht -g sas_dg
dg sas_dg default default 32000 1092403203.1591.server
dm saspool1-1 c11t4d8s2 sliced 3839 142082048 NOHOTUSE
dm saspool1-2 c13t4d5s2 sliced 4287 281001216 NOHOTUSE
v saspool1-lv - DISABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv DISABLED RECOVER 423078976 CONCAT - RW
sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA
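The two sd lines describe a concatenated plex, so the second subdisk should start exactly where the first one ends, and the two lengths should add up to the plex length. A quick check of that arithmetic with the numbers from the output above (variable names are just for illustration; lengths are in the units vxprint prints):

```shell
# Lengths copied from the two "sd" lines above.
sd1_len=142082048    # saspool1-1-01, plex offset 0
sd2_len=280996928    # saspool1-2-01, plex offset 142082048

# In a concat plex the second subdisk starts where the first ends.
sd2_offset=$sd1_len
plex_len=$(( sd1_len + sd2_len ))

# Share of the plex that sits on the rebuilt LUN, in whole percent.
pct_on_lun=$(( sd2_len * 100 / plex_len ))

echo "second subdisk offset: $sd2_offset"
echo "plex length: $plex_len"
echo "on rebuilt LUN: $pct_on_lun%"
```

This prints an offset of 142082048 and a plex length of 423078976, both matching the vxprint output, and shows that about 66% of the plex address space lives on c13t4d5.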
Well, the bad LUN was the second half of the concat, so I could get some data back. Now to recover the plex:
# vxmend -o force off saspool1-lv-01
# vxmend on saspool1-lv-01
# vxmend fix clean saspool1-lv-01
# vxvol -g sas_dg start saspool1-lv
# vxprint -ht -g sas_dg
dg sas_dg default default 32000 1092403203.1591.server
dm saspool1-1 c11t4d8s2 sliced 3839 142082048 NOHOTUSE
dm saspool1-2 c13t4d5s2 sliced 4287 281001216 NOHOTUSE
v saspool1-lv - ENABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv ENABLED ACTIVE 423078976 CONCAT - RW
sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA
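If you only care whether the volume and plex came back, the kernel state and state columns are all you need. A small sketch that pulls them out: `vx_states` is a made-up name, and the column positions are assumed from the `vxprint -ht` layout in this post, with plex records starting with `pl`.

```shell
# Read `vxprint -ht` output on stdin and print "type name kstate state"
# for every volume (v) and plex (pl) record.
# ASSUMPTION: KSTATE and STATE are the 4th and 5th fields, as shown here.
vx_states() {
    awk '$1 == "v" || $1 == "pl" { print $1, $2, $4, $5 }'
}

# Sample records in the layout shown above.
vx_states <<'EOF'
v saspool1-lv - ENABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv ENABLED ACTIVE 423078976 CONCAT - RW
sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
EOF
```

On a live system the input would come from `vxprint -ht -g sas_dg | vx_states`.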
The above vxmend steps also work great if you have lost a SAN disk and brought it back. They make a bad plex look good.
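Since the sequence is the same every time, it can be written down once with the names as parameters. The sketch below only prints the commands, it does not run them. `plex_recover_cmds` is a made-up name, and the `-g` on the vxmend lines is my addition for consistency with the vxvol line; the original run above did not use it.

```shell
# plex_recover_cmds DISKGROUP PLEX VOLUME
# Print (do not execute) the plex recovery sequence used in this post.
plex_recover_cmds() {
    dg=$1 plex=$2 vol=$3
    echo "vxmend -g $dg -o force off $plex"
    echo "vxmend -g $dg on $plex"
    echo "vxmend -g $dg fix clean $plex"
    echo "vxvol -g $dg start $vol"
}

plex_recover_cmds sas_dg saspool1-lv-01 saspool1-lv
```

Printing first and pasting by hand leaves a chance to look at each line before forcing a plex offline.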
fsck moved about 60 GB off to lost+found, but overall that's not bad, seeing as we could have lost much more data.