Archive for the ‘Work’ Category

DR test - day one

Thursday, June 23rd, 2005

Well, as I thought, today was a long day. We got the backup and recovery environment up and running with very few problems overall, but doing that tied us up for some time, so we ran late on some of the system configuration tasks. We pounded through those outstanding tasks, and restores had started by evening. The only open issue so far is the tape drive configuration for one media server; we will see if that gets resolved by morning.

DR test - day zero

Tuesday, June 21st, 2005

Today is travel day for our disaster recovery test. I had a mid-afternoon flight and made it to Philly by early evening. Checking in at the hotel was quite an experience, as the hotel had a new computer system they had just started using. How fun.

The DR vendor we use was kind enough to treat us all to dinner, a nice Italian meal. It was great to get a good meal in before the test. Tomorrow is going to be a long day…

Winding down

Monday, June 20th, 2005

Well, today things started to wind down at work; I did not want to have much going on today. Got home, did some more house cleaning, and cooked dinner.

It was very nice to be able to spend a day just winding down and relaxing a bit at work.

Getting ready for DR

Sunday, June 5th, 2005

Every year, we do a disaster recovery test for work, and this year's test is just starting to wind up. We have been preparing for about 6 months, but in the last month things come down to the wire, and it gets very crazy. For the most part this year I have stayed out of the mess and worked in the background, to hand the torch over to other folks.

Star Wars Episode 3

Thursday, May 19th, 2005

Well, today NetApp treated customers to a 9:00am showing of the new Star Wars movie. I must say, the movie was not as bad as I was expecting. Don’t get me wrong, it’s not the best thing ever, but it did not totally suck.

They did a good job of tying things together; the fall of the Jedi was interesting. I would have liked a bit more Yoda fighting, and they tried to tie up about 6 story lines in the last 10 minutes of the movie, which was annoying.

Overall, it is a 5 or 6 out of 10.

Quite busy week, quiet weekend

Sunday, April 24th, 2005

Last week at work was very busy. I am juggling about 5 projects, and keeping them all in the air is getting to be very interesting.

Last week I did my first automated Linux install. I am impressed with how simple a kickstart install is. I just went through a manual install, which generated a kickstart config file. I need to build a total of 6 Linux boxes, so it will be very interesting.
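Cloning that one generated kickstart file for several boxes mostly comes down to changing the per-host network settings. Below is a minimal sketch of that step. It assumes the generated file (on Red Hat systems the installer typically leaves it at /root/anaconda-ks.cfg) contains exactly one `network` line. The `render_kickstart` helper, the host names, and the 192.0.2.x addresses are all hypothetical placeholders.

```python
import re


def render_kickstart(template: str, hostname: str, ip: str,
                     netmask: str, gateway: str) -> str:
    """Return a copy of a kickstart file with its network line rewritten
    for one host. Assumes the template has exactly one line that starts
    with 'network'."""
    new_line = (f"network --bootproto=static --ip={ip} --netmask={netmask} "
                f"--gateway={gateway} --hostname={hostname}")
    # A function replacement avoids any backslash handling in new_line.
    out, count = re.subn(r"^network\b.*$", lambda _m: new_line,
                         template, flags=re.MULTILINE)
    if count != 1:
        raise ValueError(f"expected one network line, found {count}")
    return out


if __name__ == "__main__":
    # Stand-in for the installer-generated file; only the network line matters here.
    template = "install\nnetwork --bootproto=dhcp\nreboot\n"
    # Hypothetical host list for the 6 boxes; substitute real names and addresses.
    hosts = [(f"linux0{n}", f"192.0.2.{10 + n}") for n in range(1, 7)]
    for name, ip in hosts:
        cfg = render_kickstart(template, name, ip, "255.255.255.0", "192.0.2.1")
        print(f"--- {name}.ks ---")
        print(cfg)
```

Everything else in the file (partitioning, packages) stays identical across hosts, which is what makes the manual-install-then-clone approach attractive.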

Unfortunately this weekend was too cold for grilling, but I hope next weekend is warmer.

Not much else going on :-(

Solaris 10 for experienced admins

Friday, April 15th, 2005

This week I have been in training, taking the SA-225 Sun training course. The course is designed to quickly bring administrators who have worked with Solaris 8 and Solaris 9 up to speed on Solaris 10. For an early access course (one still early in development), the class was fairly good. The instructor had taught it 4 times, so he knew which corrections needed to be integrated. The class spent quite a bit of time on Solaris Zones as well as DTrace, which I think we will use the most.
This is my first Sun Education class, and I hope to fit in the performance tuning class before I head off to LISA.

How to rescue an A3500 LUN

Sunday, April 10th, 2005

Well, today a striped volume on our A3500 died. This volume, along with another volume, makes up a 300 GB Veritas volume. One of the A3500 disks died, and it happened to be in one of these LUNs. When recovering, the rm6 GUI did not come up, so all I had was the command line. Goody!

First, a health check:
# /etc/raid/bin/healthck -a

Health Check Summary Information

a3500_upper: LUN - Hot Spare In Use at Drive [4,0]
a3500_lower: Dead LUN at Drive [5,11]

As you can see, we have a dead LUN and a hot spare in use. My worry is the dead LUN.
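When there are several arrays to watch, it can help to pull the problem drives out of `healthck -a` output automatically. The following is a minimal sketch that parses lines shaped like the two summary lines above. The regular expression reflects only this transcript's wording, so other conditions healthck reports may be phrased differently. The function names are hypothetical.

```python
import re

# Matches summary lines such as "a3500_lower: Dead LUN at Drive [5,11]".
LINE_RE = re.compile(
    r"^(?P<array>\S+):\s+(?P<condition>.+?)\s+at Drive\s+"
    r"\[(?P<chan>\d+),(?P<id>\d+)\]\s*$"
)


def parse_healthck(text):
    """Return (array, condition, (channel, id)) tuples from healthck -a text."""
    results = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            drive = (int(m["chan"]), int(m["id"]))
            results.append((m["array"], m["condition"], drive))
    return results


def dead_luns(text):
    """Keep only the entries whose condition starts with 'Dead LUN'."""
    return [r for r in parse_healthck(text) if r[1].startswith("Dead LUN")]


if __name__ == "__main__":
    sample = """Health Check Summary Information

a3500_upper: LUN - Hot Spare In Use at Drive [4,0]
a3500_lower: Dead LUN at Drive [5,11]
"""
    for array, condition, drive in parse_healthck(sample):
        print(array, condition, drive)
```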

Now to find my LUNs:

# /etc/raid/bin/raidutil -c c13t4d0 -i
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB

Vendor ID Symbios
ProductID StorEdgeA3500FCd
Product Revision 0301
Boot Level 03.01.04.00
Boot Level Date 04/05/01
Firmware Level 03.01.04.75
Firmware Date 04/11/02
Fibre Level 03.01.04.75
raidutil succeeded!

LUN 5 is my striped (RAID 0) LUN.
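RAID 0 has no redundancy, so a single dead drive takes the whole LUN down. That makes it worth listing which LUNs are striped. Here is a minimal sketch that parses the `LUN n RAID r size MB` lines of `raidutil -i` output as they appear above. The helper names are hypothetical.

```python
import re

# Matches lines such as "LUN 5 RAID 0 138771 MB"; "LUNs found on ..." does not match.
LUN_RE = re.compile(r"^LUN\s+(\d+)\s+RAID\s+(\d+)\s+(\d+)\s+MB\s*$")


def parse_luns(text):
    """Map LUN number -> (raid_level, size_mb) from raidutil -i style output."""
    luns = {}
    for line in text.splitlines():
        m = LUN_RE.match(line.strip())
        if m:
            lun, level, size_mb = map(int, m.groups())
            luns[lun] = (level, size_mb)
    return luns


def striped_luns(luns):
    """Return the RAID 0 LUN numbers, i.e. those with no redundancy."""
    return sorted(n for n, (level, _size) in luns.items() if level == 0)


if __name__ == "__main__":
    sample = """LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB
"""
    print(striped_luns(parse_luns(sample)))
```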

Now to look at my disks:
# /etc/raid/bin/drivutil -I c13t4d0

Group Information for a3500_lower

Group      No. of  RAID   No. of  Total      Remaining
           LUNs    Level  Drives  Space(MB)  Space(MB)

Hot Spare  -       -      2       -          -
1          1       5      5       138771     0
2          1       5      5       138771     0
3          1       5      5       138771     0
4          1       5      5       138771     0
5          1       1      2       34692      0
6          1       0      4       138771     0

I have two hot spare disks, which could come in handy.
RAID group 6 is my striped group; it contains 4 disks.

# /etc/raid/bin/drivutil -d c13t4d0

Drives in Group for a3500_lower

Group Drive List [Channel,Id]

Hot Spare [4,8]; [5,8];
Group 1: [1,0]; [2,0]; [3,0]; [4,0]; [5,0];
Group 2: [1,1]; [2,1]; [3,1]; [4,1]; [5,1];
Group 3: [1,2]; [2,2]; [3,2]; [4,2]; [5,2];
Group 4: [1,3]; [2,3]; [3,3]; [4,3]; [5,3];
Group 5: [4,9]; [5,9];
Group 6: [4,10]; [5,10]; [4,11]; [5,11];

Group 6 holds those 4 disks (including the dead one, [5,11]), and my two hot spares are [4,8] and [5,8].
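Finding which group a failed drive belongs to means matching the `[channel,id]` pairs in `drivutil -d` output. Below is a minimal sketch that turns lines like those above into a dictionary. It skips the header line because `[Channel,Id]` contains no digits. The function names are hypothetical.

```python
import re

# A drive is written as [channel,id], e.g. [5,11].
DRIVE_RE = re.compile(r"\[(\d+),(\d+)\]")


def parse_groups(text):
    """Map group name -> list of (channel, id) from drivutil -d style output."""
    groups = {}
    for line in text.splitlines():
        drives = [(int(c), int(i)) for c, i in DRIVE_RE.findall(line)]
        if not drives:
            continue  # headers and blank lines carry no numeric drives
        # The name is everything before the first drive, minus a trailing colon.
        name = line.split("[", 1)[0].strip().rstrip(":")
        groups[name] = drives
    return groups


def group_of(groups, drive):
    """Return the name of the group containing drive, or None."""
    for name, drives in groups.items():
        if drive in drives:
            return name
    return None


if __name__ == "__main__":
    sample = """Group Drive List [Channel,Id]

Hot Spare [4,8]; [5,8];
Group 5: [4,9]; [5,9];
Group 6: [4,10]; [5,10]; [4,11]; [5,11];
"""
    groups = parse_groups(sample)
    print(group_of(groups, (5, 11)), groups["Hot Spare"])
```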

First, remove one hot spare ([4,8]) from the hot spare pool:

# /etc/raid/bin/raidutil -c c13t4d0 -H 48
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB

raidutil succeeded!

Now, delete the “bad” LUN 5:
# /etc/raid/bin/raidutil -c c13t4d0 -D 5
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB
Deleting LUN 5.
Press Control C to abort.

LUNs successfully deleted

Now recreate the striped LUN, using the hot spare [4,8] in place of the dead disk [5,11]. Keeping the disks in their original order may reduce data loss:
# /etc/raid/bin/raidutil -c c13t4d0 -n 5 -l 0 -s 138771 -g 410,510,411,48
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
Capacity available in drive group: 284204032 blocks (138771 MB).
Creating LUN 5

Registering new logical unit 5 with system.
Formatting logical unit 5 RAID 0 138771 MB
Formatting logical unit 5 RAID 0 138771 MB
LUNs found on c13t4d0.
LUN 0 RAID 5 138771 MB
LUN 2 RAID 5 138771 MB
LUN 4 RAID 1 34692 MB
LUN 5 RAID 0 138771 MB

LUNs successfully created

raidutil succeeded!
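In these raidutil commands, drives appear to be written as the channel number followed by the ID with no separator: `48` for [4,8] and `410` for [4,10]. That reading is inferred from this transcript, not taken from raidutil documentation. The minimal sketch below builds the `-g` list for the rebuilt LUN under that assumption. It keeps the group's original drive order and swaps the dead drive for the spare. It also checks the capacity line, assuming 512-byte blocks and MB = 2^20 bytes, which matches the "284204032 blocks (138771 MB)" output. All names are hypothetical.

```python
def drive_token(drive):
    """Format a (channel, id) drive as in the commands above: (4, 10) -> "410".
    Assumption: plain concatenation of channel and id digits."""
    channel, target = drive
    return f"{channel}{target}"


def replacement_drive_list(group, dead, spare):
    """Comma-separated drive list in the group's original order,
    with the dead drive replaced by the spare."""
    if dead not in group:
        raise ValueError(f"{dead} is not in the group")
    return ",".join(drive_token(spare if d == dead else d) for d in group)


def blocks_to_mb(blocks, block_size=512):
    """Convert blocks to whole MB (2**20 bytes), rounding down."""
    return blocks * block_size // 2**20


if __name__ == "__main__":
    group6 = [(4, 10), (5, 10), (4, 11), (5, 11)]  # from drivutil -d above
    print(replacement_drive_list(group6, dead=(5, 11), spare=(4, 8)))
    print(blocks_to_mb(284204032))
```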

Now for Veritas:
in vxdiskadm, remove the failed disk, then replace the failed disk.

Now, vxprint shows:
# vxprint -ht -g sas_dg
dg sas_dg default default 32000 1092403203.1591.server

dm saspool1-1 c11t4d8s2 sliced 3839 142082048 NOHOTUSE
dm saspool1-2 c13t4d5s2 sliced 4287 281001216 NOHOTUSE

v saspool1-lv - DISABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv DISABLED RECOVER 423078976 CONCAT - RW
sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA
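The `sd` lines show how much of the concatenated plex the lost LUN was carrying. The subdisk on c13t4d5 (presumably LUN 5 on the c13t4d0 array) starts at plex offset 142082048 and runs 280996928 units, which is roughly two thirds of the plex. Below is a minimal sketch of that calculation. It assumes the sd column order seen here (`sd NAME PLEX DISK DISKOFFS LENGTH PLEX-OFFSET DEVICE MODE`) and keeps values in whatever units vxprint printed. The helper names are hypothetical.

```python
def parse_subdisks(text):
    """Collect sd records from vxprint -ht output.
    Assumed column order: sd NAME PLEX DISK DISKOFFS LENGTH PLEX-OFFSET DEVICE MODE."""
    sds = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 9 and fields[0] == "sd":
            sds.append({
                "name": fields[1],
                "plex": fields[2],
                "length": int(fields[5]),
                "offset": int(fields[6]),
                "device": fields[7],
            })
    return sds


def lost_extent(sds, device):
    """Return (start, end, fraction) of the plex stored on device.
    Assumes the device backs exactly one contiguous subdisk."""
    total = sum(sd["length"] for sd in sds)
    hits = [sd for sd in sds if sd["device"] == device]
    if len(hits) != 1:
        raise ValueError(f"expected one subdisk on {device}, found {len(hits)}")
    sd = hits[0]
    return sd["offset"], sd["offset"] + sd["length"], sd["length"] / total


if __name__ == "__main__":
    sample = """sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA
"""
    start, end, frac = lost_extent(parse_subdisks(sample), "c13t4d5")
    print(start, end, f"{frac:.1%}")
```

Because the plex is a concatenation rather than a stripe, the data on the surviving first subdisk is intact, which is why a partial recovery was possible at all.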

Well, the bad LUN backed the second (and larger) part of the concatenated plex, so I could still get some data back. Now to recover the plex:

# vxmend -o force off saspool1-lv-01
# vxmend on saspool1-lv-01
# vxmend fix clean saspool1-lv-01
# vxvol -g sas_dg start saspool1-lv
# vxprint -ht -g sas_dg
dg sas_dg default default 32000 1092403203.1591.server

dm saspool1-1 c11t4d8s2 sliced 3839 142082048 NOHOTUSE
dm saspool1-2 c13t4d5s2 sliced 4287 281001216 NOHOTUSE

v saspool1-lv - ENABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv ENABLED ACTIVE 423078976 CONCAT - RW
sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA

The vxmend steps above also work well if you have lost a SAN disk and then brought it back. They make a bad plex look good.
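Whether the vxmend sequence took effect shows up in the kernel state and state columns of the `v` and `pl` lines. Below is a minimal sketch of that check. It assumes the standard `vxprint -ht` layout, where those lines read `TYPE NAME ASSOC KSTATE STATE ...`, and treats ENABLED/ACTIVE as healthy. The helper names are hypothetical.

```python
def object_states(text):
    """Map volume/plex name -> (kernel_state, state) from vxprint -ht v/pl lines."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] in ("v", "pl"):
            states[fields[1]] = (fields[3], fields[4])
    return states


def all_healthy(states):
    """True only if at least one object was found and every one is ENABLED ACTIVE."""
    return bool(states) and all(s == ("ENABLED", "ACTIVE") for s in states.values())


if __name__ == "__main__":
    before = """v saspool1-lv - DISABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv DISABLED RECOVER 423078976 CONCAT - RW
"""
    after = """v saspool1-lv - ENABLED ACTIVE 423077888 SELECT - fsgen
pl saspool1-lv-01 saspool1-lv ENABLED ACTIVE 423078976 CONCAT - RW
"""
    print(all_healthy(object_states(before)), all_healthy(object_states(after)))
```

A clean state only means VxVM will use the plex again. It says nothing about the data on it, which is why the fsck that follows is still needed.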

fsck moved about 60 GB into lost+found, but overall that's not bad, considering we could have lost much more data.

Server made it!

Friday, March 11th, 2005

Well, the server in question made it to IL, and with one tweak, I had access to it. It feels so good to deliver on a timeline I did not think was achievable.

Server got installed, storage got allocated, and a database will be laid down on the system over the weekend.

On an unrelated subject, we got some real nice weather this afternoon. The ground was just below freezing, and we got some nice fluffy but moist snow. As the snow got compacted, it made a solid sheet of ice on the freeway. Fun!

Work, Fun with friends

Thursday, March 10th, 2005

Well, this week's goal was to have a server up and running in IL 48 hours after it left MN. The server made it down to IL, but I cannot talk to it. The fact that all the parts are in place is just amazing.

Tonight was a good work recovery session. In addition to my three friends who come over for our usual Thursday night, a fourth friend and his wife made it over for the evening. It was very nice spending the evening together.