Archive for the ‘Sysadmin’ Category

Let’s fire this up again

Sunday, September 23rd, 2007

Ok, let’s fire this blog up again. I am going to shoot for weekly posts, but we will see how that goes.
Well, first off, I have enjoyed my summer a bit too much, and fallen off the wagon with my weight loss :-( About 1/3 of my hard work is now gone. My hope is that with the correct help and motivation over the winter, I can fix that.

Quite a bit of my summer has been spent just thinking about the past, and about work. I had a very large project at work which was on my mind for most of the spring and summer: the design, costing, buildout and now support of an e-commerce environment utilizing Solaris 10 and Solaris zones. This has helped save our customer some money, and will hopefully save some man-hours on our side long term.

Right as that project was coming to an end, I got involved in another project that will go well into the winter. A business unit in our company has been struggling quite a bit for manpower as well as experience, and I was volunteered to help them out. For me this is both good and bad. The travel is a bit much, as I am going out to California every other week, but it is allowing me to become involved with more of our company. I am learning quite a bit, and I am feeling very useful, which is very, very nice. I hope I am providing the help the folks in CA need.

Now on to the outside-work stuff. This summer I spent many weekends in my house, going to meetings (http://www.al-anon-alateen-msp.org/), doing housework, or relaxing. I have finished 2.5 major projects this summer. The first was to fix my living room. I had a mirror wall, which was so seventies. I got rid of the evil wall, and painted the sucker, so it looks a bit better.
The second project was my stupid plumbing, which was a bit more than I was expecting. I had galvanized steel pipes throughout my house which had started to rust from the inside out. Some started to drip, and most had started to close up, affecting water pressure. This spring I replaced most of the steel pipes with copper pipes. I am quite proud of myself that I managed to replace most of the plumbing without burning the house down, and I now have water pressure in the bathroom again.
Third is the patio in my back yard. The patio was built over a tree stump that had been rotting away for a good 20 years, and this caused the sucker to start to sink. This spring I pulled up the patio, this summer I got rid of the stump, and with luck, this fall I will get the patio back down again. We will see if that happens :-)

This summer I continued to struggle with friendships, but on the positive side, I have re-discovered family. It was very, very hard on me losing my sister to an internship in IL 3 years ago. The only person I had felt really close to had left, and I was all alone in this world, it seemed. This was very hard on me, and I still somewhat cannot figure out what to do with my sis gone. This summer I started hanging out with one of my cousins a bit more, and have truly enjoyed (and continue to enjoy) her company. She cannot replace my sister, but right now she seems like someone I can trust more than some of my previous “friends”.

Well, that about sums up my summer in a nutshell. With luck I will update this more than once every 3 months, if for no one else, then for myself to reflect on someday.

Learning Solaris

Wednesday, December 6th, 2006

This week at work we had some on-site training arranged to go over Solaris zones, Solaris SMF and patch management with Solaris 10. On the first day, Monday, we went over basic Solaris zones: how to create zones, how to configure zones, how to remove zones, etc. Zones are a feature in Solaris that allows several isolated Solaris instances to run within one physical server. They do not have the physical constraints of Solaris domains, and are often described as being like BSD jails.

The next day, Tuesday, we went over backing up and restoring Solaris zones. Each zone has its own isolated directory structure, so backing it up is very straightforward. The next topic we went over was resource management. Each zone can have CPU shares, CPU caps and memory caps associated with it, to prevent one zone from taking over the entire server.
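
As a rough illustration of the CPU shares part, here is a minimal sketch that prints the zonecfg subcommands for a zone.cpu-shares resource control. This is not from the training material: the function name cpu_shares_cmds, the zone name webzone and the value 20 are made up for the example, and it assumes the classic "add rctl" syntax and a POSIX shell.

```shell
# Print zonecfg subcommands that give a zone a number of CPU shares.
# $1 = number of shares (hypothetical value, pick your own).
cpu_shares_cmds() {
    printf 'add rctl\n'
    printf 'set name=zone.cpu-shares\n'
    printf 'add value (priv=privileged,limit=%s,action=none)\n' "$1"
    printf 'end\n'
}

# Show what would be fed to zonecfg for 20 shares.
cpu_shares_cmds 20

# On a Solaris 10 global zone, as root, the output could be saved and applied:
#   cpu_shares_cmds 20 > /tmp/webzone-shares.cfg
#   zonecfg -z webzone -f /tmp/webzone-shares.cfg
# Shares only matter when the Fair Share Scheduler is in use, e.g.
#   dispadmin -d FSS    (makes FSS the default scheduling class)
```

Shares are relative weights, so they only kick in when zones are actually competing for CPU; an idle server still lets one zone use everything.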

On day three we went over the Solaris SMF, or Service Management Facility, which has been integrated into Solaris 10. SMF is a whole new framework for managing the starting and restarting of daemons in Solaris, and it replaces the old rc.* method of managing daemons.
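
To give a feel for what replaces the old rc scripts, here is a small sketch of a restart helper built on the svcs and svcadm commands. The helper names run and smf_bounce are hypothetical, and the ssh service is only used as a familiar example. With DRY_RUN=1 it just prints the commands, so it can be tried on a machine without SMF.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Inspect, restart and re-check one SMF service, given its FMRI.
smf_bounce() {
    fmri=$1
    # Long listing: state, dependencies, and related details.
    run svcs -l "$fmri" || return 1
    # Ask SMF to restart the service.
    run svcadm restart "$fmri" || return 1
    # If the service is not running, explain why.
    run svcs -x "$fmri"
}

# Dry run only; unset DRY_RUN on a real Solaris 10 box to execute.
DRY_RUN=1
smf_bounce svc:/network/ssh:default
```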

The final thing we discussed was Solaris patching. With Solaris 10, they redid patching to be more like what Red Hat and others do, where the OS talks to a central patch server for management. It is very cool, if it works.

Overall I found this training to be very useful, and it puts our team in a good position to support Solaris 10 long term.

Star Wars Episode 3

Thursday, May 19th, 2005

Well, today NetApp treated customers to a 9:00am showing of the new Star Wars movie. I must say, the movie was not as bad as I was expecting. Don’t get me wrong, it’s not the best thing ever, but it did not totally suck.

They did a good job of tying things together, and the fall of the Jedi was interesting. I would have liked a bit more Yoda fighting, and they tried to tie up like 6 story lines in the last 10 minutes of the movie, which was annoying.

Overall, it is a 5 or 6 out of 10.

Snow!

Sunday, May 1st, 2005

Well, today is May 1st in Minnesota, and I hope today we had our last snowfall. Yep, it was snowing a bit today.

Today I watched One Flew Over The Cuckoo’s Nest (1975). This is a movie I have not watched since I was at Dunwoody several years back. I forgot how good of a film it was; they did a great job with it. Things such as the electroshock therapy and the lobotomy hit me hard. I cannot believe that they used to do things like that, if they really did. According to the extras DVD, they shot in a real mental hospital in Oregon, and the dean of the hospital played one of the doctors. I find it kind of odd that they would get so much support from within the medical community, as this movie does not exactly show the best side of things.

Yesterday afternoon I checked my results for the Solaris 10 certification beta I had taken. I passed both tests, so I am now a certified Solaris administrator. I find it so odd, because I do not think that much of certs. I think that they unfairly (like education) bring folks to the top of the resume pile.
Then again, this comes from a damn good sysadmin who does not have a four year degree.

Windows XP 64 bit

Tuesday, April 19th, 2005

Well, nice try Microsoft, but you fell short.
I was looking at Windows XP 64 bit, and ran across this, which is info from Microsoft about Windows XP 64 bit. Several things I find interesting:

  • Windows XP 64 bit requires 64 bit drivers. 32 bit drivers will not work.
  • Using an emulation layer, you can run 32-bit applications on Windows XP 64-Bit Edition. However, such applications run significantly slower on the 64-bit system than on the 32-bit system, because emulation requires additional resources.
  • DVD video playback, CD recording, Windows Media Player, NetBIOS, IPX, and several other items have been dropped.

    Nice job Microsoft, great job on developing your product. This is one area where the UNIX RISC platforms have been light years ahead of the x86 market. Sun put out its first 64 bit system in 1995 (the Ultra 1), long before it had a 64 bit OS. Solaris 2.6, released in August of 1997, could address 64 bits of memory, and Solaris 7, released in October of 1998, had full 64 bit support.
    Sun did two things right:
    1. The 64 bit CPUs could still run in 32 bit mode.
    2. The 64 bit operating system can natively (even still) run 32 bit applications. I would say at this point most Solaris apps are still 32 bit; only the ones that need to be are 64 bit.

    The latest release of Solaris (10) is an all 64 bit kernel, but Sun gave it 10 years for things to work out. They released Solaris 7, Solaris 8 and Solaris 9 with both 32 and 64 bit kernels.

    Microsoft has no 32 bit backwards compatibility for drivers under 64 bit Windows XP. They also “emulate” 32 bit, which produces quite a large performance impact if the app is not recoded to 64 bit. This means for any app to be usable, the software vendors will need to ship 32 bit and 64 bit versions of the apps.
    What I find odd is that both the Intel and AMD CPUs will still run 32 bit code. Why not just use the 32 bit compatibility in the CPUs? Why emulate?

    These issues will overall cause the adoption of 64 bit x86 to be very slow.

    Solaris 10 for experienced admins

    Friday, April 15th, 2005

    This week I have been in training. I have been taking the SA-225 Sun training course. The session is designed to quickly get administrators that have worked with Solaris 8 and Solaris 9 up to speed on Solaris 10. For an early access course (early in development), the class was fairly good. The instructor had done the class 4 times, so he was up to speed on what corrections needed to be integrated. The class spent quite a bit of time on Solaris Zones as well as DTrace, which are what I think we will be using the most.
    This is my first Sun Education class; I hope to get performance tuning in before I head off to LISA.

    Xsun desktop within Solaris zones

    Friday, April 15th, 2005

    Well, tonight I got my desktop working in a Solaris zone (Sun Java Desktop, using Xsun). My system is an Ultra 60 with a Creator3D card.

    Below is how I got it working:

    First of all, on the host operating system (global zone) run ‘/usr/dt/bin/dtconfig -d’ to disable the main X server, and reboot the machine.

    Next, I made a zone:
    # zonecfg -z bluto-desktop
    bluto-desktop: No such zone configured
    Use ‘create’ to begin configuring a new zone.
    zonecfg:bluto-desktop> create
    zonecfg:bluto-desktop> set zonepath=/opt/zones/bluto-desktop # the path to the zone
    zonecfg:bluto-desktop> add net
    zonecfg:bluto-desktop:net> set physical=hme0 # my network
    zonecfg:bluto-desktop:net> set address=192.168.0.12
    zonecfg:bluto-desktop:net> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/mouse # mouse device
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/kbd # keyboard device
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/pm # power management
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/winlock # window lock device
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/sound/0 # sound
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/sound/0ctl # sound control
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> add device
    zonecfg:bluto-desktop:device> set match=/dev/fbs/ffb0 # framebuffer
    zonecfg:bluto-desktop:device> end
    zonecfg:bluto-desktop> verify
    zonecfg:bluto-desktop> commit
    zonecfg:bluto-desktop> exit
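
Typing seven nearly identical "add device" stanzas gets old fast. As a sketch only (the function name zone_device_cmds and the file name devices.cfg are made up, and it assumes a POSIX shell), the same stanzas could be generated and then handed to zonecfg with its -f command-file option:

```shell
# Print one zonecfg "add device" stanza per device path given as argument.
zone_device_cmds() {
    for dev in "$@"; do
        printf 'add device\nset match=%s\nend\n' "$dev"
    done
}

# The same device list as in the interactive session above.
zone_device_cmds /dev/mouse /dev/kbd /dev/pm /dev/winlock \
    /dev/sound/0 /dev/sound/0ctl /dev/fbs/ffb0

# On the global zone, as root, the output could be saved and applied:
#   zone_device_cmds /dev/mouse /dev/kbd ... > devices.cfg
#   zonecfg -z bluto-desktop -f devices.cfg
```

This only covers the device entries; the zonepath and net settings would still be set as shown in the session.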

    Next, I installed the zone:

    # zoneadm -z bluto-desktop install
    Preparing to install zone <bluto-desktop>.
    Creating list of files to copy from the global zone.
    Copying <2583> files to the zone.
    Initializing zone product registry.
    Determining zone package initialization order.
    Preparing to initialize <911> packages on the zone.
    Initialized <911> packages on zone.
    Zone <bluto-desktop> is initialized.
    The file contains a log of the zone installation.

    Boot and set up the Solaris install in the zone.

    # zoneadm -z bluto-desktop boot

    Once the initial system setup is done, halt the zone.

    # zoneadm -z bluto-desktop halt
    (or run init 0 inside the zone)

    Now, we need to make some “fake” devices to make the X server and sound work.

    # cd /opt/zones/bluto-desktop/dev
    # ln -s fbs/ffb0 fb
    # ln -s sound/0 audio
    # ln -s sound/0ctl audioctl

    Now, boot the zone back up
    # zoneadm -z bluto-desktop boot

    Enable DT:
    zone# /usr/dt/bin/dtconfig -e;init 6

    Once the zone is rebooted, you should get the dtgreet login screen.

    The devices above need to point to the /dev entry that points directly to the device. This is because the Solaris zone tool sets the /dev/whatever entry in the zone to whatever major and minor number the /devices entry has for the device in the global zone. Confused? Good. All this means is that if, in the zonecfg config, a match=whatever variable is set to something that is a symlink to another file in /dev, it is not going to work.

    This breaks things like /dev/fb, which are kind of needed for Xsun and DT to work. To fix this, go to your zonepath/dev directory and make the links shown in the ln -s commands above.
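
One way to tell the two kinds of /dev entries apart before putting them in match= is to look at where the link points. The sketch below is an illustration, not a tested tool: the function names are made up, it assumes a POSIX shell, and it parses ls -l output because a readlink command may not be available.

```shell
# Print the immediate target of a symlink, using only ls and sed.
link_target() {
    ls -l "$1" | sed -n 's/.* -> //p'
}

# Succeed if the link points straight into a "devices" tree,
# fail if it points at something else (e.g. another /dev entry).
points_at_devices() {
    case "$(link_target "$1")" in
        */devices/*|devices/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the two framebuffer entries from this post, if they exist here.
for dev in /dev/fb /dev/fbs/ffb0; do
    [ -h "$dev" ] || continue
    if points_at_devices "$dev"; then
        echo "$dev: ok for match="
    else
        echo "$dev: link to $(link_target "$dev"), use that in match= instead"
    fi
done
```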

    Issues:
    Issue number one is that once you start and stop the desktop zone, the text console of the system is no longer usable. I think this is because the keyboard device is being grabbed, even though the tty device has it.

    All in all, this seems to work fairly well. Often I have had to reboot my workstation because of an X Windows issue or something. With this, I can just reboot the zone, which is much quicker. It will also allow me to limit memory and CPU utilization.

    Solaris 10 zones info page

    Wednesday, April 13th, 2005

    Found this page (http://users.tpg.com.au/adsln4yb/zones.html) with some cool info and very cool scripts about Solaris 10, CPU and memory caps in zones, script to control the FSS, and other goodies.

    Check it out!

    How to rescue an A3500 LUN

    Sunday, April 10th, 2005

    Well, today we had a striped volume on our A3500 die. This volume, along with another volume, makes up a 300 GB Veritas volume. One of the A3500 disks died, and it happened to be in one of these LUNs. When recovering, rm6 did not come up, so all I had was the command line. Goodie!

    First healthcheck:
    # /etc/raid/bin/healthck -a

    Health Check Summary Information

    a3500_upper: LUN - Hot Spare In Use at Drive [4,0]
    a3500_lower: Dead LUN at Drive [5,11]

    As you see, we have a dead LUN, and a hot spare. My worry is the dead LUN.

    Now to find my LUNs

    # /etc/raid/bin/raidutil -c c13t4d0 -i
    LUNs found on c13t4d0.
    LUN 0 RAID 5 138771 MB
    LUN 2 RAID 5 138771 MB
    LUN 4 RAID 1 34692 MB
    LUN 5 RAID 0 138771 MB

    Vendor ID Symbios
    ProductID StorEdgeA3500FCd
    Product Revision 0301
    Boot Level 03.01.04.00
    Boot Level Date 04/05/01
    Firmware Level 03.01.04.75
    Firmware Date 04/11/02
    Fibre Level 03.01.04.75
    raidutil succeeded!

    Now, LUN 5 is my striped LUN.

    Now to look at my disks
    # /etc/raid/bin/drivutil -I c13t4d0

    Group Information for a3500_lower

    Group      No. of  RAID   No. of  Total      Remaining
               LUNs    Level  Drives  Space(MB)  Space(MB)

    Hot Spare  -       -      2       -          -
    1          1       5      5       138771     0
    2          1       5      5       138771     0
    3          1       5      5       138771     0
    4          1       5      5       138771     0
    5          1       1      2       34692      0
    6          1       0      4       138771     0

    I have two hot spare disks, which could come in handy.
    Raid group 6 is my striped group, contains 4 disks.

    # /etc/raid/bin/drivutil -d c13t4d0

    Drives in Group for a3500_lower

    Group Drive List [Channel,Id]

    Hot Spare [4,8]; [5,8];
    Group 1: [1,0]; [2,0]; [3,0]; [4,0]; [5,0];
    Group 2: [1,1]; [2,1]; [3,1]; [4,1]; [5,1];
    Group 3: [1,2]; [2,2]; [3,2]; [4,2]; [5,2];
    Group 4: [1,3]; [2,3]; [3,3]; [4,3]; [5,3];
    Group 5: [4,9]; [5,9];
    Group 6: [4,10]; [5,10]; [4,11]; [5,11];

    So I have the 4 disks in group 6 (including the dead one at [5,11]), plus my two hot spare disks.

    First, get rid of a hot spare:

    # /etc/raid/bin/raidutil -c c13t4d0 -H 48
    LUNs found on c13t4d0.
    LUN 0 RAID 5 138771 MB
    LUN 2 RAID 5 138771 MB
    LUN 4 RAID 1 34692 MB
    LUN 5 RAID 0 138771 MB

    raidutil succeeded!

    Now, delete the “bad” LUN 5:
    # /etc/raid/bin/raidutil -c c13t4d0 -D 5
    LUNs found on c13t4d0.
    LUN 0 RAID 5 138771 MB
    LUN 2 RAID 5 138771 MB
    LUN 4 RAID 1 34692 MB
    LUN 5 RAID 0 138771 MB
    Deleting LUN 5.
    Press Control C to abort.

    LUNs successfully deleted

    Now remake my striped LUN, using hot spare [4,8] in place of the bad disk [5,11]. Keeping the disks in order could lower data loss:
    # /etc/raid/bin/raidutil -c c13t4d0 -n 5 -l 0 -s 138771 -g 410,510,411,48
    LUNs found on c13t4d0.
    LUN 0 RAID 5 138771 MB
    LUN 2 RAID 5 138771 MB
    LUN 4 RAID 1 34692 MB
    Capacity available in drive group: 284204032 blocks (138771 MB).
    Creating LUN 5

    Registering new logical unit 5 with system.
    Formatting logical unit 5 RAID 0 138771 MB
    Formatting logical unit 5 RAID 0 138771 MB
    LUNs found on c13t4d0.
    LUN 0 RAID 5 138771 MB
    LUN 2 RAID 5 138771 MB
    LUN 4 RAID 1 34692 MB
    LUN 5 RAID 0 138771 MB

    LUNs successfully created

    raidutil succeeded!
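
Judging only from the commands in this post (48 for drive [4,8], 410 for drive [4,10]), the drive arguments are just the channel and ID run together. Assuming that reading is right, a tiny helper like the hypothetical drive_list below could build the list from [channel,id] pairs, which is less error-prone than typing it by hand:

```shell
# Turn "channel,id" pairs into the run-together form used above.
drive_list() {
    out=
    for pair in "$@"; do
        chan=${pair%%,*}   # text before the comma
        id=${pair#*,}      # text after the comma
        out="${out:+$out,}${chan}${id}"
    done
    echo "$out"
}

drive_list 4,10 5,10 4,11 4,8   # prints 410,510,411,48
```

The run-together form is ambiguous on its own (411 could be read as [4,11] or [41,1]), so it is worth checking the result against the drivutil -d listing before creating anything.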

    Now for Veritas: in vxdiskadm, remove the failed disk, then replace the failed disk.

    Now, vxprint shows:
    # vxprint -ht -g sas_dg
    dg sas_dg default default 32000 1092403203.1591.server

    dm saspool1-1 c11t4d8s2 sliced 3839 142082048 NOHOTUSE
    dm saspool1-2 c13t4d5s2 sliced 4287 281001216 NOHOTUSE

    v saspool1-lv - DISABLED ACTIVE 423077888 SELECT - fsgen
    pl saspool1-lv-01 saspool1-lv DISABLED RECOVER 423078976 CONCAT - RW
    sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
    sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA

    Well, the bad LUN was the second half, so I could get some data back. Now to recover the plex:

    # vxmend -o force off saspool1-lv-01
    # vxmend on saspool1-lv-01
    # vxmend fix clean saspool1-lv-01
    # vxvol -g sas_dg start saspool1-lv
    # vxprint -ht -g sas_dg
    dg sas_dg default default 32000 1092403203.1591.server

    dm saspool1-1 c11t4d8s2 sliced 3839 142082048 NOHOTUSE
    dm saspool1-2 c13t4d5s2 sliced 4287 281001216 NOHOTUSE

    v saspool1-lv - ENABLED ACTIVE 423077888 SELECT - fsgen
    pl saspool1-lv-01 saspool1-lv ENABLED ACTIVE 423078976 CONCAT - RW
    sd saspool1-1-01 saspool1-lv-01 saspool1-1 0 142082048 0 c11t4d8 ENA
    sd saspool1-2-01 saspool1-lv-01 saspool1-2 0 280996928 142082048 c13t4d5 ENA

    The above vxmend steps also work great if you have lost a SAN disk and brought it back. They make a bad plex look good.

    fsck moved about 60 GB off to lost+found, but overall that is not bad, seeing as we could have lost much more data.

    corruption? no corruption? story of my day

    Tuesday, March 15th, 2005

    Today was all about file corruption on a large production server. The app folks found some files that were corrupt, and we went from there. We opened calls with Veritas, and started looking into things. A lesson learned was about fsck. With UFS in the past, I had been able to do a fsck -n devname. I did the same with VxFS, running fsck -F vxfs -n -o nolog,full devname. I got about 3000 errors, so I thought the filesystem was corrupt. After some checking on a dev box, it turned out this was just a symptom of the filesystem being mounted. Doh!
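
A cheap guard against this is to check the mount table before believing a read-only fsck. The sketch below is only an illustration: the function name is_mounted is made up, the sample table entry and mount point are invented, and it assumes a POSIX shell and awk. It takes the mount table path as an optional second argument (defaulting to /etc/mnttab) so it can be tried against a sample file.

```shell
# Succeed if the device appears in the mount table.
# $1 = device (raw /rdsk/ paths are mapped to their /dsk/ block device)
# $2 = mount table file, default /etc/mnttab
is_mounted() {
    dev=$(echo "$1" | sed 's#/rdsk/#/dsk/#')
    tab=${2:-/etc/mnttab}
    awk '$1 == dev { found = 1 } END { exit found ? 0 : 1 }' dev="$dev" "$tab"
}

# Demo against a made-up one-line mount table.
sample=${TMPDIR:-/tmp}/mnttab.sample.$$
printf '/dev/vx/dsk/sas_dg/saspool1-lv\t/saspool1\tvxfs\trw\t0\n' > "$sample"
if is_mounted /dev/vx/rdsk/sas_dg/saspool1-lv "$sample"; then
    echo "mounted: do not trust fsck -n output until it is unmounted"
fi
rm -f "$sample"
```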