Solaris FMD logs

How to clear fmadm log or FMA faults log

(based on FMA Cheat Sheet (Doc ID 1355350.1))

Had some issues over the last few days getting my "fmadm faulty" output empty... the standard commands did not work:

# fmadm repair <UUID>
# fmadm clear <UUID>
# fmadm acquit <UUID>

I had to delete FMD's "database" files:

# svcadm disable -s svc:/system/fmd:default
# cd /var/fm/fmd
# find /var/fm/fmd -type f -exec ls {} \;
# find /var/fm/fmd -type f -exec rm {} \;
# svcadm enable svc:/system/fmd:default
# fmadm reset cpumem-diagnosis
# fmadm reset cpumem-retire
# fmadm reset eft
# fmadm reset io-retire
# fmadm faulty
<empty output again>
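
If this has to be done more than once, the same sequence can be wrapped into a small script; a minimal sketch based purely on the commands above:

#!/usr/bin/bash
# Stop fmd, wipe its log/checkpoint files, restart it and reset the diagnosis engines
svcadm disable -s svc:/system/fmd:default
find /var/fm/fmd -type f -exec rm {} \;
svcadm enable svc:/system/fmd:default
for module in cpumem-diagnosis cpumem-retire eft io-retire
do
        fmadm reset $module
done
fmadm faulty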

SPARC Console

Primary Domain logs to SP

-> set /SP/console line_count=1000
-> show /SP/console/history

Console I/O from all other domains is redirected to the virtual console concentrator (vcc)

# tail /var/log/vntsd/<ldom>/console-log[.<num>]

Console logging is enabled by default

# ldm list -o console ldomsc0
NAME
ldomsc0

VCONS
    NAME         SERVICE                PORT   LOGGING
    ldomsc0      primary-vcc@primary    5002   on
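
To check the logging state of all domains at once you can loop over the parseable ldm output; a minimal sketch (the field handling assumes the usual DOMAIN|name=... format of "ldm list -p"):

ldm list -p | nawk -F'|' '/^DOMAIN/ {split($2, a, "="); print a[2]}' | while read dom
do
        ldm list -o console $dom
done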

The domain must be in an inactive and unbound state before changing this setting

# ldm set-vcons log=off <ldom>

!! Guest consoles cannot be logged on releases older than Oracle Solaris 11.1 !!
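
Turning logging off for an existing guest therefore means a full stop/unbind cycle; a minimal sketch with a placeholder domain name:

# ldm stop-domain ldom1
# ldm unbind-domain ldom1
# ldm set-vcons log=off ldom1
# ldm bind-domain ldom1
# ldm start-domain ldom1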

LDOM Configuration Backup

Having Oracle VM Server for SPARC running is quite a nice thing, but what would happen if you lose your whole box and have to recover all VMs and LDOM settings? I see two different things to save: the hardware configuration based on the hypervisor settings, which is stored on the internal service processor, and the virtual settings regarding vdisk or vnet services and constraints.

The worst part is that the hypervisor partitioning setup is not saved and transferred to the service processor by default/automatically. This ends in a factory-default (or an older saved config) setup if you power off the server and forget to save it. I had that happen several times when hardware was changed and after powering on all settings were gone… nice situation 😉

By default the ldmd autorecovery_policy is set to 1, which means that warnings are logged into the SMF log file and you have to perform any configuration recovery manually. To be honest, I know no one who monitors the SMF log files… so perhaps it is an idea to include these files in your monitoring.

root@t7primary02:~# svccfg -s ldmd listprop ldmd/autorecovery_policy
ldmd/autorecovery_policy integer     1
root@t7primary02:~#
root@t7primary02:~# grep save /var/svc/log/ldoms-ldmd:default.log
May 30 16:26:30 warning: Autosave config 'foo01' is newer than SP config
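
If you want to put that idea into practice, a small cron job that greps the ldmd log is enough; a minimal sketch (the mail address is an assumption):

#!/usr/bin/bash
# Run from cron; mails a warning if the ldmd SMF log reports an unsaved (autosave) config
LOG=/var/svc/log/ldoms-ldmd:default.log
if grep "newer than SP config" $LOG > /dev/null
then
        grep "newer than SP config" $LOG | mailx -s "`hostname`: unsaved LDOM config" admin@example.com
fi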

Setting it to 2 displays a notification message if an autosave configuration is newer than the SP configuration, but you still have to perform any configuration recovery manually.

root@t7primary02:~# ldm list
Autorecovery Policy is NOTIFY. Autosave configuration, 'foo01', is
newer than the configuration stored in the SP. Use either the
ldm add-spconfig -r <cfg> or ldm add-spconfig -r <cfg> <newcfg> command
to save the autosave configuration to the SP, and power cycle the system
for the updated SP configuration to take effect. The requested command
was aborted, reissue your command to perform the operation you intended.
No more autosave warning messages will be issued.

With autorecovery_policy = 3 the configuration is updated automatically, overwriting the SP configuration that will be used during the next powercycle. But the documentation says "thus, you must use the ldm add-spconfig -r command to manually update an existing configuration or create a new one based on the autosave data".
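
Changing the policy itself is an ordinary SMF property change; a minimal sketch for switching to 3 (refresh/restart so ldmd picks up the new value):

# svccfg -s ldmd setprop ldmd/autorecovery_policy=3
# svcadm refresh ldmd
# svcadm restart ldmd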

root@t7primary02:~# tail -2 /var/svc/log/ldoms-ldmd:default.log
May 30 16:48:47 Autosave config 'foo02' recovered to SP config
A powercycle is required for the updated SP config to take effect
root@t7primary02:~# ldm list-config
factory-default
foo02
foo01 [current]
root@t7primary02:~# ldm list-config -r
foo01
foo02 [newer]

Anyhow, I would always recommend using your old-school crontab to look for a new configuration and save it…

This mini script looks for an unsaved configuration change, saves it, and keeps only the last 5 auto-saved entries as a backup.

#!/usr/bin/bash
# Save the current LDOM configuration if no saved config is marked [current],
# then keep only the 5 most recent auto-saved entries.
DATE=`/usr/bin/date +%y%m%d-%H:%M`

# No "[current]" marker means the running config differs from every saved one -> save it
ldm list-config | grep "\[current\]" > /dev/null || ldm add-config $DATE-auto

# Rotate: remove the oldest auto-saved configs, keeping the last 5
lines=`ldm list-config | grep "\-auto" | wc -l`
if [[ $lines -gt 5 ]]
then
        del=`expr $lines - 5`
        ldm list-config | grep "\-auto" | grep -v "\[current\]" | head -$del | \
        while read todel junk     # first field only, ignore any [next poweron] marker
        do
                ldm remove-config $todel
        done
fi
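
To run it from cron, an entry like this is enough (path and schedule are just an example):

0 3 * * * /root/scripts/save_ldm_config.sh > /dev/null 2>&1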

I tried to find a technical reason why these configurations are not stored automatically, but to be honest, I have no clue…

Virtual Backup

To save my LDOM settings I use the following:

# ldm list-constraints -x
# ldm list-bindings -e

And as mentioned in the Installation Guide, you should back up the autosave directory "/var/opt/SUNWldm".
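
The two list commands can be combined with the autosave directory into one small dump script; a minimal sketch (the backup destination is an assumption). Restoring a single domain from its XML dump should then be possible with "ldm add-domain -i <ldom>.xml".

#!/usr/bin/bash
# Dump per-domain constraints as XML, the extended bindings, and the autosave directory
DATE=`/usr/bin/date +%y%m%d`
BACKUPDIR=/export/backup/ldm/$DATE
mkdir -p $BACKUPDIR
ldm list -p | nawk -F'|' '/^DOMAIN/ {split($2, a, "="); print a[2]}' | while read dom
do
        ldm list-constraints -x $dom > $BACKUPDIR/$dom.xml
done
ldm list-bindings -e > $BACKUPDIR/bindings.txt
tar cf $BACKUPDIR/SUNWldm-autosave.tar /var/opt/SUNWldm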

When using SR-IOV for your SAN HBAs, always use your own WWNs. The random WWNs assigned by the Logical Domains Manager cannot be restored: when you try to set a WWN back to its old value, the manager will refuse because the value lies within its own random WWN pool…
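
When creating the VFs you can hand over your own WWNs instead of letting the manager pick random ones; a minimal sketch (the PF path and the WWN values are made-up placeholders, and the port-wwn/node-wwn properties assume an FC PF):

# ldm create-vf port-wwn=10:00:00:14:4f:fc:aa:01 node-wwn=20:00:00:14:4f:fc:aa:01 /SYS/MB/PCIE1/IOVFC.PF0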

I am using this script to back up the WWNs of the virtual functions (VFs) when using SR-IOV.

#!/usr/bin/bash
# Prints (does not execute) the ldm commands needed to rebuild the primary domain
# sizing, the I/O device assignments and the SR-IOV VF WWN settings.
echo "ldm start-reconf primary"
# Primary domain sizing (set-core assumes 8 threads per core)
ldm ls primary | nawk '/^primary/ {printf("ldm set-memory %s primary\nldm set-core %s primary\n", $6, $5/8)}'
# Find HBAs of IO Domains
ldm ls-io | nawk '!/primary|IOVFC\.PF[0-9]|^NAME|^--/ {printf("ldm set-io iov=on %s\n", $3); printf("ldm rm-io %s primary\n", $1)}'
# SR-IOV: Find VF configuration
ldm ls-io | nawk '/IOVFC\.PF[0-9]\.VF[0-9]/ {print $3}' | sort -u | nawk '{printf("ldm set-io iov=on %s\n", $1)}'
ldm ls-io | nawk 'BEGIN {FS="\."} /IOVFC\.PF[0-9]\.VF[0-9]/ {printf("%s.%s\n", $1, $2)}' | sort -u | nawk '{printf("ldm create-vf -n max %s\n", $1)}'
# Extract port/node WWN of each VF and emit the matching set-io commands
ldm ls-io -l | nawk '/IOVFC\.PF[0-9]\.VF[0-9]/ {vf=$1; getline; getline; getline; portwwn=$3; getline; nodewwn=$3; printf("ldm set-io port-wwn=%s node-wwn=%s %s\n", portwwn, nodewwn, vf)}'
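
Because the script only prints the commands, I redirect its output into a dated file that then serves as a ready-to-run restore script (the script name here is just an example):

# ./backup_sriov_wwns.sh > /export/backup/ldm/restore-io-`date +%y%m%d`.sh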

As always… you can never have too many backups 😉