Solaris Cluster Update

Had the fun today of upgrading a bunch of Solaris clusters… this time I wrote a docu 🙂

Cluster Update

Let's update our cluster running a flying (failover) zone:

On both nodes:

 # scinstall -u update -b 11.3.26.0.5.0 -L accept 

[...]

Disable the zone resource:

 # clrs disable name-zone-res 

Reboot the second cluster node (the one where the RG is not running) into the new BE.
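
For example (a minimal sketch; it assumes scinstall has already created and activated the new BE, so beadm list shows it with the "R" (active on reboot) flag):

 # beadm list
 # init 6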

After the node comes back, switch the RG to the updated node:

 # clrg switch -n <node> name-zone-rg 

Now read the BE UUID on the first node and set it on the second:

 first# /opt/SUNWsczone/sczbt/util/ha-solaris-zone-boot-env-id get
 second# /opt/SUNWsczone/sczbt/util/ha-solaris-zone-boot-env-id set <uuid>

With the storage resource/RG on the second node and the correct UUID in place, you can attach the zone:

 # zoneadm -z <name> attach -x destroy-orphan-zbes 

Check whether the zone comes up in a normal state (boot it and run svcs -xv). If everything is OK, shut the zone down again.
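
For example (a minimal sketch; <name> is your zone name):

 # zoneadm -z <name> boot
 # zlogin <name> svcs -xv
 # zoneadm -z <name> shutdown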

Now we can give control back to the cluster:

 # clrs enable name-zone-res
 # clrg resume name-zone-rg 

Now you can try to switch back...

And you could also upgrade your resource type versions:


# clrt list
SUNW.LogicalHostname:5
SUNW.SharedAddress:3
SUNW.HAStoragePlus:11
ORCL.ha-zone_sczbt:2
# clrt register ORCL.ha-zone_sczbt
# clrt register SUNW.HAStoragePlus
# clrt list
SUNW.LogicalHostname:5
SUNW.SharedAddress:3
SUNW.HAStoragePlus:11
ORCL.ha-zone_sczbt:2
ORCL.ha-zone_sczbt:4
SUNW.HAStoragePlus:12
# clrs list
pressy-zone-rs
pressy-zp-rs
# clrs show -p Type_version pressy-zone-rs
=== Resources ===
Resource:                                       pressy-zone-rs
  Type_version:                                    2
  --- Standard and extension properties ---
# /opt/SUNWsczone/sczbt/util/rt_upgrade +
Migration of resource:pressy-zone-rs to latest resource type version succeeded.
# clrs show -p Type_version pressy-zone-rs
=== Resources ===
Resource:                                       pressy-zone-rs
  Type_version:                                    4
  --- Standard and extension properties ---
#
# clrs show -p Type_version pressy-zp-rs
=== Resources ===
Resource:                                       pressy-zp-rs
  Type_version:                                    11
  --- Standard and extension properties ---
# clrs set -p Type_version=12 pressy-zp-rs
# clrt unregister SUNW.HAStoragePlus:11
# clrt unregister ORCL.ha-zone_sczbt:2
#

Oracle Solaris & SPARC @ OOW17

There were a lot of rumors about Solaris and SPARC because Oracle laid off around 1,500 hardware developers some months ago. Around Oracle OpenWorld there were some product announcements and more information about Solaris: a brand-new SPARC chip came out, and Oracle promised once again to invest in Solaris and to support Solaris 11 at least until 2034.

They also mentioned that the next release, Solaris 11.next, will come in fall 2017. So it will take some time to get 11.4, but hopefully it will include the backported Solaris 12 beta features, since Oracle canceled 12 in favor of 11.next.

Oracle SPARC M8 Processor

The new SPARC CPU again has 32 cores, but now at 5 GHz, based on a new 20 nm core design with “Software in Silicon v2”. We now get four execution pipelines, a doubled L1 cache, and faster DAX engines (1.8 vs. 2.2 GHz). The M8 comes with 180 GB/s of measured memory bandwidth and a raw memory-link performance of 374 GB/s per processor. (BTW, did you know that the latest Xeon 8180M, launched in Q3’17, only provides 119 GB/s?) We also got Oracle Numbers acceleration units, and SHA-3 was added to the other 15 ciphers and hashes already supported by the M7 cryptography co-processors.

Compared to the M7, that is 1.5x better single-thread performance at a 21% higher frequency, plus 16% higher memory bandwidth while still having 6% lower memory latency. That is a very nice and easy improvement for your Solaris installations. And do not forget: just live-migrate your LDOMs to the new M8 server and enjoy the faster environment…
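
Something along these lines (a minimal sketch; ldom1 and the target control domain m8-host are just placeholders):

# ldm migrate-domain ldom1 root@m8-host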

Oracle says:

SPARC M8 processors running Oracle Solaris 11.3 ran 2.9 times faster executing AES-CFB 256-bit key encryption (in cache) than the Intel Xeon Processor Platinum 8168 (with AES-NI) running Oracle Linux 7.3.

SPARC M8 processors running Oracle Solaris 11.3 ran 6.4 times faster executing AES-CFB 256-bit key encryption (in cache) than the Intel Xeon Processor E5-2699 v4 (with AES-NI) running Oracle Linux 7.2.

Oracle Fujitsu SPARC XII

Fujitsu also announced a new SPARC64 processor called SPARC XII, which can be bought as an OEM product from Oracle: a 12-core CPU at 4.25 GHz in systems with 1 to 32 sockets and up to 32 TB of memory.

Oracle X7 Server

With the latest x86 generation you also get support and the RTU (right to use) for Solaris x86. The Oracle X7 servers come with 1 to 8 sockets, up to 192 cores, and up to 6 TB of memory.

Overall, great news for your Solaris investments and big datacenter iron 🙂

Solaris FMD logs

How to clear the fmadm log / FMA fault log

(based on FMA Cheat Sheet (Doc ID 1355350.1))

Had some issues over the last few days getting my "fmadm faulty" output empty… the standard commands did not work:

# fmadm repair <UUID>
# fmadm clear <UUID>
# fmadm acquit <UUID>

I had to delete FMD's "database" files:

# svcadm disable -s svc:/system/fmd:default
# cd /var/fm/fmd
# find /var/fm/fmd -type f -exec ls {} \;
# find /var/fm/fmd -type f -exec rm {} \;
# svcadm enable svc:/system/fmd:default
# fmadm reset cpumem-diagnosis
# fmadm reset cpumem-retire
# fmadm reset eft
# fmadm reset io-retire
# fmadm faulty
<empty output again>
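
To double-check, you can also look at the fault log with fmdump; after clearing the files it should not report any faults anymore:

# fmdump -v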

SPARC Console

The primary domain's console is logged to the SP:

-> set /SP/console line_count=1000
-> show /SP/console/history

Console I/O from all other domains is redirected to the virtual console concentrator (vcc):

# tail /var/log/vntsd/<ldom>/console-log[.<num>]

Console logging is enabled by default:

# ldm list -o console ldomsc0
NAME
ldomsc0

VCONS
    NAME         SERVICE                PORT   LOGGING
    ldomsc0      primary-vcc@primary    5002   on
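
With the port shown above you can also attach to the guest console interactively from the control domain (assuming the telnet client is installed):

# telnet localhost 5002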

The domain must be in an inactive and unbound state before changing this setting:

# ldm set-vcons log=off <ldom>

!! Console logging requires at least Oracle Solaris 11.1; older releases cannot be logged !!

LDOM Configuration Backup

Having Oracle VM Server for SPARC running is quite a nice thing, but what would happen if you lost the whole box and had to recover all VMs and LDOM settings? I see two different things to save: the hardware configuration, i.e. the hypervisor settings that are stored on the internal service processor, and the virtual settings, i.e. the vdisk/vnet services and constraints.

The worst part is that the hypervisor partitioning setup is not saved and transferred to the service processor automatically. If you power off the server and forgot to save the config, you end up with the factory-default (or an older saved) configuration. I have had that happen several times when hardware was changed and, after powering on, all settings were gone… nice situation 😉

By default the ldmd autorecovery_policy is set to 1, which means that warnings are logged to the SMF log file and you have to perform any configuration recovery manually. To be honest, I know no one who monitors the SMF log files… so it might be an idea to include these files in your monitoring.

root@t7primary02:~# svccfg -s ldmd listprop ldmd/autorecovery_policy
ldmd/autorecovery_policy integer     1
root@t7primary02:~#
root@t7primary02:~# grep save /var/svc/log/ldoms-ldmd:default.log
May 30 16:26:30 warning: Autosave config 'foo01' is newer than SP config
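
If you want to automate that check, a crude sketch for root's crontab on the control domain could look like this (the schedule and the syslog message are just assumptions):

0 * * * * /usr/bin/grep -q "Autosave config" /var/svc/log/ldoms-ldmd:default.log && /usr/bin/logger -p daemon.warning "ldmd: unsaved LDOM configuration detected"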

Setting it to 2 displays a notification message when an autosave configuration is newer than the one stored on the SP, but you still have to perform any configuration recovery manually.
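
The policy is a plain SMF property of the ldmd service, so you can change it with svccfg and then restart ldmd:

root@t7primary02:~# svccfg -s ldmd setprop ldmd/autorecovery_policy=2
root@t7primary02:~# svcadm refresh ldmd
root@t7primary02:~# svcadm restart ldmd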

root@t7primary02:~# ldm list
Autorecovery Policy is NOTIFY. Autosave configuration, 'foo01', is
newer than the configuration stored in the SP. Use either the
ldm add-spconfig -r <cfg> or ldm add-spconfig -r <cfg> <newcfg> command
to save the autosave configuration to the SP, and power cycle the system
for the updated SP configuration to take effect. The requested command
was aborted, reissue your command to perform the operation you intended.
No more autosave warning messages will be issued.

With autorecovery_policy = 3 the configuration is updated automatically, overwriting the SP configuration that will be used during the next power cycle. But the documentation says: “thus, you must use the ldm add-spconfig -r command to manually update an existing configuration or create a new one based on the autosave data”.

root@t7primary02:~# tail -2 /var/svc/log/ldoms-ldmd:default.log
May 30 16:48:47 Autosave config 'foo02' recovered to SP config
A powercycle is required for the updated SP config to take effect
root@t7primary02:~# ldm list-config
factory-default
foo02
foo01 [current]
root@t7primary02:~# ldm list-config -r
foo01
foo02 [newer]

Anyhow, I would always recommend using your old-school crontab to look for a new configuration and save it…

This mini script looks for an unsaved config, saves it, and keeps only the last five auto-saved entries as a backup.

#!/usr/bin/bash
# Save the current LDOM configuration to the SP if no saved config is
# marked [current], and keep only the last 5 auto-saved configurations.

DATE=`/usr/bin/date +%y%m%d-%H:%M`

# no saved config marked [current] -> save a new timestamped one
ldm list-config | grep "\[current\]" > /dev/null || ldm add-config $DATE-auto

# rotate: remove the oldest auto-saved configs beyond the last 5
lines=`ldm list-config | grep "\-auto" | wc -l`
if [[ $lines -gt 5 ]]
then
        del=`expr $lines - 5`
        ldm list-config | grep "\-auto" | grep -v "\[current\]" | head -$del | \
                while read todel
                do
                        ldm remove-config $todel
                done
fi

I tried to find a technical reason why these configurations are not stored automatically, but to be honest, I have no clue…

Virtual Backup

To save my LDOM settings I use the following:

# ldm list-constraints -x
# ldm list-bindings -e
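
In practice you would redirect that output to one file per domain, for example (ldom1 and the target paths are just placeholders):

# ldm list-constraints -x ldom1 > /var/tmp/ldom1.xml
# ldm list-bindings -e ldom1 > /var/tmp/ldom1-bindings.txt

On a rebuilt control domain the XML can later be re-imported with "ldm add-domain -i /var/tmp/ldom1.xml" (once the required services and backend devices exist again).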

And as mentioned in the Installation Guide, you should also back up the autosave directory “/var/opt/SUNWldm”.
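
For example (the target path is just an assumption):

# tar cvf /backup/SUNWldm-`date +%y%m%d`.tar /var/opt/SUNWldm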

When using SR-IOV for your SAN HBAs, always assign your own WWNs. The random WWNs assigned by the LDOM manager cannot be restored: when you try to set a WWN back to its old value, the manager will refuse because that value belongs to its own random WWN pool…

I am using this script to back up the WWNs of my SR-IOV virtual functions; it prints the ldm commands needed to recreate the setup:

echo "ldm start-reconf primary"
ldm ls primary | nawk '/^primary/ {printf("ldm set-memory %s primary\nldm set-core %s primary\n", $6, $5/8)}'
# Find HBAs of IO Domains
ldm ls-io | nawk '!/primary|IOVFC\.PF[0-9]|^NAME|^--/ {printf("ldm set-io iov=on %s\n", $3); printf("ldm rm-io %s primary\n", $1)}'
# SR-IOV: Find VF configuration
ldm ls-io | nawk '/IOVFC\.PF[0-9]\.VF[0-9]/ {print $3}' | sort -u | nawk '{printf("ldm set-io iov=on %s\n", $1)}'
ldm ls-io | nawk 'BEGIN {FS="\."} /IOVFC\.PF[0-9]\.VF[0-9]/ {printf("%s.%s\n", $1, $2)}' | sort -u | nawk '{printf("ldm create-vf -n max %s\n", $1)}'
ldm ls-io -l | nawk '/IOVFC\.PF[0-9]\.VF[0-9]/ {vf=$1; getline; getline; getline; portwwn=$3; getline; nodewwn=$3; printf("ldm set-io port-wwn=%s node-wwn=%s %s\n", portwwn, nodewwn, vf)}'

As always… you can never have too many backups 😉