LDOM Configuration Backup

Having Oracle VM for SPARC running is quite a nice thing, but what happens if you lose your whole box and have to recover all VMs and LDOM settings? I see two different things to save: the hardware partitioning, i.e. the hypervisor settings that are stored on the internal service processor (SP), and the virtual settings, i.e. the vdisk and vnet services and the domain constraints.

The nasty part is that the hypervisor partitioning setup is not saved and transferred to the service processor automatically. If you power off the server and forgot to save the configuration, you end up with factory-default (or an older saved config). I had that several times when hardware was changed and after powering on all settings were gone… nice situation 😉

By default the ldmd property autorecovery_policy is set to 1, which means that warnings are only logged to the SMF log file and you have to perform any configuration recovery manually. To be honest, I know no one who monitors the SMF log files… so it might be an idea to include these files in your monitoring.

root@t7primary02:~# svccfg -s ldmd listprop ldmd/autorecovery_policy
ldmd/autorecovery_policy integer     1
root@t7primary02:~#
root@t7primary02:~# grep save /var/svc/log/ldoms-ldmd:default.log
May 30 16:26:30 warning: Autosave config 'foo01' is newer than SP config
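
The policy is changed via the same SMF property, followed by a refresh and restart of the ldmd service — a quick sketch (restarting ldmd only restarts the manager daemon, not the running domains, but verify that on a test box first):

# svccfg -s ldmd setprop ldmd/autorecovery_policy=2
# svcadm refresh ldmd
# svcadm restart ldmd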

Setting it to 2 displays a notification message whenever an autosave configuration is newer than the one on the SP, but you still have to perform any configuration recovery manually.

root@t7primary02:~# ldm list
Autorecovery Policy is NOTIFY. Autosave configuration, 'foo01', is
newer than the configuration stored in the SP. Use either the
ldm add-spconfig -r <cfg> or ldm add-spconfig -r <cfg> <newcfg>
command to save the autosave configuration to the SP, and power cycle
the system for the updated SP configuration to take effect. The
requested command was aborted, reissue your command to perform the
operation you intended. No more autosave warning messages will be
issued.

With autorecovery_policy = 3 the configuration is updated automatically and overwrites the SP configuration that will be used during the next powercycle. But the documentation also says: “thus, you must use the ldm add-spconfig -r command to manually update an existing configuration or create a new one based on the autosave data”.
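
For the manual case that boils down to a single command (a sketch, using the autosave name from the warning above); the updated SP configuration still needs a powercycle to become active:

# ldm add-spconfig -r foo01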

root@t7primary02:~# tail -2 /var/svc/log/ldoms-ldmd:default.log
May 30 16:48:47 Autosave config 'foo02' recovered to SP config
A powercycle is required for the updated SP config to take effect
root@t7primary02:~# ldm list-config
factory-default
foo02
foo01 [current]
root@t7primary02:~# ldm list-config -r
foo01
foo02 [newer]


Anyhow, I would always recommend using your old-school crontab to look for unsaved configurations and save them…

This mini script checks whether the current configuration has been saved, saves it if not, and keeps only the last five auto-generated entries as backups.

#!/usr/bin/bash
# save the current LDOM configuration to the SP if it has not been saved yet
# and keep only the last five auto-generated backups
DATE=`/usr/bin/date +%y%m%d-%H:%M`
ldm list-config | grep "\[current\]" > /dev/null || ldm add-config $DATE-auto
lines=`ldm list-config | grep "\-auto" | wc -l`
if [[ $lines -gt 5 ]]
then
        del=`expr $lines - 5`
        # remove the oldest auto configs, stripping any [..] annotation from the name
        ldm list-config | grep "\-auto" | grep -v "\[current\]" | head -$del | \
                awk '{print $1}' | while read todel
        do
                ldm remove-config $todel
        done
fi
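
To follow the crontab recommendation, an hourly root crontab entry is enough (the script path is just an example):

0 * * * * /usr/local/scripts/save_ldom_config.sh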


I tried to find a technical reason why these configurations are not stored automatically, but to be honest, I have no clue…

Virtual Backup

To save my LDOM settings I use the following:

# ldm list-constraints -x
# ldm list-bindings -e

And as mentioned in the Installation Guide, you should also back up the autosave directory “/var/opt/SUNWldm”.
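
Put together, a small backup job could look like this (a sketch with example paths; the constraints XML is also what ldm init-system expects when you have to rebuild a primary domain from scratch):

#!/usr/bin/bash
# dump LDOM constraints and bindings plus the ldmd autosave directory to dated copies
BACKUPDIR=/export/backup/ldoms       # example location
DATE=`/usr/bin/date +%y%m%d`
mkdir -p $BACKUPDIR
ldm list-constraints -x > $BACKUPDIR/constraints-$DATE.xml
ldm list-bindings -e > $BACKUPDIR/bindings-$DATE.txt
cp -rp /var/opt/SUNWldm $BACKUPDIR/SUNWldm-$DATE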

When using SR-IOV for your SAN HBAs, always assign your own WWNs. The random WWNs assigned by the Logical Domains Manager cannot be restored: if you try to set a WWN back to its old value, the manager refuses because that value belongs to its own random WWN pool…

I use the following script to back up the WWNs of my virtual functions when using SR-IOV. It does not change anything itself; it only prints the ldm commands needed to recreate the setup.

echo "ldm start-reconf primary"
ldm ls primary | nawk '/^primary/ {printf("ldm set-memory %s primary\nldm set-core %s primary\n", $6, $5/8)}'
# Find HBAs of IO Domains
ldm ls-io | nawk '!/primary|IOVFC\.PF[0-9]|^NAME|^--/ {printf("ldm set-io iov=on %s\n", $3); printf("ldm rm-io %s primary\n", $1)}'
# SR-IOV: Find VF configuration
ldm ls-io | nawk '/IOVFC\.PF[0-9]\.VF[0-9]/ {print $3}' | sort -u | nawk '{printf("ldm set-io iov=on %s\n", $1)}'
ldm ls-io | nawk 'BEGIN {FS="\."} /IOVFC\.PF[0-9]\.VF[0-9]/ {printf("%s.%s\n", $1, $2)}' | sort -u | nawk '{printf("ldm create-vf -n max %s\n", $1)}'
ldm ls-io -l | nawk '/IOVFC\.PF[0-9]\.VF[0-9]/ {vf=$1; getline; getline; getline; portwwn=$3; getline; nodewwn=$3; printf("ldm set-io port-wwn=%s node-wwn=%s %s\n", portwwn, nodewwn, vf)}'
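
Since the output is itself a list of ldm commands, I redirect it into a dated file that can be replayed after a rebuild (script name and target path are examples):

# /usr/local/scripts/backup_vf_wwns.sh > /export/backup/ldoms/rebuild-primary-`date +%y%m%d`.sh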

As always… you can never have too many backups 😉


ZFSSA LUN ID mapping

We had some issues with iSCSI LUN IDs and multipathing on Linux / OVM: LUN IDs got mixed up and clients could no longer access their devices on a ZFS Storage Appliance with two storage pools… Putting all LUNs of a pool into their own target group per ZS controller seems to work around it. We think the problem was caused by LUN 0, which has to be visible to the initiators when scanning or connecting to the iSCSI targets.

Because of this issue I want to see for every LUN whether it is in a target group and which LUN ID was generated. Clicking around in the BUI or selecting every LUN on the CLI takes ages and is quite uncomfortable, so I wrote this little script to collect these values… The scripting language of the Oracle ZFS Storage Appliance is based on JavaScript (ECMAScript version 3) with a few extensions.

# ssh root@zfssa < lun-map.aksh |  sort --version-sort
Pseudo-terminal will not be allocated because stdin is not a terminal.
0     OVM             clusterpool     PERFpool         zfssa01_tgt_grp
1     OVM             store_zfs02     CAPpool          zfssa02_tgt_grp
2     OVM             store_zfs01     PERFpool         zfssa01_tgt_grp
4     CLUSTER01       dg_cluster01    PERFpool         zfssa01_tgt_grp
6     CLUSTER01       dg_cluster02    PERFpool         zfssa01_tgt_grp
8     CLUSTER01       dg_cluster03    PERFpool         zfssa01_tgt_grp
10    CLUSTER01       dg_data01       PERFpool         zfssa01_tgt_grp
12    CLUSTER01       dg_data02       PERFpool         zfssa01_tgt_grp
14    CLUSTER01       dg_data03       PERFpool         zfssa01_tgt_grp
16    CLUSTER01       dg_data04       PERFpool         zfssa01_tgt_grp
18    CLUSTER01       dg_fra01        PERFpool         zfssa01_tgt_grp
20    CLUSTER01       dg_fra02        PERFpool         zfssa01_tgt_grp
22    FRA             DG_DATA01       PERFpool         zfssa01_tgt_grp
24    FRA             DG_DATA02       PERFpool         zfssa01_tgt_grp
26    CLUSTER02       dg_cluster01    PERFpool         zfssa01_tgt_grp
28    CLUSTER02       dg_cluster02    PERFpool         zfssa01_tgt_grp
30    CLUSTER02       dg_cluster03    PERFpool         zfssa01_tgt_grp
32    CLUSTER02       dg_data01       PERFpool         zfssa01_tgt_grp
34    CLUSTER02       dg_data02       PERFpool         zfssa01_tgt_grp
36    CLUSTER02       dg_data03       PERFpool         zfssa01_tgt_grp
38    CLUSTER02       dg_data04       PERFpool         zfssa01_tgt_grp
40    CLUSTER02       dg_fra01        PERFpool         zfssa01_tgt_grp
42    CLUSTER02       dg_fra02        PERFpool         zfssa01_tgt_grp
44    FRA             DG_DATA03       PERFpool         zfssa01_tgt_grp
46    FRA             DG_DATA04       PERFpool         zfssa01_tgt_grp
48    FRA             DG_FRA01        PERFpool         zfssa01_tgt_grp
50    FRA             DG_FRA02        PERFpool         zfssa01_tgt_grp
56    CLUSTER03       dg_cluster01    PERFpool         zfssa01_tgt_grp
58    CLUSTER03       dg_cluster02    PERFpool         zfssa01_tgt_grp
60    CLUSTER03       dg_cluster03    PERFpool         zfssa01_tgt_grp
62    CLUSTER03       dg_data01       PERFpool         zfssa01_tgt_grp
64    CLUSTER03       dg_data02       PERFpool         zfssa01_tgt_grp
66    CLUSTER03       dg_data03       PERFpool         zfssa01_tgt_grp
68    CLUSTER03       dg_data04       PERFpool         zfssa01_tgt_grp
70    CLUSTER03       dg_fra01        PERFpool         zfssa01_tgt_grp
72    CLUSTER03       dg_fra02        PERFpool         zfssa01_tgt_grp

As you can see, deleting LUNs and creating new ones does not reuse the freed numbers. There is also a known bug where OVM cannot see more than 44 LUNs (MOS Doc ID 1932633.1).

My code:

script
/* small script to get LUN IDs for ZFSSA iSCSI volumes */
/* let's get some VARs to work with... we need to know how many pools are configured */
function getPools() {
        var poolChoices, p, poolList = new Array(), singlePool, err;
        run('cd /');
        run('configuration storage');
        try {
                poolChoices = choices('pool');
                if (poolChoices.length > 0) {
                        singlePool = false;
                } else {
                        singlePool = true;
                }
        } catch (err) {
                singlePool = true;
        }
        if (singlePool) {
                poolList[0] = new Object();
                poolList[0].pool = get('pool');
                poolList[0].status = get('status');
                try {
                        poolList[0].owner = get('owner');
                } catch (err) {
                        /* poolList[0].owner = 'current'; */
                        poolList[0].owner = get('pool');
                }
        } else {
                for (p = 0; p < poolChoices.length; p += 1) {
                        try {
                                run('set pool=' + poolChoices[p]);
                                poolList[p] = new Object();
                                poolList[p].pool = get('pool');
                                poolList[p].status = get('status');
                                try {
                                        poolList[p].owner = get('owner');
                                } catch (err) {
                                        poolList[p].owner = 'current';
                                }
                        } catch (err) {
                                continue;
                        }
                }
        }
        return (poolList);
}
/* Now we can use the pool(s) and look for LUNs */
allPools = getPools();
run('cd /');
run('shares');
for (p = 0; p < allPools.length; p += 1) {
        try {
                run('set pool=' + allPools[p].pool);
        } catch (err) {
                continue;
        }
        /* LUNs can live in multiple projects */
        projects = list();
        for (i = 0; i < projects.length; i++) {
                try {
                        run('select ' + projects[i]);
                } catch (err) {
                        continue;
                }
                luns = list();
                for (j = 0; j < luns.length; j++) {
                        run('select ' + luns[j]);
                        /* it could also be a filesystem share, so ignore the error if no LUN ID is found */
                        try {
                                printf("%-5s %-15s %-15s %-15s %-15s\n",
                                    get('assignednumber'), projects[i], luns[j], allPools[p].pool, get('targetgroup'));
                        } catch (err) {
                                run('cd ..');
                                continue;
                        }
                        run('cd ..');
                }
                run('cd ..');
        }
}

If you want to store this code on the appliance, you just have to replace the leading “script” keyword with a workflow wrapper:

var workflow = {
        name: 'LUN ID discovery',
        description: 'Displays the current LUN IDs',
        execute: function () {
                /* CODE COMES HERE */
        }
};

Then you upload the file as a workflow and you will find it in the BUI and on the CLI. Workflows are how scripts are stored on the Oracle ZFS Storage Appliance; they also provide user access management and argument validation for scripts.
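
For the argument validation part, a workflow can declare typed parameters that the BUI and CLI prompt for and check before execute() runs. A minimal sketch (the parameter name and label are made up; double-check the exact property names in the scripting guide):

var workflow = {
        name: 'LUN ID discovery',
        description: 'Displays the current LUN IDs',
        parameters: {
                poolFilter: {
                        label: 'Only list LUNs from this pool',
                        type: 'String'
                }
        },
        execute: function (params) {
                /* params.poolFilter would be available to the script here */
        }
};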

zfssa:> maintenance workflows show
Properties:
                    showhidden = false

Workflows:

WORKFLOW     NAME                       OWNER SETID ORIGIN               VERSION
workflow-000 LUN ID discovery           root  false <local>              undefined
workflow-001 Clear locks                root  false Oracle Corporation   1.0.0
workflow-002 Configure for Oracle Solaris Cluster NFS root  false Oracle Corporation   1.0.0
workflow-003 Unconfigure Oracle Solaris Cluster NFS root  false Oracle Corporation   1.0.0
workflow-004 Configure for Oracle Enterprise Manager Monitoring root  false Sun Microsystems, Inc. 1.1
workflow-005 Unconfigure Oracle Enterprise Manager Monitoring root  false Sun Microsystems, Inc. 1.0

zfssa:>
zfssa:> maintenance workflows select workflow-000 run
zfssa:maintenance workflow-000 run (uncommitted)> commit
Workflow output:
  1     OVM             store_zfs02     CAPpool          zfssa02_tgt_grp
  4     CLUSTER01       dg_cluster01    PERFpool         zfssa01_tgt_grp
  6     CLUSTER01       dg_cluster02    PERFpool         zfssa01_tgt_grp
  8     CLUSTER01       dg_cluster03    PERFpool         zfssa01_tgt_grp
  10    CLUSTER01       dg_data01       PERFpool         zfssa01_tgt_grp
  12    CLUSTER01       dg_data02       PERFpool         zfssa01_tgt_grp
  14    CLUSTER01       dg_data03       PERFpool         zfssa01_tgt_grp
  16    CLUSTER01       dg_data04       PERFpool         zfssa01_tgt_grp
  18    CLUSTER01       dg_fra01        PERFpool         zfssa01_tgt_grp
  20    CLUSTER01       dg_fra02        PERFpool         zfssa01_tgt_grp
  26    CLUSTER02       dg_cluster01    PERFpool         zfssa01_tgt_grp
  28    CLUSTER02       dg_cluster02    PERFpool         zfssa01_tgt_grp
  30    CLUSTER02       dg_cluster03    PERFpool         zfssa01_tgt_grp
  32    CLUSTER02       dg_data01       PERFpool         zfssa01_tgt_grp
  34    CLUSTER02       dg_data02       PERFpool         zfssa01_tgt_grp
  36    CLUSTER02       dg_data03       PERFpool         zfssa01_tgt_grp
  38    CLUSTER02       dg_data04       PERFpool         zfssa01_tgt_grp
  40    CLUSTER02       dg_fra01        PERFpool         zfssa01_tgt_grp
  42    CLUSTER02       dg_fra02        PERFpool         zfssa01_tgt_grp
  56    CLUSTER03       dg_cluster01    PERFpool         zfssa01_tgt_grp
  58    CLUSTER03       dg_cluster02    PERFpool         zfssa01_tgt_grp
  60    CLUSTER03       dg_cluster03    PERFpool         zfssa01_tgt_grp
  62    CLUSTER03       dg_data01       PERFpool         zfssa01_tgt_grp
  64    CLUSTER03       dg_data02       PERFpool         zfssa01_tgt_grp
  66    CLUSTER03       dg_data03       PERFpool         zfssa01_tgt_grp
  68    CLUSTER03       dg_data04       PERFpool         zfssa01_tgt_grp
  70    CLUSTER03       dg_fra01        PERFpool         zfssa01_tgt_grp
  72    CLUSTER03       dg_fra02        PERFpool         zfssa01_tgt_grp
  22    FRA             DG_DATA01       PERFpool         zfssa01_tgt_grp
  24    FRA             DG_DATA02       PERFpool         zfssa01_tgt_grp
  44    FRA             DG_DATA03       PERFpool         zfssa01_tgt_grp
  46    FRA             DG_DATA04       PERFpool         zfssa01_tgt_grp
  48    FRA             DG_FRA01        PERFpool         zfssa01_tgt_grp
  50    FRA             DG_FRA02        PERFpool         zfssa01_tgt_grp
  0     OVM             clusterpool     PERFpool         zfssa01_tgt_grp
  2     OVM             store_zfs01     PERFpool         zfssa01_tgt_grp
zfssa:>


Oracle Linux on SPARC

Last week there was news about Linux on SPARC: Oracle announced Oracle Linux 6.7 as GA with the Unbreakable Enterprise Kernel for SPARC T5 and T7 based servers.

This release supports the built-in SPARC features like secure memory, the DAX engines and the crypto coprocessors of Oracle’s own CPUs. You can install it on bare metal or in an OVM / LDOM guest. It also comes with support for the Logical Domains Manager, so you could even use it as a primary domain.

I gave it a try on my old T2 server, but the installation failed while installing GRUB on that old platform. I tried to install GRUB 2 manually but got a lot of errors with the Sun-labeled vdisk.

So I tried it on a T7-2 server in an LDOM, and there everything worked fine.
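
For reference, the guest domain for the test looked roughly like this (a sketch; the domain name, sizes, virtual switch/VDS names and the image/ISO paths are made up for illustration):

# ldm add-domain linux0
# ldm set-core 2 linux0
# ldm set-memory 8g linux0
# ldm add-vnet vnet0 primary-vsw0 linux0
# ldm add-vdsdev /ldoms/linux0/disk0.img linux0_boot@primary-vds0
# ldm add-vdisk vdisk0 linux0_boot@primary-vds0 linux0
# ldm add-vdsdev options=ro /isos/OL67-sparc.iso linux0_iso@primary-vds0
# ldm add-vdisk vdisk_iso linux0_iso@primary-vds0 linux0
# ldm bind linux0
# ldm start linux0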

[root@linux0 ~]# cat /etc/oracle-release
Oracle Linux Server release 6.7
[root@linux0 ~]#
[root@linux0 ~]# uname -a
Linux linux0 2.6.39-500.1.76.el6uek.sparc64 #1 SMP Fri Dec 16 10:47:54 EST 2016 sparc64 sparc64 sparc64 GNU/Linux

[root@linux0 ~]# cat /proc/cpuinfo
cpu             : SPARC-M7
fpu             : SPARC-M7 integrated FPU
pmu             : sparc-m7
prom            : OBP 4.40.1 2016/04/25 06:45
type            : sun4v
ncpus probed    : 16
ncpus active    : 16
D$ parity tl1   : 0
I$ parity tl1   : 0
cpucaps         : flush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3,hpc,ima,pause,cbcond,adp,aes,des,camellia,md5,sha1,sha256,sha512,mpmul,montmul,montsqr,crc32c
Cpu0ClkTck      : 00000000f65c15b0
Cpu1ClkTck      : 00000000f65c15b0
Cpu2ClkTck      : 00000000f65c15b0
Cpu3ClkTck      : 00000000f65c15b0
Cpu4ClkTck      : 00000000f65c15b0
Cpu5ClkTck      : 00000000f65c15b0
Cpu6ClkTck      : 00000000f65c15b0
Cpu7ClkTck      : 00000000f65c15b0
Cpu8ClkTck      : 00000000f65c15b0
Cpu9ClkTck      : 00000000f65c15b0
Cpu10ClkTck     : 00000000f65c15b0
Cpu11ClkTck     : 00000000f65c15b0
Cpu12ClkTck     : 00000000f65c15b0
Cpu13ClkTck     : 00000000f65c15b0
Cpu14ClkTck     : 00000000f65c15b0
Cpu15ClkTck     : 00000000f65c15b0
MMU Type        : Hypervisor (sun4v)
MMU PGSZs       : 8K,64K,4MB,256MB,2GB,16GB
State:
CPU0:           online
CPU1:           online
CPU2:           online
CPU3:           online
CPU4:           online
CPU5:           online
CPU6:           online
CPU7:           online
CPU8:           online
CPU9:           online
CPU10:          online
CPU11:          online
CPU12:          online
CPU13:          online
CPU14:          online
CPU15:          online
[root@linux0 ~]#
[root@linux0 /]# lscpu
Architecture:          sparc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    8
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
L0 cache:              16384
L1i cache:             16384
L2 cache:              262144
L3 cache:              8388608
NUMA node0 CPU(s):     0-15
[root@linux0 /]#


What did not work was the dynamic reconfiguration:

root@primary:~# ldm set-core 4 linux0
Domain linux0 is unable to dynamically reconfigure VCPUs. Please
verify the guest operating system is running and supports VCPU DR.
root@primary:~# ldm set-memory 6g linux0
The linux0 domain does not support the dynamic reconfiguration of memory.
root@primary:~#

I am not really sure why Oracle released “such an old version”. No UEK 3 or 4, just 2.6, though at least the latest 2.6.39… I am running a beta SPARC Linux from Oracle which has been available since 2015, and that release ships a 4.1 kernel. But that system uses SILO (the SPARC LILO boot loader), not GRUB 2 like OL 6.7 for SPARC.

[root@linux1 ~]# cat /etc/redhat-release
Linux for SPARC release 1.0
[root@linux1 ~]# uname -a
Linux linux1 4.1.12-32.el6uek.sparc64 #1 SMP Thu Dec 17 19:27:27 PST 2015 sparc64 sparc64 sparc64 GNU/Linux
[root@linux1 ~]#


[root@linux1 ~]# cat /proc/cpuinfo
cpu             : UltraSparc T2 (Niagara2)
fpu             : UltraSparc T2 integrated FPU
pmu             : niagara2
prom            : OBP 4.30.8.a 2010/05/13 10:36
type            : sun4v
ncpus probed    : 8
ncpus active    : 8
D$ parity tl1   : 0
I$ parity tl1   : 0
cpucaps         : flush,stbar,swap,muldiv,v9,blkinit,n2,mul32,div32,v8plus,popc,vis,vis2,ASIBlkInit
Cpu0ClkTck      : 00000000457646c0
Cpu1ClkTck      : 00000000457646c0
Cpu2ClkTck      : 00000000457646c0
Cpu3ClkTck      : 00000000457646c0
Cpu4ClkTck      : 00000000457646c0
Cpu5ClkTck      : 00000000457646c0
Cpu6ClkTck      : 00000000457646c0
Cpu7ClkTck      : 00000000457646c0
MMU Type        : Hypervisor (sun4v)
MMU PGSZs       : 8K,64K,4MB,256MB
State:
CPU0:           online
CPU1:           online
CPU2:           online
CPU3:           online
CPU4:           online
CPU5:           online
CPU6:           online
CPU7:           online
[root@linux1 ~]#
[root@linux1 ~]# lscpu
Architecture:
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    4
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
NUMA node0 CPU(s):     0-63
[root@linux1 ~]#

So let’s see what will happen. I am still not sure if I should like it or not 🙂