Solaris Resource Controls – Oracle DB project

Recently I had to explain, line by line, what the Solaris project for Oracle databases means and why each value is set. The better question was: how can you see the current usage when you have to decide whether to raise a limit, again per entry? I have written my findings down to share them. If you have more input, I am happy to hear it.

Let's look at the Oracle database project on a SuperCluster running Solaris 11.4:

user.oracle
        projid : 105
        comment: ""
        users  : (none)
        groups : (none)
        attribs: process.max-core-size=(privileged,1073741824,deny)
                 process.max-file-descriptor=(basic,65536,deny)
                 process.max-sem-nsems=(privileged,2048,deny)
                 process.max-sem-ops=(privileged,1024,deny)
                 process.max-stack-size=(basic,33554432,deny)
                 project.max-msg-ids=(privileged,4096,deny)
                 project.max-sem-ids=(privileged,65535,deny)
                 project.max-shm-ids=(privileged,4096,deny)
                 project.max-shm-memory=(privileged,2199023255552,deny)
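Since the whole point is deciding when to raise one of these limits, here is a minimal sketch of how a single attribute could be changed. The helper name `projmod_cmd` and the value 4096 are made up for illustration; the function only prints the `projmod` line so it can be reviewed before running it as root.

```shell
# Print (do not run) the projmod command that would replace one resource
# control on a project. Dry-run on purpose: review the line, then run it as root.
# Arguments: project name, rctl name, privilege level, new value.
projmod_cmd() {
    project=$1
    rctl=$2
    priv=$3
    value=$4
    printf 'projmod -s -K "%s=(%s,%s,deny)" %s\n' "$rctl" "$priv" "$value" "$project"
}

# Example: raise process.max-sem-nsems for user.oracle to 4096 (hypothetical value)
projmod_cmd user.oracle process.max-sem-nsems privileged 4096
```

As far as I know, projmod only edits /etc/project, so already running processes keep the old value until they are started in a new task.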

Let's go through these settings one by one: what they mean, their defaults on Solaris 10 vs. 11, what the Oracle RDBMS documentation asks for, the OSC settings, and how to check usage:

max-file-descriptor

process.max-file-descriptor       Maximum number of open files per process
OLD = rlim_fd_max rlim_fd_cur

Oracle RDBMS Installation Minimum Value = soft 1024 / hard 65536
Solaris 10 Default = basic 256
Solaris 11.4 Default = basic 256
OSC Setting = basic 65536

CHECK setting

root #  prctl -n process.max-file-descriptor -i process $$
process: 21663: -bash
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
process.max-file-descriptor
        basic             256       -   deny                                 -
        privileged      65.5K       -   deny                                 -
        system          2.15G     max   deny                                 -
root # ulimit -n
256
root #

CHECK usage

root #  echo ::kmastat | mdb -k | grep file_cache
file_cache                     72    26298    36848     2695168B 95753345327     0
root #

In this example, 26298 is the number of file descriptors in use and 36848 the number allocated. Note that Solaris has no system-wide maximum for open file descriptors; the limit only restricts how many a single process may have open. They are allocated on demand as long as free RAM is available.
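Because the limit applies per process, a per-process count is often more useful than the kernel cache figure. A minimal sketch, assuming /proc/<pid>/fd is readable for the target process; the function names `count_fds` and `fd_report` are my own:

```shell
# Count the file descriptors a process currently has open by listing
# /proc/<pid>/fd (one entry per descriptor).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print the count next to the soft limit of the current shell.
# Note: ulimit -n reports this shell's limit, which matches the target
# process only if both run under the same project and settings.
fd_report() {
    used=$(count_fds "$1")
    limit=$(ulimit -n)
    echo "pid=$1 open_fds=$used soft_limit=$limit"
}

# Example: report on the current shell
fd_report $$
```

For an Oracle process, pass its PID instead of `$$` and compare the count with the prctl output above.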

max-sem-nsems

process.max-sem-nsems             Maximum number of semaphores per semaphore set
OLD = seminfo_semmsl

Oracle RDBMS Minimum Value = 256
Solaris 10 Default = 25
Solaris 11.4 Default = 512
OSC Setting = 2048

CHECK setting

oracle:~$ prctl -n process.max-sem-nsems -i process $$
process: 22738: -bash
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
process.max-sem-nsems
        privileged      2.05K       -   deny                                 -
        system          32.8K     max   deny                                 -
oracle:~$

CHECK how many NSEMS are used:

# ipcs -sb
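The limit is per semaphore set, so the interesting number is the largest NSEMS of any set. A small sketch that summarizes the ipcs output, assuming set lines start with "s" and NSEMS is the last column; `nsems_summary` is a hypothetical helper name:

```shell
# Read `ipcs -sb` output on stdin and print the number of semaphore sets,
# the largest NSEMS value of any set, and the total number of semaphores.
# Assumption: set lines start with "s" and NSEMS is the last column.
nsems_summary() {
    awk '$1 == "s" {
             n = $NF + 0
             sets++
             total += n
             if (n > max) max = n
         }
         END { printf "sets=%d max_nsems=%d total=%d\n", sets, max, total }'
}

# On the Solaris host: ipcs -sb | nsems_summary
```

If max_nsems gets close to the privileged value shown by prctl, it is time to think about raising the limit.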

max-sem-ops

 
process.max-sem-ops          Maximum number of System V semaphore operations
OLD = seminfo_semopm

Oracle RDBMS Installation Minimum Value = N/A
Solaris 10 Default = 10
Solaris 11.4 Default = 512
OSC Setting = 1024

CHECK ? Good question, I could not find how to check the current usage; the documentation says that the application should get an error: a return code of E2BIG from a semop() call...

max-stack-size

process.max-stack-size      Maximum stack memory segment available to this process.
OLD = "combination of different kernel settings led to it or ulimit command (e.g. lwp_default_stksize, rlim_fd_cur)"
Oracle RDBMS Installation Minimum Value = soft 10240 / hard 32768
Solaris 10 Default = 8192
Solaris 11.4 Default = 8192
OSC Setting = 33554432

CHECK settings

oracle:~$ ulimit -s
32768
oracle:~$ prctl -n process.max-stack-size -i process $$
process: 24517: -bash
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
process.max-stack-size
        basic           32.0MB      -   deny                                 -
        privileged      8.00EB    max   deny                                 -
        system          8.00EB    max   deny                                 -
oracle-@C01SC1C0L01Z01A:~$ ulimit -sS

CHECK per process: violations can be logged with:

# rctladm -e syslog process.max-stack-size

The stack of a process can be looked up with:

# pmap -sx PID

Or use this DTrace script I found in Doc ID 2275236.1:

# dtrace -qn '
grow_internal:entry{self->trace=1;self->addr=arg0}
grow_internal:return/self->trace && arg1==12/
{printf("pid:%d %s stack_addr:%a fault_addr:%a\n",
pid,execname,curthread->t_procp->p_usrstack,self->addr);
self->trace=0;self->addr=0;}
grow_internal:return/self->trace/{self->trace=0;self->addr=0}'

The example below shows a process that caused a page fault at 0x0. The required stack growth is the difference between the current stack address and the fault address, which is 0xffc00000 - 0x0 ~= 3.99GB, while process.max-stack-size is 8MB.

# dtrace <... snip for brevity...>
pid:6672 fwrxmldiff stack_addr:0xffc00000 fault_addr:0x0

# tail -1 /var/adm/messages
Jun  9 06:56:26 hostname genunix: [ID 500092 kern.notice] basic rctl process.max-stack-size (value 8388608) exceeded by process 6672.
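Once syslog is enabled for a limit, the messages file can be summarized per resource control. A minimal sketch, assuming the lines look like the kern.notice message above; `rctl_violations` is a name I made up:

```shell
# Read syslog lines on stdin and print one summary line per rctl violation:
#   <rctl name> <limit value> <pid>
# Assumption: lines look like the kern.notice message quoted above.
rctl_violations() {
    sed -n 's/.* rctl \([a-z.-]*\) (value \([0-9]*\)) exceeded by process \([0-9]*\).*/\1 \2 \3/p'
}

# On the Solaris host:
#   rctl_violations < /var/adm/messages | sort | uniq -c
```

Lines that do not match the pattern are dropped, so the function can be fed the whole file.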

max-msg-ids

project.max-msg-ids       Maximum number of message queues that can be created
OLD = msgsys:msginfo_msgmni
Oracle RDBMS Installation Minimum Value = 100
Solaris 10 Default = 50
Solaris 11.4 Default = 128
OSC Setting = 4096

CHECK: check the number of active message queues

# ipcs -q 

Seen Errors

 Failure to create message queue .. msgget: No space left on device


max-sem-ids

project.max-sem-ids       Maximum number of semaphore identifiers
OLD =  seminfo_semmni
Oracle RDBMS Installation Minimum Value = 100
Solaris 10 Default = 10
Solaris 11.4 Default = 128
OSC Setting = 65535

CHECK: ipcs -s lists the identifier of each semaphore set; in my opinion the current usage can be counted with:

# ipcs -sZ | grep -c ^s

max-shm-ids

project.max-shm-ids     Limit on the number of shared memory segments that can be created
OLD = shminfo_shmmni
Oracle RDBMS Installation Minimum Value = 100
Solaris 10 Default = 100
Solaris 11.4 Default = 128
OSC Setting = 4096

CHECK

# ipcs -b
# ipcs -bZ | grep -c ^m
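The three identifier counts (message queues, shared memory segments, semaphore sets) can be collected in one pass. A sketch under the assumption that entry lines start with q, m or s in the first column, as the grep patterns above imply; `ipc_id_counts` is a hypothetical helper:

```shell
# Read plain `ipcs` output on stdin and count identifiers per facility.
# Assumption: entry lines start with q (message queue), m (shared memory)
# or s (semaphore set) in the first column.
ipc_id_counts() {
    awk '$1 == "q" { q++ }
         $1 == "m" { m++ }
         $1 == "s" { s++ }
         END { printf "msg_ids=%d shm_ids=%d sem_ids=%d\n", q, m, s }'
}

# On the Solaris host: ipcs | ipc_id_counts
```

Keep in mind that this counts everything ipcs lists, while the project.max-*-ids limits apply per project, so the numbers are an upper bound for a single project.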

max-core-size

process.max-core-size Maximum size of a core file that is created by this process. Default is unlimited!

max-shm-memory

Last but not least, the maximum shared memory itself. The default is 1/4 of physical memory in Solaris 11.4. I guess the best way to see the usage is mdb and its OSM (optimized shared memory) line:

# echo "::memstat" | mdb -k
Usage Type/Subtype                      Pages    Bytes  %Tot  %Tot/%Subt
---------------------------- ---------------- -------- ----- -----------
Kernel                               11919848    90.9g  7.4%
  Regular Kernel                     10099567    77.0g        6.3%/84.7%
  Defdump prealloc                    1820281    13.8g        1.1%/15.2%
ZFS                                   2043159    15.5g  1.2%
User/Anon                           141962499     1.0t 89.2%
  Regular User/Anon                  28686595   218.8g       18.0%/20.2%
  OSM                               113275904   864.2g       71.2%/79.7%
Exec and libs                          327706     2.5g  0.2%
Page Cache                             109770   857.5m  0.0%
Free (cachelist)                         9321    72.8m  0.0%
Free                                  2618033    19.9g  1.6%
Total                               158990336     1.1t  100%
#
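To put the OSM figure next to the limit, the page count can be converted to a percentage of project.max-shm-memory. This is only a sketch: it assumes, like the guess above, that the OSM line is the relevant usage figure and that its second column is the page count. `osm_pct_of_limit` is a made-up name, and the page size is passed in explicitly.

```shell
# Read `::memstat` output on stdin and print the OSM share of a byte limit.
# Arguments: limit in bytes, page size in bytes.
# Assumption: the OSM line has the page count in the second column.
osm_pct_of_limit() {
    awk -v limit="$1" -v pagesize="$2" '
        $1 == "OSM" { printf "%.1f\n", $2 * pagesize * 100 / limit }'
}

# On the host (limit taken from the project above):
#   echo "::memstat" | mdb -k | osm_pct_of_limit 2199023255552 "$(getconf PAGESIZE)"
```

With the numbers from the output above and an 8 KB page size, the OSM usage comes out at roughly 42% of the 2 TB limit.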

By the way, never forget: you can always use “rctladm” to enable syslog for many of these limits.

# rctladm -l
process.max-cpu-time syslog=off [ lowerable no-deny cpu-time inf seconds ]
process.max-file-size syslog=off [ lowerable deny file-size bytes ]
process.max-data-size syslog=off [ lowerable deny no-signal bytes ]
process.max-stack-size syslog=off [ lowerable deny no-signal bytes ]
process.max-core-size syslog=off [ lowerable deny no-signal bytes ]
process.max-file-descriptor syslog=off [ lowerable deny count ]
process.max-address-space syslog=off [ lowerable deny no-signal bytes ]
process.max-sem-nsems syslog=off [ deny count ]
process.max-sem-ops syslog=off [ deny count ]
process.max-msg-qbytes syslog=off [ deny bytes ]
process.max-msg-messages syslog=off [ deny count ]
process.max-port-events syslog=off [ deny count ]
process.max-itimers syslog=off [ deny count ]
process.max-sigqueue-size syslog=off [ lowerable deny count ]
process.max-deferred-posts syslog=off [ lowerable deny count ]
task.max-lwps syslog=off [ count ]
task.max-processes syslog=off [ count ]
task.max-cpu-time syslog=off [ no-deny cpu-time no-obs inf seconds ]
project.cpu-shares syslog=n/a [ no-basic no-deny no-signal no-syslog count ]
project.cpu-cap syslog=n/a [ no-basic deny no-signal inf no-syslog count ]
project.max-lwps syslog=off [ no-basic count ]
project.max-processes syslog=off [ no-basic count ]
project.max-tasks syslog=off [ no-basic count ]
project.max-sem-ids syslog=off [ no-basic deny count ]
project.max-msg-ids syslog=off [ no-basic deny count ]
project.max-shm-ids syslog=off [ no-basic deny count ]
project.max-shm-memory syslog=off [ no-basic deny bytes ]
project.max-mrp-ids syslog=off [ no-basic deny count ]
project.max-port-ids syslog=warning [ no-basic deny count ]
project.max-locked-memory syslog=off [ no-basic deny bytes ]
project.max-adi-metadata-memory syslog=off [ no-basic deny bytes ]
project.max-contracts syslog=off [ no-basic deny count ]
zone.cpu-shares syslog=n/a [ no-basic no-deny no-signal no-syslog count ]
zone.cpu-cap syslog=n/a [ no-basic deny no-signal inf no-syslog count ]
zone.max-lwps syslog=off [ no-basic count ]
zone.max-processes syslog=off [ no-basic count ]
zone.max-msg-ids syslog=off [ no-basic deny count ]
zone.max-sem-ids syslog=off [ no-basic deny count ]
zone.max-shm-ids syslog=off [ no-basic deny count ]
zone.max-shm-memory syslog=off [ no-basic deny bytes ]
zone.max-mrp-ids syslog=off [ no-basic deny count ]
zone.max-locked-memory syslog=off [ no-basic deny bytes ]
zone.max-adi-metadata-memory syslog=off [ no-basic deny bytes ]
zone.max-swap syslog=off [ no-basic deny bytes ]
zone.max-lofi syslog=off [ no-basic deny count ]
#
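To see which of the limits from the Oracle project still have logging switched off, the listing can be filtered. A minimal sketch; `syslog_off` is a hypothetical helper that only reads the `rctladm -l` output and changes nothing:

```shell
# Read `rctladm -l` output on stdin and print those of the named resource
# controls whose global syslog action is still off.
# Arguments: the rctl names to check.
syslog_off() {
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) want[w[i]] = 1
        }
        ($1 in want) && $2 == "syslog=off" { print $1 }'
}

# On the Solaris host, enable logging for whatever is reported:
#   rctladm -l | syslog_off process.max-stack-size project.max-shm-memory |
#       while read -r r; do rctladm -e syslog "$r"; done
```

Controls that show syslog=n/a cannot log at all, so they are never reported here.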

Solaris 11.3 Support End is near!

In May 2020 Oracle announced that Premier Support for Solaris 11.3 is planned to end in October 2020, thus postponing the end date of July 2020 mentioned last year.
With the release of Solaris 11.4 in September 2018, the main stream of development was shifted to the latest version and many features and bug fixes were developed exclusively for the 11.4 release. At that time, Oracle also announced that many older systems would no longer be supported on Solaris 11.4. This affected the following systems in particular:

– Mx000 SPARC Enterprise Server with SPARC64 VI, VII or VII+ CPUs
– All systems with UltraSPARC T1, T2, T2+ and T3 CPUs
– Many old SunFire / Oracle x86 servers of the Vx0z, X2xx00, X4xxx0 or the X6xx0 & X8xx0 blade modules
– And all Netra servers of the above mentioned series (NEBS certification and ETSI compliance)

old and dusty T4 found @cu 😉

Many of these SPARC servers are still running at customer sites. Oracle listened to the community’s outcry at the time and provided so-called LSUs (Limited Support Updates) for Solaris 11.3. With the seventh LSU (11.3.6.20.0 from April 14, 2020), a final LSU was released ahead of October. After that, it seems there will be no more fixes for 11.3, and investment will go only into the continuous release model of Solaris 11.4. Although the maintenance contract still covers 11.3 under indefinite “Sustaining Support”, Oracle will not offer stability or security patches for it.

As a result, the upgrade to Solaris 11.4 becomes mandatory, and as described above it only works if the servers have at least Oracle SPARC T4 or SPARC64 X CPUs.

Many customers are very reluctant to upgrade to 11.4 because Oracle included many new features in this release. By now, though, I can recommend the upgrade with a clear conscience. Many of my customers have been running stable on 11.4 for many months and appreciate the features and the usual stability of their Solaris environments, no matter whether we are talking about single servers or SuperCluster implementations.

Happy Upgrading!!!

Oracle Support Document 2382427.1 (Oracle Solaris 11.3 Support) can be found at: https://support.oracle.com/epmos/faces/DocumentDisplay?id=2382427.1

Oracle Support Document 2433413.1 (Oracle Solaris 11.3 Limited Support Updates (LSU) Index) can be found at: https://support.oracle.com/epmos/faces/DocumentDisplay?id=2433413.1

 

ZFSSA 2 EXA


ZFS Storage Appliance Infiniband

To attach your ZFSSA to an EXA over InfiniBand, for example, you might want more than one "virtual" datalink on your HCA, using multiple IB partitions with the same pkey. In the BUI it is not (yet?) possible to create two IB partitions (datalinks) with the same pkey that point to the same IB device, and the CLI property is hidden; I don't know why. I found a workaround in Oracle Support Document 2087231.1 (Guidelines When Using ZFS Storage in an Exadata Environment): https://support.oracle.com/epmos/faces/DocumentDisplay?id=2087231.1

In my example I will create a bunch of datalinks to enable an active-active IPMP failover configuration for both controllers. (You have to start with ibpart1; "0" does not work.)

zsexa0101a:configuration net datalinks>partition
zsexa0101a:configuration net datalinks partition (uncommitted)> set li    <-- tab tab
linkmode  links
zsexa0101a:configuration net datalinks partition (uncommitted)> set link  <-- tab tab
linkmode  links       <-- it is not there ;-| 
zsexa0101a:configuration net datalinks partition (uncommitted)> set linkname=ibpart5
                      linkname = ibpart5 (uncommitted)
zsexa0101a:configuration net datalinks partition (uncommitted)> show
Properties:
                         class = partition
                         label = Untitled Datalink
                         links = (unset)
                          pkey = (unset)
                      linkmode = cm
 
zsexa0101a:configuration net datalinks partition (uncommitted)> set links=ibp0
                         links = ibp0 (uncommitted)
zsexa0101a:configuration net datalinks partition (uncommitted)> set pkey=ffff
                          pkey = ffff (uncommitted)
zsexa0101a:configuration net datalinks partition (uncommitted)> show
Properties:
                         class = partition
                         label = Untitled Datalink
                         links = ibp0 (uncommitted)
                          pkey = ffff (uncommitted)
                      linkmode = cm
 
zsexa0101a:configuration net datalinks partition (uncommitted)> commit
zsexa0101a:configuration net datalinks> show
Datalinks:
 
DATALINK       CLASS       LINKS       STATE   ID      LABEL
aggr1          aggregation i40e0       up      -        zsexa01-LACP
                           i40e4
ibpart1        partition   ibp0        up      -        zsexa01-IB0
ibpart2        partition   ibp2        up      -        zsexa01-IB1
ibpart3        partition   ibp0        up      -        zsexa01-IB0
ibpart4        partition   ibp2        up      -        zsexa01-IB1
ibpart5        partition   ibp0        up      -       Untitled Datalink
igb0           device      igb0        up      -       Motherboard-igb0
pffff_ibp0     partition   ibp0        up      -        zsexa01-IB0
pffff_ibp2     partition   ibp2        up      -        zsexa01-IB1
vnic1          vnic        igb0        up      -        zsexa0101a-VNIC
vnic2          vnic        igb0        up      -        zsexa0102a-VNIC
vnic3          vnic        aggr1       up      -        zsexa0101c-VNIC
vnic4          vnic        aggr1       up      -        zsexa0102c-VNIC