DAX usage on OSC?

We enabled the Oracle Database In-Memory option on an Oracle SuperCluster M7 and wanted to see whether our SPARC DAX engines are actually used. With newer Solaris releases you can use daxstat (in addition to busstat or DTrace) to get more information, but issuing daxstat brought up an error:

root@OSC:~# daxstat
Traceback (most recent call last):
  File "/usr/bin/daxstat", line 969, in <module>
    sys.exit(main())
  File "/usr/bin/daxstat", line 962, in main
    return process_opts()
  File "/usr/bin/daxstat", line 905, in process_opts
    dax_ids, dax_queue_ids = derive_dax_opts(args, parser)
  File "/usr/bin/daxstat", line 844, in derive_dax_opts
    dax_ids = find_ids(query, parser, None)
  File "/usr/bin/daxstat", line 683, in find_ids
    all_dax_kstats = RCU.list_objects(kbind.Kstat(), query)
  File "/usr/lib/python2.7/vendor-packages/rad/connect.py", line 391, in list_objects
    a RADInterface object
  File "/usr/lib/python2.7/vendor-packages/rad/client.py", line 213, in _raise_error
    packer.pack_int((timestamp % 1000000) * 1000)
rad.client.NotFoundError: Error listing com.oracle.solaris.rad.kstat:type=Kstat: not found (3)

In my installation the rad kstat module (system/management/rad/module/rad-kstat) was not installed:

root@OSC:~# pkg list -a | grep kstat
library/python-2/python-kstat                     5.11-0.175.2.0.0.27.0      --o
system/management/rad/module/rad-kstat            0.5.11-0.175.3.17.0.1.0    ---


root@OSC:~# pkg install system/management/rad/module/rad-kstat
[...]
root@OSC:~# pkg list | grep rad
system/management/rad                             0.5.11-0.175.3.21.0.4.0    i--
system/management/rad/client/rad-c                0.5.11-0.175.3.21.0.3.0    i--
system/management/rad/client/rad-python           0.5.11-0.175.3.17.0.1.0    i--
system/management/rad/module/rad-kstat            0.5.11-0.175.3.17.0.1.0    i--
system/management/rad/module/rad-smf              0.5.11-0.175.3.17.0.1.0    i--
root@OSC:~#
root@OSC:~# svcadm disable svc:/system/rad:local svc:/system/rad:local-http
root@OSC:~# svcadm enable svc:/system/rad:local svc:/system/rad:local-http
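
Restarting the RAD instances is what makes them pick up the freshly installed module; instead of the disable/enable pair above, a single svcadm restart plus a quick svcs check should do the same (commands only, output omitted):

root@OSC:~# svcadm restart svc:/system/rad:local svc:/system/rad:local-http
root@OSC:~# svcs svc:/system/rad:local svc:/system/rad:local-http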

And yes, that was it 😉

root@OSC:~# daxstat -ad 60
DAX    commands fallbacks    input    output %busy
ALL    32541246     15222     1.0G     78.0M     0
ALL        5760         0     6.0M      0.0M     0
ALL        2240         0     6.0M      0.0M     0
root@OSC:~# daxstat 1 1
DAX    commands fallbacks    input    output %busy
  0        7062         0     0.0M      0.0M     0
  1        7071         0     0.0M      0.0M     0
  2        7071         0     0.0M      0.0M     0
  3        7067         0     0.0M      0.0M     0
  4        7073         0     0.0M      0.0M     0
  5        7071         0     0.0M      0.0M     0
  6        7066         0     0.0M      0.0M     0
  7        7067         0     0.0M      0.0M     0
  8     4078650      1878     0.0M      0.0M     0
  9     4078651      1941     0.0M      0.0M     0
 10     4078699      1870     0.0M      0.0M     0
 11     4078674      1914     0.0M      0.0M     0
 12     4078720      1923     0.0M      0.0M     0
 13     4078723      1929     0.0M      0.0M     0
 14     4078706      1897     0.0M      0.0M     0
 15     4078721      1871     0.0M      0.0M     0
 16        5696         0     0.0M      0.0M     0
 17        5705         0     0.0M      0.0M     0
 18        5704         0     0.0M      0.0M     0
 19        5704         0     0.0M      0.0M     0
 20        5702         0     0.0M      0.0M     0
 21        5703         0     0.0M      0.0M     0
 22        5700         0     0.0M      0.0M     0
 23        5702         0     0.0M      0.0M     0


As you can see, the DAX engines are working…
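
On the database side you can cross-check that In-Memory scans (the operations that get offloaded to DAX) are really happening, for example via the IM scan statistics; just a quick sketch, and statistic names can differ slightly between database versions:

oracle@OSC:~$ sqlplus / as sysdba
SQL> SELECT name, value FROM v$sysstat WHERE name LIKE 'IM scan%' AND value > 0;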

In this example I reassigned some cores to the IM zone to get more DAX pipelines and to be able to use DAX engines from more than one chip, because we are also using more memory than a single socket owns (around 1.2 TB). I read that “DAX units and pipelines are not hardwired to certain cores and you can submit work to any DAX unit on a CPU” in this article, which explains DAX very well, so I thought it made sense to spread the zone's cores across the sockets. After changing the core pinning from one socket to three sockets I saw three groups of 8 units in use. That could be the reason why only the middle group is busier than the rest… it might be a NUMA effect; I will repopulate the IM store soon to see if the load spreads…
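
To see how the zone's CPUs and memory are actually spread across the sockets (locality groups), lgrpinfo gives a quick overview from inside the zone:

root@OSC:~# lgrpinfo

And for the repopulation test, something along these lines should be enough. This is only a sketch: SH.SALES stands in for one of our In-Memory segments, and V$IM_SEGMENTS shows the population progress:

oracle@OSC:~$ sqlplus / as sysdba
SQL> ALTER TABLE sh.sales NO INMEMORY;
SQL> -- PRIORITY HIGH lets the background workers repopulate without waiting for a scan
SQL> ALTER TABLE sh.sales INMEMORY PRIORITY HIGH;
SQL> SELECT segment_name, populate_status, bytes_not_populated FROM v$im_segments;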

 

SPARC Roadmap 2018

A new roadmap is on the web… Solaris 11.5 as Solaris.next and new SPARC M8+ chips are planned for 2020/21…

Good news for the best operating system in the world 😉

http://www.oracle.com/us/products/servers-storage/servers/sparc/oracle-sparc/sparc-roadmap-slide-2076743.pdf

Solaris Dashboard – First Look

Oh… I am really looking forward to getting more information about that new Solaris 11.4 dashboard and internal stats command. Just a quick view with:

# svcadm enable webui/server

gives you a nice BUI to click around in at https://<IP>:6787 with some drilldowns and nice graphs… (btw, that's the old Sun Management Console port 🙂)
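
A quick check that the service really came up and is listening on that port (commands only, output omitted):

root@OSC:~# svcs webui/server
root@OSC:~# netstat -an | grep 6787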

It is not yet at the level of what the Fishworks ZFSSA BUI can do, but I hope some of those analytics drilldowns will also be possible on native Solaris soon.

Quick screenshot from my installation, but well… right now the LDom does nothing…

[Screenshot: the new Solaris 11.4 BUI dashboard]

This might be a good “single pane of glass” view to get a health report without any additional frameworks like Nagios or similar external tools. I am really interested in how this will look and feel on a production server…

At first look, you can do a lot… but I need a server that is actually doing something to really click around… I will have to find some test workload 🙂

There is a lot to customize if you want that…

 

Meltdown Spectre SPARC Solaris?!?

In the last couple of weeks the whole IT world was shocked by news about security issues rooted in basic CPU architecture design, covering “all” processor vendors… The Spectre (CVE-2017-5753 and CVE-2017-5715) and Meltdown (CVE-2017-5754) vulnerabilities affect Xeon, AMD, POWER and ARM chips, and SPARC uses similar features as well. Spectre and Meltdown are different variants of the same fundamental underlying vulnerability which, if exploited, allows attackers to get access to data previously considered completely protected. That affects chips manufactured in the last 20 years. Speculative execution to predict the future and out-of-order execution were introduced by Sun with their SPARC T3 chips (e.g. T3-1; GA November 2010, LOD September 2012).

@Meltdown;

Solaris never had kernel pages mapped in user context on SPARC. That’s the reason why I do not think Meltdown affects SPARC at all. Oracle also stated: “Oracle believes that Oracle Solaris versions running on SPARCv9 hardware are not impacted by the Meltdown”. But as far as I understand, it could happen on Solaris x86.

@Spectre;

Well, starting with the T3 (S3 core), Oracle introduced speculative and out-of-order execution in the S3 pipeline. Prediction algorithms and the depth of the prediction buffers differ not only between vendors but also across CPU generations.

[UPDATE 14.04.2018]
There are public patches from Oracle to fix this issue; the MOS document for Solaris / SPARC:

Oracle Support Document 2349278.1 (Oracle Solaris on SPARC and Spectre (CVE-2017-5753 and CVE-2017-5715) and Meltdown (CVE-2017-5754)) can be found at: https://support.oracle.com/epmos/faces/DocumentDisplay?id=2349278.1

The documents for all other Oracle products can be found at:

Oracle Support Document 2347948.1 (Addendum to the January 2018 CPU Advisory for Spectre and Meltdown) can be found at: https://support.oracle.com/epmos/faces/DocumentDisplay?id=2347948.1
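
Whether a system already carries the corresponding SRU can be checked against the entire incorporation (which SRU actually contains the fix is listed in the MOS documents above):

root@OSC:~# pkg info entire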

Still cannot say anything about possible performance issues…

[Update]
Oracle published a new MOS article about the impact:
Oracle Support Document 2386271.1 (Performance impact of technical mitigation measure against vulnerability CVE-2017-5715 (Spectre v2) on SPARC Servers)

Like on other architectures, 2-10%… I heard some very bad news from customers using older Intel boxes with up to 70% I/O loss… real-world examples will be interesting…