RAMBleed on Solaris/SPARC?

Using enterprise-class hardware shows once again that you get what you pay for.

A new issue came up in the last few days describing attacks against Dynamic Random Access Memory (DRAM) modules that are already susceptible to Rowhammer-style attacks. At the end of the day these security problems are not microprocessor-specific; they leverage known issues in DRAM memory. The attacks only impact DDR4 and DDR3 memory modules; older generation DDR2 and DDR1 modules are not vulnerable.
Oracle published an advisory stating that older and current servers using its SPARC and x86 CPUs aren't expected to be susceptible to RAMBleed:

RAMBleed

All current and many older families of Oracle x86 (X5, X6, X7, X8, E1) and Oracle SPARC servers (S7, T7, T8, M7, M8) employing DDR4 DIMMs are not expected to be impacted by RAMBleed. This is because Oracle only employs DDR4 DIMMs that have implemented the Target Row Refresh (TRR) defense mechanism against RowHammer. Oracle’s memory suppliers have stated that these implementations have been designed to be effective against RowHammer.

Just as a reminder once again… 😉

Meltdown@SPARC – NO
Spectre@SPARC – not all variants (HW_BTI & OS fixed)
Foreshadow@SPARC – NO
ZombieLoad@SPARC – NO
RAMBleed@SPARC – very difficult based on TRR DIMMs

Working in an enterprise data center and securing your business-critical services might be easier than you think… work with Solaris on SPARC!

No RISC no fun

SPARC Roadmap 2018

A new roadmap is on the web… Solaris 11.5 as Solaris.next and new SPARC M8+ chips are planned for 2020/21…

Good news for the best operating system in the world 😉

http://www.oracle.com/us/products/servers-storage/servers/sparc/oracle-sparc/sparc-roadmap-slide-2076743.pdf

[UPDATE – 06/2019]

As you might already have noticed, the official roadmap disappeared from Oracle's web pages… my last discussion with the SPARC product management ended with “well, no one else publishes a roadmap”… I am not happy with that answer, but as it seems, Oracle is not saying there will be an M8+ right now. The naming scheme for 11.5 is also under discussion, again with re-certification as an argument, so we might see more 11.4 updates rather than an 11.5 release. But they promised again that Solaris will be supported at least until 2034 and will be developed to provide all necessary features to be the most stable and secure OS for the next decades.

But even if Oracle does not bring an M8+ chip, we still have a public roadmap from our Fujitsu friends, who build very nice SPARC systems:

https://www.fujitsu.com/global/products/computing/servers/unix/sparc/key-reports/roadmap/

After 18 months, the Oracle M8 chip is still the fastest processor for Java workloads (see spec.org) and phenomenally fast at in-memory database analytic queries (up to 10 times faster than x86) thanks to its DAX engines.

Compared to the M8, Fujitsu's CPUs do not provide as many cores per chip, but they have better single-thread performance than the Mx chips.

To sum up: if you need a highly secure and robust environment, it is still a very good idea to go for SPARC systems. “Endless” OS support (still 15 years; no one else promises that), a great platform that will be supported for years to come, and still a roadmap from Fujitsu to reinvest in if necessary.

No RISC no fun!

Solaris SMF and FMA Notifications

I never realized that there is a really easy way to “monitor” your Solaris using the built-in SMF and FMA monitors and have them send a mail whenever a problem is diagnosed. Let's set it up and break Apache on purpose to see it in action:

# svccfg setnotify problem-diagnosed mailto:pressy@solaris.wtf
# svcadm enable http:apache24
# mv /etc/apache2/2.4/httpd.conf /etc/apache2/2.4/httpd.conf_bu
# pkill httpd
# svcs -xv
svc:/network/http:apache24 (Apache 2.4 HTTP server)
State: maintenance since Fri May 24 11:51:19 2019
Reason: Method failed.
See: http://support.oracle.com/msg/SMF-8000-8Q
See: http://httpd.apache.org
See: man -M /usr/apache2/2.4/man -s 8 httpd
See: /var/svc/log/network-http:apache24.log
Impact: This service is not running.

uhhh… got mail:

SUNW-MSG-ID: SMF-8000-YX, TYPE: Defect, VER: 1, SEVERITY: Major
EVENT-TIME: Fri May 24 11:51:19 CEST 2019
PLATFORM: ORCL,SPARC-T4-1, CSN: AKBLABLA42, HOSTNAME: sparc-server
SOURCE: software-diagnosis, REV: 0.2
EVENT-ID: e0114186-cd70-4085-84aa-802b091a399e
DESC: Service svc:/network/http:apache24 failed - a start, stop or refresh method failed.
AUTO-RESPONSE: The service has been placed into the maintenance state.
IMPACT: svc:/network/http:apache24 is unavailable.
REC-ACTION: Run 'svcs -xv svc:/network/http:apache24' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted. Please refer to the associated reference document at http://support.oracle.com/msg/SMF-8000-YX for the latest service procedures and policies regarding this diagnosis.

Nice… you can set several tags: problem-diagnosed, problem-updated, problem-repaired and problem-resolved for FMA events, to- or from- state transitions (e.g. to-maintenance, from-online, to-degraded), or all for every transition.
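
For example, to get a mail system-wide whenever any service drops out of the online state or ends up in maintenance (same mailto target as above; the -g flag makes it a global notification):

# svccfg setnotify -g from-online,to-maintenance mailto:pressy@solaris.wtf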

And it would also work for specific services:

# svccfg -s application/myservice setnotify problem-diagnosed mailto:pressy@solaris.wtf
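
And to double-check what is configured, svccfg also has a listnotify subcommand (a sketch, if I remember the syntax correctly; it prints the configured notification parameters):

# svccfg -s application/myservice listnotify problem-diagnosed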

Easy, isn’t it?

Solaris – extra large page size support

The last time I installed an Oracle DB on Solaris 11.4 SPARC, I noticed that extra large memory page sizes were missing. I wanted to see 16GB huge pages on SPARC but only got 2GB pages as the largest memory allocated by Oracle. Well, still bigger than on x86, where you get 4k/2M/1G, but dynamic large pages chosen by the database as needed are a very nice feature on SPARC:

Multiple Page Size Support

The MPSS feature in Oracle Solaris allows an application to use different page sizes for different regions of virtual memory. Larger page sizes let the Translation Lookaside Buffer (TLB) map more physical memory with a fixed number of TLB entries. Larger pages may therefore reduce the cost of virtual-to-physical memory mapping and increase overall system performance.
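
By the way, if an application does not request large pages on its own, Solaris lets you suggest preferred page sizes from the outside with ppgsz(1). A small sketch (./myapp is just a placeholder binary; the sizes must be among those printed by pagesize -a):

# ppgsz -o heap=4M,stack=64K ./myapp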

First of all, we need a domain that provides 16GB pages, something that is controlled by the SPARC hypervisor (aka Logical Domains, LDoms). This is reported by the LDom parameters “effective-max-pagesize” and “hardware-max-pagesize”.

To get an effective-max-pagesize of 16GB, the LDom must be assigned a memory layout that includes at least one MBLOCK with 4 physically contiguous and aligned ranges of 16GB. That means at least one MBLOCK of 4x16GB (64GB), and this MBLOCK *must* be aligned on a 16GB boundary of the hardware address space.

Alignment can be the tricky part, since LDoms reserve a small amount of memory for internal use, which means the first available block might not be aligned to the needed 16GB boundary. For example, you could use hardware addresses like:

root@t7primary01:~# ldm set-mem mblock=0x400000000:64g my64Gdomain
root@t7primary01:~# ldm list-constraints my64Gdomain | grep page
    effective-max-pagesize=16GB
    hardware-max-pagesize=16GB
root@t7primary01:~#
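
By the way, the chosen base address is no accident: 0x400000000 is exactly 16GiB, so the 64GB MBLOCK starts right on a 16GB boundary. A quick sanity check with ksh/bash shell arithmetic (a sketch, nothing LDom-specific):

root@t7primary01:~# echo $(( 0x400000000 % (16 * 1024 * 1024 * 1024) ))
0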

Phew… ok… now it should work, but still… no 16GB pages…

After struggling around with support, I got an answer from a kernel developer… extra large pages are disabled on “small” systems with less than 512GB of memory (counted in pages). There were some issues (an internal bug), but with the latest and greatest versions such systems actually run satisfactorily.

Anyhow, the feature is still disabled below this threshold (which might change again), but yes, it also might not really be necessary on smaller systems. If you still want to use extra large pages, you can adjust the threshold:

root@t7primary01:~# ldm list-constraints primary | grep -i 0x2
0x2000000000 256G
root@t7primary01:~# grep xlarge /etc/system
set xlarge_mem_threshold = 0x1900000
root@t7primary01:~# pagesize -a
8192
65536
4194304
268435456
2147483648
17179869184
root@t7primary01:~# pmap -sx $(ps -ef -o pid,comm | awk '/smon/ {print $1}') | grep osm

0000000380000000      8192      8192    -      8192   4M rwxsR--  [ osm shmid=0x0 ]
0000000380800000      4096      4096    -      4096    - rwxsR--  [ osm shmid=0x0 ]
00000003C0000000    262144    262144    -    262144 256M rwxsRi-  [ osm shmid=0x4 ]
0000000400000000 117440512 117440512    - 117440512  16G rwxsRi-  [ osm shmid=0x1 ]
0000002000000000   6291456   6291456    -   6291456   2G rwxsRi-  [ osm shmid=0x2 ]
0000002180000000   1835008   1835008    -   1835008 256M rwxsRi-  [ osm shmid=0x3 ]
root@t7primary01:~# 
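
For reference, and assuming the threshold is counted in 8K base pages (which the 512GB default suggests): 0x1900000 pages times 8KB works out to 200GB, safely below this domain's 256GB, so the extra large pages show up after a reboot:

root@t7primary01:~# echo $(( 0x1900000 * 8192 / (1024 * 1024 * 1024) ))
200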

Have fun showing your DBAs a possible platform performance feature…