zdqueue – script to find deleted files still using space on ZFS

It is always a strange situation when ZFS shows you a full filesystem although you know there should be enough free space. One reason could be a big file you did not think about; I wrote a small script to find the biggest files on a ZFS, you can find my zfsize script in my previous post.
Another cause can be a deleted file that is still held open by a process. The file is gone and you no longer see it in the filesystem with ls/du/find and so on, but you only get the space back once the process stops using the file or you kill the process.
I wrote a small script to find such processes and the deleted files that are still sitting in the ZFS delete queue.
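
If you just want a quick manual check before running the full script, the core idea looks roughly like this (a minimal sketch without error handling, using the oracle/u01 dataset and /u01 mountpoint from the example below; run as root in the global zone):

# object 1 is the dataset's master node; its DELETE_QUEUE entry points to the delete queue object
DQOBJ=$(zdb -dddd oracle/u01 1 | nawk '/DELETE_QUEUE/ {print $NF}')
# dump the delete queue itself - every entry refers to a file that is unlinked but still open
zdb -dddd oracle/u01 $DQOBJ
# list the processes that still have files open below the mountpoint
fuser -c /u01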

root@solaris:~/scripts# ./zdqueue.sh -h
ZFS Delete Queue Analyzing
Usage:
                ./zdqueue.sh -z <ZFS> [-o tempdir]
root@solaris:~/scripts#
root@solaris:~/scripts# ./zdqueue.sh -z oracle/u01
ZFS = oracle/u01
Mountpoint = /u01
TempDir = /tmp
This may take a while ...
I will wait at least 1 minute before analyzing
............
  PID TTY         TIME CMD
 2703 pts/10      0:43 pfiles
Still analyzing process list...
Do you want to wait another minute or work with the data we have? (y/n) n
OK, I will kill process 2703 and work with gathered information
---------------------------------------
Process: 709    /u01/app/12.1.0.2/grid/bin/oraagent.bin
The file was:   /u01/app/grid/diag/crs/orasid/crs/trace/crsd_oraagent_oracle.trc

Process: 595    /u01/app/12.1.0.2/grid/bin/orarootagent.bin
The file was:   /u01/app/grid/diag/crs/orasid/crs/trace/crsd_orarootagent_root.trc





#!/usr/bin/bash
#set -x 
###################################################
#
# zdqueue v0.1
#
# ZFS Delete Queue Analyzing
#
# small script to find open files on ZFS which 
# should be deleted but are still using space.
#
# 16.09.2016, written by Martin Presslaber
#
###################################################
help ()
{
	print "ZFS Delete Queue Analyzing"
	print "Usage:"
	print "\t\t$0 -z <ZFS> [-o tempdir]"
}
########## preTESTS #############
OS=`uname -s`
RELEASE=`uname -r`
VERS=`uname -v`
ZONE=`zonename`
if [[ $OS != SunOS ]]
then
        print "This script will only work on Solaris"
        exit 1
fi
[[ $ZONE == global ]] || { print "This script will only work in the global zone"; exit 1; }
[[ $VERS == 1[1-9].[1-9] ]] && SOLARIS=new
if [ ${RELEASE#*.} -gt 10 ] ;
then
        ID=$(/usr/bin/whoami)
else
        ID=$(/usr/ucb/whoami)
fi
if [ $ID != "root" ]; then
        echo "$ID, you must be root to run this program."
        exit 1
fi
if [ $# -lt 1 ]
then
        help && exit 1
fi
########## Options ###########
TEMPDIR="/tmp"
while getopts "z:o:h" args
do
	case $args in
	z)
		ZFS=$OPTARG
		ZFSlist=`zfs list $ZFS 2>/dev/null | nawk -v ZFS=$ZFS '$1~ZFS {print $0}'`
		[[ $ZFSlist == "" ]] && print "$ZFS does not seem to be a ZFS" && exit 1
		ZFSmountpoint=`zfs list $ZFS 2>/dev/null | nawk -v ZFS=$ZFS '$1~ZFS {print $NF}'`
	;;

	o)
		TEMPDIR=$OPTARG
		[[ -d $TEMPDIR ]] || { print "$TEMPDIR does not exist!"; exit 1; }
	;;

	h|*)
		help && exit 1
	;;
	esac
done
shift $(($OPTIND -1))
sleeping ()
{
SLEEP=1; while [[ $SLEEP -ne 12 ]]; do sleep 5; print ".\c"; let SLEEP=$SLEEP+1; done; print "."
}
######### Let's go #########
print "ZFS = $ZFS"
print "Mountpoint = $ZFSmountpoint"
print "TempDir = $TEMPDIR"
print "This may take a while ... "
print "I will wait at least 1 minute before analyzing"
######## Create File with open delete queue
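# "zdb -dddd <zfs> 1" dumps the dataset's master node; the object number behind its
# DELETE_QUEUE entry is the delete queue itself, which holds one entry per unlinked
# but still open file and is dumped into the temp file below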
zdb -dddd $ZFS $(zdb -dddd $ZFS 1 | nawk '/DELETE_QUEUE/ {print $NF}') > $TEMPDIR/zdqueue-open.tmp
######## Find processes with files from delete queue
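# collect the object numbers listed in the delete queue dump and join them
# into a single egrep pattern (obj1|obj2|...) for matching against the pfiles output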
OPENFILES=$(nawk '/\= / {print $NF}' $TEMPDIR/zdqueue-open.tmp | while read DQi; do echo "$DQi|\c"; done | nawk '{print $4 $NF}')

[[ $OPENFILES == "" ]] && print "No files in delete queue for $ZFS" && exit 0

pfiles `fuser -c $ZFSmountpoint 2>/dev/null` 2>/dev/null > $TEMPDIR/zdqueue-procs.tmp &
PIDpfiles=$!
sleeping 
ps -p $PIDpfiles && \
WAIT=yes
while [[ $WAIT == yes ]]
do 
	print "Still analyzing process list..."
	read -r -p "Do you want to wait another minute or work with the data we have? (y/n) " A
	case $A in
	[yY][eE][sS]|[yY])
	sleeping
	if ps -p $PIDpfiles
	then
		WAIT=yes
	else
		WAIT=no
	fi
	;;
	[nN][oO]|[nN])
	print "OK, I will kill process $PIDpfiles and work with gathered information"
	kill $PIDpfiles
	WAIT=n
	;;	
	esac
done
print "---------------------------------------"
egrep $OPENFILES $TEMPDIR/zdqueue-procs.tmp | tr ':' ' ' | awk '$7 ~ /ino/ {print $8}' |\
while read INO
do 
	print "Process: \c"
	awk '/Current/ {print PROC};{PROC=$0} /ino/ {print $5}' $TEMPDIR/zdqueue-procs.tmp |\
	tr ':' ' ' | nawk -v INO=$INO '$1 ~ /^[0-9]/ {print $0} $2 ~ INO {print $0}' |\
	nawk '$1 ~ /ino/ {print INO};{INO=$0}'
	ZID=`nawk -v INO=$INO '$3 ~ INO {print $1}' $TEMPDIR/zdqueue-open.tmp`
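	# walk the kernel's znode cache with mdb -k and print v_path of the znode whose
	# z_id matches the delete queue entry (the mdb syntax differs between Solaris releases)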
	if [[ $SOLARIS == new ]]
	then
		print "The file was:   \c"
		echo "::walk zfs_znode_cache | ::if znode_t z_id = $ZID and z_unlinked = 1 | ::print znode_t z_vnode->v_path" |\
		mdb -k | awk '/\// {print $NF}' | sed 's/\"//g'
	else
		print "The file was:   \c"
		echo "::walk zfs_znode_cache z | ::print znode_t z_id | ::grep ".==$ZID" | ::map <z | ::print znode_t z_vnode->v_path z_unlinked" |\
		mdb -k | awk '/\// {print $NF}' | sed 's/\"//g'
	fi
	print "\n"
done

#### Clean up ####
rm $TEMPDIR/zdqueue-procs.tmp
rm $TEMPDIR/zdqueue-open.tmp
#################### EOF ####################

zfsize – small script to find the biggest files on ZFS

I found some time to script and was looking into zdb and what could be done with it. I would say that it is a nice feature to ask “what is the biggest file in that filesystem” (rather than running a huge find command). You could also find files which were deleted but still use space on the ZFS because a process is holding them open. I also wrote a script for that, you will find it in my next post.
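
Stripped down to its core (no option handling, mountpoint prefix or temp files), the idea of the script below is roughly the following one-liner, shown here against the rpool/downloads dataset from the example; zdb dumps every object of the dataset and the nawk keeps only the path and size attributes of objects of type “ZFS plain file”:

# dump all objects, keep path and size (in MB) of plain files, sort by size, show the two biggest
zdb -dddd rpool/downloads |\
nawk '/ZFS plain file$/ {i=1} i && $1=="path" {printf("%s",$2)} i && $1=="size" {printf("\t%.2f MB\n",$2/1024/1024)} i && $1=="Object" {i=0}' |\
sort -nk 2,2 | tail -2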

root@server:~# ./zfsize.sh -h
small script to find the biggest files on ZFS
Usage:
                ./zfsize.sh -z <ZFS> [-o tempdir] [-c count]
root@server:~# ./zfsize.sh -z rpool/downloads -c 2
ZFS = rpool/downloads
Mountpoint = /downloads
TempDir = /tmp
This may take a while ...
/downloads/sol-10-u11-ga-sparc-dvd.iso  2207.50 MB
/downloads/sol-11_1-repo-full.iso       2896.00 MB
root@server:~#

#!/usr/bin/bash
#set -x
###################################################
#
# zfsize v0.1
#
# ZFS file sizes
#
# small script to find the biggest files on ZFS
#
# 16.09.2016, written by Martin Presslaber
#
###################################################
help ()
{
	print "small script to find the biggest files on ZFS"
	print "Usage:"
	print "\t\t$0 -z <ZFS> [-o tempdir] [-c count]"
}
########## preTESTS #############
OS=`uname -s`
RELEASE=`uname -r`
VERS=`uname -v`
ZONE=`zonename`
if [[ $OS != SunOS ]]
then
        print "This script will only work on Solaris"
        exit 1
fi
[[ $ZONE == global ]] || { print "This script will only work in the global zone"; exit 1; }
[[ $VERS == 1[1-9].[1-9] ]] && SOLARIS=new
if [ ${RELEASE#*.} -gt 10 ] ;
then
        ID=$(/usr/bin/whoami)
else
        ID=$(/usr/ucb/whoami)
fi
if [ $ID != "root" ]; then
        echo "$ID, you must be root to run this program."
        exit 1
fi
if [ $# -lt 1 ]
then
        help && exit 1
fi
#[[ $1 != "-[az]" ]] && help && exit 1
########## Options ###########
TEMPDIR="/tmp"
while getopts "z:o:c:h" args
do
        case $args in
        z)
                ZFS=$OPTARG
                ZFSlist=`zfs list $ZFS 2>/dev/null | nawk -v ZFS=$ZFS '$1~ZFS {print $0}'`
                [[ $ZFSlist == "" ]] && print "$ZFS does not seem to be a ZFS" && exit 1
                ZFSmountpoint=`zfs list $ZFS 2>/dev/null | nawk -v ZFS=$ZFS '$1~ZFS {print $NF}'`
        ;;

        o)
                TEMPDIR=$OPTARG
                [[ -d $TEMPDIR ]] || { print "$TEMPDIR does not exist!"; exit 1; }
        ;;

	c)
	COUNT="-$OPTARG"
	;;

        h|*)
                help && exit 1
        ;;
        esac
done
shift $(($OPTIND -1))

######### Let's go #########
print "ZFS = $ZFS"
print "Mountpoint = $ZFSmountpoint"
print "TempDir = $TEMPDIR"
print "This may take a while ... "

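# zdb -dddd dumps every object in the dataset; for objects of type "ZFS plain file"
# the nawk below grabs the "path" and "size" attributes and converts the size to MB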
zdb -dddd $ZFS |\
nawk -v MP=$ZFSmountpoint 'BEGIN { printf("FILE\tSIZE\n"); }
$0 ~/ZFS plain file$/ { interested = 1; }
interested && $1 == "path" { printf(MP"%s", $2); }
interested && $1 == "size" { printf("\t%.2f MB\n", $2/1024/1024); }
interested && $1 == "Object" { interested = 0; }'  > $TEMPDIR/zfsize.tmp
sort -nk 2,2 $TEMPDIR/zfsize.tmp > $TEMPDIR/zfsize-sorted.tmp
tail $COUNT $TEMPDIR/zfsize-sorted.tmp
# clean up
rm $TEMPDIR/zfsize.tmp
rm $TEMPDIR/zfsize-sorted.tmp
##################### EOF #####################

Using different HW Features in a Box

I wrote a small article for my company about how you could use Oracle’s new SPARC hardware for different layers in your datacentre… the original, in German, can be found at SPARC T7-1 testing In-Memory, DAX and Crypto Engines.
Some findings and interesting points translated for my blog:

So what I thought about are classic tasks that are normally spread over several servers, built into one box. All of them could benefit from different features that come with the M7 or S7 chips.
The database in the backend will profit from the big memory bandwidth and the SQL offload engines called DAX, the data analytics accelerators. In this combination, Oracle's slides say the database could scan up to 170 billion rows per second with those streaming engines, with a measured bandwidth of 160 GB/sec per socket. Wow… and that is a measured value; the M7 processor hardware facts talk about 4 memory controller units per socket which can handle 333 GB/sec of raw memory bandwidth per processor (it seems that DDR4 is the “bottleneck”, not the CPU…), compared to the latest Xeon E7 88xx v4 (Q2/16) with 102 GB/sec mentioned on Intel’s ARK technical details pages.

The next layer could be the application itself. With 8 threads per core it is a perfect fit for a high user load, and with critical threads a process gets more exclusive access to the hardware. That makes it perfect for running a wide mix of workloads, some designed for throughput, others for low latency.
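
On Solaris a thread is treated as “critical” when it runs in the fixed-priority (FX) scheduling class at priority 60; the dispatcher then tries to reserve the core's hardware resources for it. A minimal sketch, assuming the process of interest has PID 1234:

# move process 1234 (all its threads) into the FX class at priority 60 to mark them as critical
priocntl -s -c FX -m 60 -p 60 -i pid 1234
# verify scheduling class and priority
ps -o pid,class,pri,args -p 1234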

The third level could be something like a reverse proxy with an SSO backend. The proxy could take over the application sessions, if they are not already encrypted, and use the built-in cryptographic accelerators on the processor for the encryption. Solaris itself and some standard applications already use these engines, for example Apache, IPsec, Java, KSSL, OpenSSL and ZFS Crypto. And it is not only Oracle software like the database and WebLogic that supports Solaris’ Crypto Framework; IBM’s DB2, Informix, IBM HTTP Server and WebSphere are also certified with the IBM Global Security Kit (IBM GSKit v8) to use SPARC’s hardware encryption.
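
To see what such a box actually offers, a quick look at the CPU instructions and the Solaris Cryptographic Framework could look like this (the output of course depends on the platform):

# instruction set extensions the CPU advertises (aes, sha256, ... on M7/S7)
isainfo -v
# providers registered with the Solaris Cryptographic Framework
cryptoadm list
# mechanisms each provider offers, including the hardware ones
cryptoadm list -m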

Oracle SPARC processors can handle 15 industry-standard algorithms and a bunch of random number generators (AES, Camellia, CRC32c, DES, 3DES, DH, DSA, ECC, MD5, RSA, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512). (BTW: Xeons have 7 crypto instructions and 5 on-chip accelerated algorithms; IBM Power8 has 6 instructions and 8 accelerated algorithms.)

The last level could be the way to the internet, separated from the other domains. Solaris offers a built-in firewall, load balancer and other web utilities to handle the connections. Having Solaris on SPARC in front also makes it easier to fend off so-called script kiddies with their ready-made hacks and attacks: on the one hand SPARC is big-endian, so standard exploits will run in the “wrong direction” compared to little-endian x86; on the other hand the new SPARC processors are protected by “Silicon Secured Memory”. When an application requests new memory via malloc(), the operating system tags the block of memory with a version number and gives the app a pointer to that memory. Whenever a pointer is used to access a block of memory, the pointer's version number must match the memory block's version number, or an exception is triggered. The version numbers are checked in real time by the processor with a tiny overhead of about one percent extra execution time, according to Oracle's benchmarks. (more info at theregister)
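
On the Solaris side this feature appears as the ADI security extensions (adiheap/adistack), which are managed with sxadm; a rough sketch, assuming Solaris 11.3 or newer on an M7/S7 machine and a hypothetical binary ./myapp (check your release's sxadm man page for the exact syntax):

# list the security extensions (aslr, nxstack, nxheap, adiheap, adistack) and their status
sxadm status
# start one program with ADI protection enabled for its heap
sxadm exec -s adiheap=enable ./myapp
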
So imagine using all of these features: a whole datacentre could be hosted on a single server, or, if it comes down to availability, you could build a cluster with failover or live migration between the servers.

[Image: t7datacenter]