Friday, December 14, 2012

Migrating from Solaris/Sparc to Solaris/x86 or RHEL? Use ZFS!

Problem
At a large company, the DBA teams migrating Oracle databases from Solaris/Sparc to RHEL5/x86 had a major pain point:
- dumping/exporting the database was easy, converting its endianness was simple, and re-importing it on the Linux side was easy too.
- HOWEVER, transferring the dump from one system to the other was 1) slow and 2) unpredictable (large VLANs mean higher latency and slower transfers).

If your DB was pretty small, you would just wait a few hours, but with a database of a few TB it just wasn't practical.

Solution: Use a shared and compressed ZFS pool on a SAN box and enjoy transfer-less database dumps.

Solaris/Sparc and Solaris/x64 speak ZFS natively but Linux doesn't, unless you use zfs-fuse (http://zfs-fuse.net). zfs-fuse installs without a reboot and makes your zpool available instantly.



Here's the howto:
1) Install zfs-fuse from http://vscojot.free.fr/dist/zfs-fuse on your Linux box (here, an HP DL580G7):

vcojot@rhel5x64$ sudo yum install -y xz-libs
vcojot@rhel5x64$ sudo rpm -ivh  liblzo2_2-2.03-6.el5.x86_64.rpm  zfs-fuse-0.7.0p1-11.el5.x86_64.rpm
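
To make sure the install went fine before touching the SAN, start the daemon and ask for a pool listing. This is a minimal sketch; I'm assuming the RPM ships an init script named 'zfs-fuse', and a fresh install should simply report that no pools are available:

vcojot@rhel5x64$ sudo service zfs-fuse start
vcojot@rhel5x64$ sudo zpool status
no pools available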


2) Create your zpool on the Sparc box (pool version 26 is the latest supported by zfs-fuse). The pool used throughout the rest of this post is called 'adbtmpdbdump':

vcojot@solsparc$ sudo zpool create -o version=26 adbtmpdbdump \
c4t60000970000292602571533030333032d0 c4t60000970000292602571533030333030d0 \
[....]

vcojot@solsparc$ uname -a
SunOS solsparc 5.10 Generic_147440-07 sun4u sparc SUNW,SPARC-Enterprise
vcojot@solsparc$ sudo zpool list adbtmpdbdump
NAME           SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
adbtmpdbdump   540G   190K   540G     0%  ONLINE  -


vcojot@solsparc$ sudo zpool status adbtmpdbdump
  pool: adbtmpdbdump
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        adbtmpdbdump                             ONLINE       0     0     0
          c4t60060160A9312C000A487C3E6132E211d0  ONLINE       0     0     0
          c4t60060160A9312C008A157877D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C00969C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C009C9C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C00A09C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C000C487C3E6132E211d0  ONLINE       0     0     0
          c4t60060160A9312C009A9C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C004EE187C5711AE111d0  ONLINE       0     0     0
          c4t60060160A9312C0008487C3E6132E211d0  ONLINE       0     0     0
          c4t60060160A9312C0012487C3E6132E211d0  ONLINE       0     0     0
          c4t60060160A9312C000E487C3E6132E211d0  ONLINE       0     0     0
          c4t60060160A9312C00989C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C00A29C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C009E9C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C00A49C5951D6EAE111d0  ONLINE       0     0     0
          c4t60060160A9312C0010487C3E6132E211d0  ONLINE       0     0     0

errors: No known data errors
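
Since the whole point is a compressed pool, turn compression on before dumping anything into it. A minimal sketch (the 'dump' dataset name is just an example I'm using here):

vcojot@solsparc$ sudo zfs create -o compression=on adbtmpdbdump/dump
vcojot@solsparc$ sudo zfs get compression adbtmpdbdump/dump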


3) Re-discover your SAN configuration on your other hosts and check that the pool can be imported (a rescan example for the Linux side follows the two listings below).
Here's the pool as seen from the solx64 machine:

vcojot@solx64$ sudo zpool import
  pool: adbtmpdbdump
    id: 16626771482833154241
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        adbtmpdbdump                             ONLINE
          c4t60060160A9312C000A487C3E6132E211d0  ONLINE
          c4t60060160A9312C008A157877D6EAE111d0  ONLINE
          c4t60060160A9312C00969C5951D6EAE111d0  ONLINE
          c4t60060160A9312C009C9C5951D6EAE111d0  ONLINE
          c4t60060160A9312C00A09C5951D6EAE111d0  ONLINE
          c4t60060160A9312C000C487C3E6132E211d0  ONLINE
          c4t60060160A9312C009A9C5951D6EAE111d0  ONLINE
          c4t60060160A9312C004EE187C5711AE111d0  ONLINE
          c4t60060160A9312C0008487C3E6132E211d0  ONLINE
          c4t60060160A9312C0012487C3E6132E211d0  ONLINE
          c4t60060160A9312C000E487C3E6132E211d0  ONLINE
          c4t60060160A9312C00989C5951D6EAE111d0  ONLINE
          c4t60060160A9312C00A29C5951D6EAE111d0  ONLINE
          c4t60060160A9312C009E9C5951D6EAE111d0  ONLINE
          c4t60060160A9312C00A49C5951D6EAE111d0  ONLINE
          c4t60060160A9312C0010487C3E6132E211d0  ONLINE


Here's the pool as seen from the RHEL5 x64 machine:

 vcojot@rhel5x64$ sudo zpool import
  pool: adbtmpdbdump
    id: 16626771482833154241
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        adbtmpdbdump                                                                    ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x000f000000000000-part1  ONLINE
          disk/by-path/pci-0000:0e:00.0-fc-0x5006016946e0311e:0x001b000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x0013000000000000-part1  ONLINE
          disk/by-path/pci-0000:0e:00.0-fc-0x5006016146e0311e:0x0016000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x0018000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x0010000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016046e0311e:0x0015000000000000-part1  ONLINE
          disk/by-path/pci-0000:0e:00.0-fc-0x5006016146e0311e:0x000d000000000000-part1  ONLINE
          disk/by-path/pci-0000:0e:00.0-fc-0x5006016146e0311e:0x000e000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x001c000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016046e0311e:0x0011000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016046e0311e:0x0014000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x0019000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016046e0311e:0x0017000000000000-part1  ONLINE
          disk/by-path/pci-0000:0e:00.0-fc-0x5006016146e0311e:0x001a000000000000-part1  ONLINE
          disk/by-path/pci-0000:81:00.0-fc-0x5006016846e0311e:0x0012000000000000-part1  ONLINE
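
If the LUNs don't show up on the Linux host right away, a SCSI bus rescan usually does the trick. This is only a sketch: the host glob below covers every HBA and your FC HBA vendor may provide its own rescan tooling instead:

vcojot@rhel5x64$ for h in /sys/class/scsi_host/host*; do echo "- - -" | sudo tee $h/scan > /dev/null; done
vcojot@rhel5x64$ sudo zpool import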


4) You're almost done! Dump your database to your zpool.

5) Export the zpool on solsparc and re-import it on solx64 or rhel5x64; it only takes a few minutes (the exact commands follow this list).

6) Re-import your database
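
The export/re-import of step 5 boils down to two commands; the -f on import is only needed if the pool wasn't exported cleanly, as in the listings above:

vcojot@solsparc$ sudo zpool export adbtmpdbdump
vcojot@rhel5x64$ sudo zpool import -f adbtmpdbdump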

Also, database files usually compress pretty well, so here's a 3.5TB database on another shared zpool:

vcojot@solx64$ sudo zpool list ADBZFSDUMP
NAME         SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
ADBZFSDUMP  1.99T  1.55T   449G    77%  ONLINE  -


vcojot@solx64$ sudo zfs get compressratio ADBZFSDUMP
NAME        PROPERTY       VALUE  SOURCE
ADBZFSDUMP  compressratio  2.07x  -
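
Quick sanity check: 1.55TB of allocated space at a 2.07x compression ratio works out to roughly 3.2TB of logical data sitting on that 1.99TB pool.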


Even though zfs-fuse runs entirely in userspace, we're seeing decent performance out of it. Here, on a Xeon system, it's reading compressed data back into memory:


vcojot@rhel5x64$ sudo zpool iostat 1
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
adbtmpzfs    15.2G   223G     11     67   635K   994K
adbtmpzfs    15.2G   223G      0      0      0      0
adbtmpzfs    15.2G   223G      0      0      0      0
adbtmpzfs    15.2G   223G      0      0      0      0
adbtmpzfs    15.2G   223G      0      0      0      0
adbtmpzfs    15.2G   223G    849      0   106M      0
adbtmpzfs    15.2G   223G    878      0   109M      0
adbtmpzfs    15.2G   223G    990      0   123M      0
adbtmpzfs    15.2G   223G    911      0   114M      0
adbtmpzfs    15.2G   223G    898      0   112M      0
adbtmpzfs    15.2G   223G    967      0   120M      0
adbtmpzfs    15.2G   223G    929      0   116M      0
adbtmpzfs    15.2G   223G  1.03K      0   132M      0
adbtmpzfs    15.2G   223G  1.10K      0   141M      0
adbtmpzfs    15.2G   223G   1014      0   126M      0
adbtmpzfs    15.2G   223G    849      0   106M      0
adbtmpzfs    15.2G   223G    821      0   102M      0
adbtmpzfs    15.2G   223G    914      0   114M      0
adbtmpzfs    15.2G   223G    967      0   120M      0
adbtmpzfs    15.2G   223G  1.03K      0   131M      0
adbtmpzfs    15.2G   223G    862      0   107M      0
[...]



The only requirement is that all of your hosts share a common SAN fabric (which usually means they live in the same metro area).


Monday, August 13, 2012

We must preserve the past for the criticism of the future

A long time ago, when 64MB was a huge amount of RAM, there was a GUI environment named 'OpenWindows', which evolved from SunView and from some research at Xerox.
In its time (the early 90's), it was liked by many (SUN people, mostly, I guess) and despised by others because it was deemed 'too heavy' compared to other window managers (fvwm, twm, etc...). Read more about it here: http://en.wikipedia.org/wiki/Openwindows

Why is all this relevant today?

In late 2011, some small parts of the full OpenWindows DeskSet were ported to Linux (yes, ported). This included the libdeskset library and one of its most popular apps: the File Manager.

It's still relevant today because, in these 2012 times of metacity and RHEL6 desktops, the OWacomp desktop still feels 'snappier' than many other 'bundled' desktop environments. I agree that GNOME and KDE have come a long way, but whatever the 'theme', they just don't feel as 'snappy'.

Here are the screenshots:
The real thing (tm):

[screenshot]

The Linux (RHEL5/RHEL6) port:

[screenshot]

Looks close enough, don't you think?


Wednesday, March 28, 2012

VxFS 6.0 and compression (a real must have).

Here's a little personal info: I like good movies and good anime (for the kids), and I like them even better when stored as DVD5 or DVD9 ISO images so I can play them with VLC (VideoLAN Client) on the wide-screen TV.
But as my list of DVDs grew, my 3TB VxFS filesystem started to fill up (at about 400 DVDs). With the compression feature that comes with disk layout version 9, I bought myself some time and was able to reclaim some space.

First of all:

1) Upgrade to SF6.0
2) Upgrade your DG (vxdg upgrade <dg>)
3) Upgrade your filesystem (vxupgrade -n 9 <fs>)

Which gives (in my case the filesystem was already at layout version 9, hence the harmless error):

[root@thorbardin ~]# vxdg  upgrade local00dg
[root@thorbardin ~]# vxupgrade -n 9 /export/home/raistlin/.private/movies0
UX:vxfs vxupgrade: ERROR: V-3-25236: /dev/vx/rdsk/local00dg/movies0_lv: already at disk layout version 9.

[root@thorbardin ~]# find /export/home/raistlin/.private/movies0/DVD -name \*.iso|wc -l
512

So we have 512 DVD images on this FS.

Let's start the compression process:
[root@thorbardin ~]# vxcompress -v -r /export/home/raistlin/.private/movies0/DVD

You may check the compression progress with:
[root@thorbardin ~]# /opt/VRTS/bin/fsadm -S compressed /export/home/raistlin/.private/movies0
 Mountpoint    Size(KB)    Available(KB)   Used(KB)   Logical_Size(KB) Space_Saved(KB)
/export/home/raistlin/.private/movies0  2929236384     146315977 2773199135       2928058770     154859635
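
That's 154,859,635KB saved out of a 2,928,058,770KB logical size, i.e. a bit over 5%, which matches the per-file summary below.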

You may also check the compression ratio for a file or a set of files with:
[root@thorbardin ~]# vxcompress -L /export/home/raistlin/.private/movies0/DVD/*/*.iso
[....]
   4%    6.42 GB    6.68 GB   100%   gzip-6  1024k  /export/home/raistlin/.private/movies0/DVD/Dessins_Animes/Barbapapa.iso
   7%     5.9 GB    6.33 GB   100%   gzip-6  1024k  /export/home/raistlin/.private/movies0/DVD/Dessins_Animes/Albator_78_-_D6.iso
   4%    7.59 GB    7.92 GB   100%   gzip-6  1024k  /export/home/raistlin/.private/movies0/DVD/Dessins_Animes/Albator_78_-_D2.iso
   4%    7.56 GB    7.88 GB   100%   gzip-6  1024k  /export/home/raistlin/.private/movies0/DVD/Dessins_Animes/Albator_78_-_D1.iso

Total:    512 files       147.7 GB (5%) storage savings

So here it is: compressing an existing VxFS filesystem 'in place' saved me about 148GB. Time to save more of my DVDs! Considering that DVDs are already compressed (MPEG2 streams), 5% gained over a 3TB filesystem isn't bad at all.

If you have static data or keep sparse files around (database datafiles come to mind), we've seen compression ratios of as much as 7-10x!

The nice thing is that this compression is completely transparent; it might even get you higher performance in some cases (decompression is done in memory), and you can compress existing files in place, without the backup/restore dance. It's also reversible: just uncompress your files if you wish to go back.
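
Reverting should be a one-liner too; something along these lines ought to do it, mirroring the recursive compression run above with the uncompress switch:

[root@thorbardin ~]# vxcompress -u -r /export/home/raistlin/.private/movies0/DVD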

My 2c, VxFS: good for home and enterprise use! :)

Friday, March 9, 2012

VxFS 6.0 comes with de-dupe!

For those of you who don't want to watch Symantec's video, here's a quick & cheap howto of a VxFS de-dupe test using SFHA 6.0 under RHEL5.8:

* First, here's our setup:
[root@vcs11 ~]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/rootdg-lv_root
                      15109112   9258676   5070560  65% /
/dev/sda1               101086     20027     75840  21% /boot
tmpfs                  1029372         0   1029372   0% /dev/shm
tmpfs                        4         0         4   0% /dev/vx
/dev/vx/dsk/vcsdg/lv_vcs
                        131072      4473    118813   4% /shared/vcs



 * Let's check the initial state of this setup:
[root@vcs11 ~]# /opt/VRTS/bin/fsadm -S shared /shared/vcs
 Mountpoint    Size(KB)    Available(KB)   Used(KB)   Logical_Size(KB) Space_Saved(KB)
/shared/vcs      131072        118813       4473             4473             0


* Enable de-dupe for our test filesystem:
[root@vcs11 ~]# /opt/VRTS/bin/fsdedupadm enable /shared/vcs -c 4096

* Copy some data:
[root@vcs11 ~]# cp -a /etc/lvm /shared/vcs/lvm1

* Check the results (nothing happened):
[root@vcs11 ~]# /opt/VRTS/bin/fsadm -S shared /shared/vcs
 Mountpoint    Size(KB)    Available(KB)   Used(KB)   Logical_Size(KB) Space_Saved(KB)
/shared/vcs      131072        118724       4562             4562             0


* Copy some data again and again.
[root@vcs11 ~]# cp -a /etc/lvm /shared/vcs/lvm2

[root@vcs11 ~]# cp -a /etc/lvm /shared/vcs/lvm3

[root@vcs11 ~]# cp -a /etc/lvm /shared/vcs/lvm4


* Still no change! (VxFS de-dupe is post-process, not inline, so nothing is shared until a scan actually runs.)
[root@vcs11 ~]# /opt/VRTS/bin/fsadm -S shared /shared/vcs
 Mountpoint    Size(KB)    Available(KB)   Used(KB)   Logical_Size(KB) Space_Saved(KB)
/shared/vcs      131072        118457       4829             4829             0




* Point the de-dupe node list at our node (the list still showed vcs10).
[root@vcs11 ~]# /opt/VRTS/bin/fsdedupadm list /shared/vcs
Chunksize Enabled Schedule        NodeList        Filesystem
--------------------------------------------------------------------------------
4096      YES     NONE            vcs10.lasthome. /shared/vcs
[root@vcs11 ~]# /opt/VRTS/bin/fsdedupadm setnodelist vcs11.lasthome.solace.krynn /shared/vcs
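
Re-running the list command at this point should now show vcs11 in the NodeList column:

[root@vcs11 ~]# /opt/VRTS/bin/fsdedupadm list /shared/vcs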


* Start the de-dupe process.
[root@vcs11 ~]# /opt/VRTS/bin/fsdedupadm start /shared/vcs
UX:vxfs fsdedupadm: INFO: V-3-27767:  deduplication is started on /shared/vcs.


* Check the results
[root@vcs11 ~]# /opt/VRTS/bin/fsdedupadm status /shared/vcs
Saving    Status    Node            Type        Filesystem
--------------------------------------------------------------------------------
05%       COMPLETED vcs11           MANUAL      /shared/vcs
        2012/01/16 17:11:25 Begin full scan
        2012/01/16 17:11:29 End detecting duplicates and filesystem changes.

[root@vcs11 ~]# /opt/VRTS/bin/fsadm -S shared /shared/vcs
 Mountpoint    Size(KB)    Available(KB)   Used(KB)   Logical_Size(KB) Space_Saved(KB)
/shared/vcs      131072        118363       4906             5167           261
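
That 261KB saved out of a 5167KB logical size is right around 5%, which is exactly the figure reported by 'fsdedupadm status' above. Not spectacular for a handful of /etc/lvm copies, but the mechanism clearly works.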

