Sunday, July 12, 2015

LSI MegaRAID HBAs, overheating and one ugly hack

Summer is here!
Like many others, I have LSI MegaRAID HBAs that I use in desktop machines.

These cards are great, but they tend to overheat, and quite a few people have reported high temperatures (in my case, the chip reports 97C when idling in both my Dell T410 and my Dell T5610):


sudo megaclisas-status

-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware     
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 97C  | FW: 23.32.0-0009 
c1    | LSI MegaRAID SAS 9280-4i4e | 512MB  | N/A  | FW: 12.15.0-0205 
[...]
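
To keep an eye on these readings over time, the table can be parsed by a small script. The following is only a minimal sketch: it assumes the pipe-separated column layout shown in the output above (other versions of megaclisas-status may print something different), and the function name parse_controller_temps is made up for this example.

```python
import re

# Data rows start with a controller ID such as "c0" followed by a pipe.
# Header lines ("-- ID | ...") and "[...]" do not match and are skipped.
ROW = re.compile(r"^(c\d+)\s*\|")


def parse_controller_temps(report: str) -> dict:
    """Map controller ID -> temperature in Celsius, or None when shown as N/A."""
    temps = {}
    for line in report.splitlines():
        if not ROW.match(line):
            continue
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not enough columns to contain a Temp field
        # Assumed column order: ID, H/W Model, RAM, Temp, Firmware
        match = re.fullmatch(r"(\d+)C", fields[3])
        temps[fields[0]] = int(match.group(1)) if match else None
    return temps


SAMPLE = """\
-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 97C  | FW: 23.32.0-0009
c1    | LSI MegaRAID SAS 9280-4i4e | 512MB  | N/A  | FW: 12.15.0-0205
"""

print(parse_controller_temps(SAMPLE))  # {'c0': 97, 'c1': None}
```

Controllers that do not expose a sensor (the 9280-4i4e here) come back as None rather than 0, so a missing reading is never mistaken for a cool card.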

I had never really worried about the temperature, but when I started to rebuild my T410's boot LD (RAID-1) to swap the 2TB drives for 4TB drives I had, things got complicated quickly.

As soon as the mirror started rebuilding, the ROC (RAID-on-Chip) temperature, which had been sitting around 97C, climbed to 102C, and soon enough the card shut itself down, dropping off the PCI-E bus and resetting the server.

After the machine reset, the mirroring process continued where it had left off, the temperature of the ROC increased again, and the whole system reset itself once more.

Luckily, rebuilding a Logical Drive (LD) is a reliable process and it can recover successfully after a system reset.
I finished rebuilding the boot LD mirror with the computer case open so that ambient air would cool the card sufficiently to let the rebuild complete.

After things were back to normal, I started researching the issue and found others with similar problems.

Someone also attempted to fit a 40mm fan on top of the heatsink and described the specs in great detail:

There was even someone running a business on eBay selling overpriced fans for MegaRAID controllers (talk about a known problem!):


So it seemed like a known but unacknowledged problem, and I set out to find a solution. Alas, I couldn't fit a fan on top of my 9271-8i's heatsink because there was no space left above it (all PCI-E slots were used in that machine).

After some experimentation with a spare 40mm fan I had, I came up with the following workaround:

1) Find a small, reliable 40mm fan to fit to the side of the overheating MegaRAID card.
 I went for the Noctua NF-A4x10 FLX fan (150,000-hour MTBF and no more than 20dBA).

2) Attach the fan to the heatsink of the 9271-8i card. Luckily, the heatsink of the other LSI card (a 9280-4i4e) was close enough to let me position the fan across both cards so that it would cool both heatsinks.

Here's a picture of the inside of the Dell T410 with the fan fitted on top of both cards.

With the fan attached, the cards run much cooler (even in the hot summer weather):
sudo megaclisas-status
-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware     
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 71C  | FW: 23.32.0-0009 
c1    | LSI MegaRAID SAS 9280-4i4e | 512MB  | N/A  | FW: 12.15.0-0205 
[...]

So with an $8.99 fan, I saw the chip temperature drop by 26C (from 97C to 71C). This is one crude and ugly hack (the fan is carefully screwed to the heatsinks), but it does the job.

For the T5610, since I had more room in the PCI-E slots, I simply screwed the fan to the LSI's heatsink:

Again, this resulted in a significant temperature drop (down from 102C):

sudo megaclisas-status
-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware     
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 56C  | FW: 23.32.0-0009
[...]
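
For anyone repeating this on several machines, the before/after comparison is easy to script. This is just an illustrative sketch using the two T5610 readings quoted above (102C before, 56C after); the helper name temperature_drop is invented for the example.

```python
def temperature_drop(before_c: int, after_c: int) -> int:
    """Return how many degrees Celsius the reading fell after fitting the fan."""
    return before_c - after_c


# Readings taken from the T5610 figures in this post.
before_c, after_c = 102, 56
drop = temperature_drop(before_c, after_c)
print(f"T5610: {before_c}C -> {after_c}C, a drop of {drop}C")
# T5610: 102C -> 56C, a drop of 46C
```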

Update (2016/07/22): I've removed the screws in the T410 and settled for something a little cleaner (a magnetic arm holding the fan) so that it cools both HBAs properly.



