LSI MegaRAID HBAs, overheating and one ugly hack

Summer is here!
Like many others, I have LSI MegaRAID HBAs that I use in desktop machines.

These cards are great, but they tend to overheat, and quite a few people have reported high temperatures (the chip reports 97C when idling in both my Dell T410 and my Dell T5610):


sudo megaclisas-status

-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware     
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 97C  | FW: 23.32.0-0009 
c1    | LSI MegaRAID SAS 9280-4i4e | 512MB  | N/A  | FW: 12.15.0-0205 
[...]
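
To track this reading over time, the controller table can be parsed with a few lines of Python. This is only a minimal sketch, assuming the table keeps the pipe-separated layout shown above; `parse_controller_temps` and `SAMPLE` are names made up for this example, and the sample text is copied from the output above rather than captured from a live run.

```python
import re

# Sample copied from the megaclisas-status output shown above.
# In practice the text would come from running the tool itself.
SAMPLE = """\
-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 97C  | FW: 23.32.0-0009
c1    | LSI MegaRAID SAS 9280-4i4e | 512MB  | N/A  | FW: 12.15.0-0205
"""

# Controller rows start with an ID such as "c0" followed by a pipe.
CONTROLLER_ROW = re.compile(r"^c\d+\s*\|")
# The Temp column looks like "97C"; anything else (e.g. "N/A") is unknown.
TEMP_FIELD = re.compile(r"(\d+)C")


def parse_controller_temps(text):
    """Return {controller_id: temperature in C, or None if not reported}."""
    temps = {}
    for line in text.splitlines():
        line = line.strip()
        if not CONTROLLER_ROW.match(line):
            continue  # header, separator, or a row from another table
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not enough columns to contain a Temp field
        match = TEMP_FIELD.fullmatch(fields[3])
        temps[fields[0]] = int(match.group(1)) if match else None
    return temps


if __name__ == "__main__":
    print(parse_controller_temps(SAMPLE))
```

On the sample above, this gives 97 for c0 and None for c1, since the 9280-4i4e shows N/A in the Temp column.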

I had never really worried about the temperature, but when I started rebuilding my T410's boot LD (RAID-1) to swap the 2TB drives for the 4TB drives I had, things got complicated quickly.

As soon as the mirror started rebuilding, the ROC (RAID-on-Chip) temperature, which had been sitting around 97C, skyrocketed to 102C, and soon enough the card shut itself down, dropping off the PCI-E bus and resetting the server.

After the machine reset, the mirroring process continued where it had left off, the temperature of the ROC climbed again and the whole system reset itself once more.

Luckily, rebuilding a Logical Drive (LD) is a reliable process and it can recover successfully after a system reset.
I finished rebuilding the boot LD mirror with the computer case open so that ambient air would cool the card sufficiently to let the rebuild complete.
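
Since the card dropped off the bus at around 102C, a rebuild is a good moment to keep an eye on the ROC temperature. Below is a minimal sketch of a threshold check. The function name and the 90C/100C thresholds are assumptions picked for illustration, not values from LSI documentation; the only figures taken from this post are the 97C idle reading and the shutdown at about 102C.

```python
def classify_roc_temp(temp_c, warn_at=90, critical_at=100):
    """Classify a ROC temperature reading in degrees C.

    warn_at and critical_at are hypothetical thresholds chosen for this
    example. A reading of None means the controller reported no value.
    """
    if temp_c is None:
        return "unknown"
    if temp_c >= critical_at:
        return "critical"
    if temp_c >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # 97C was the idle reading, 102C the reading just before shutdown.
    for reading in (97, 102, None):
        print(reading, classify_roc_temp(reading))
```

With these assumed thresholds, the idle reading already lands in the warning band, which matches how little headroom the card had before a rebuild pushed it over.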

After things were back to normal, I started researching the issue and found others with similar problems.

Someone also attempted to fit a 40mm fan on top of the heatsink and described the specs in great detail:

There was even someone who had a business on eBay selling overpriced fans for MegaRAID controllers (talk about a known problem!):


So it seemed like a known but unacknowledged problem, and I set out to find a solution. Alas, I couldn't fit a fan on top of my 9271-8i's heatsink because there was no space left above it (all PCI-E slots were used in that machine).

After some experimentation with a spare 40mm fan I had, I came up with the following workaround:

1) Find a small, reliable 40mm fan to fit to the side of the overheating MegaRAID card.
 I went for the Noctua NF-A4x10 FLX fan (150,000 hours MTBF and no more than 20dBA).

2) Attach the fan to the heatsink of the 9271-8i card. Luckily, the heatsink of the other LSI card (a 9280-4i4e) was close enough to let me position the fan across both cards so that it would cool both heatsinks.

Here's a picture of the inside of the Dell T410 with the fan fitted on top of both cards.

With the fan attached, the cards run much cooler (even in the hot summer weather):
sudo megaclisas-status
-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware     
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 71C  | FW: 23.32.0-0009 
c1    | LSI MegaRAID SAS 9280-4i4e | 512MB  | N/A  | FW: 12.15.0-0205 
[...]

So with an $8.99 fan, I saw the chip temperature drop by 26C (from 97C to 71C). This is one crude and ugly hack (the fan was carefully attached/screwed to the heatsinks) but it does the job.

For the T5610, since I had more room in the PCI-E slots, I simply screwed the fan to the LSI's heatsink:

Again, this resulted in a significant temperature drop (down from 102C):

sudo megaclisas-status
-- Controller information --
-- ID | H/W Model                  | RAM    | Temp | Firmware     
c0    | LSI MegaRAID SAS 9271-8i   | 1024MB | 56C  | FW: 23.32.0-0009
[...]

Update (2016/07/22): I've removed the screws in the T410 and settled for something a little cleaner (a magnetic arm holding the fan) so that it cools both HBAs properly.




Comments

  1. The chassis temp of the T410 was around 26C at that time (room temp).
    It isn't exactly over specs.. :(

  2. Which kind of screws did you use? 9271-8i?

  3. Could you share where to acquire the magnetic arm? Thanks.

    Replies
    1. The magnetic arms I used were these:
      http://www.akust.com/product/adjustable-magnetic-fan-bridge-mounting-kit
      No issues to report so far..
      Regards,
      Vincent

  4. I was adding an LSI 9260-8i to my HP 420 Workstation server. The HP 420 has a green clip used to hold the default-issue video card in place. I used the clip to attach a fan over the video and LSI cards as a cooler. I have pictures if you would like to add them to this post. Contact me and I will send you the pics.

  5. Vincent, since you obviously have a 9361-8i, could you do me a really big favor and measure the size of the existing heatsink? I am looking at doing a custom-made job for one of these and running long dissipation lines down to the PCI slot cover, but as I don't have my card yet I can't start work on the idea.

  6. I am reading this blog with interest, as maybe you experts can help us with a problem. We have a customer using a Dell server with a MegaRAID® SAS 9271-8i 6Gb/s SAS and SATA RAID Controller Card in a video storage application. For the third time we have had a disk failure in the same slot, along with occasional crashes of the computer with messages saying video storage is lower than it should be. The company that sold us the machine and computer has only changed the drive, but I am thinking that the RAID controller board, the cables or maybe overheating is the problem. Before I buy a new RAID controller board, SAS drive and cables, have any of you seen similar problems, and could you point to the probable cause? The system is in Portugal, in a small room that gets quite hot. I suppose there is a log of the core temperature prior to the failure of the disk. Sorry, I am not a computer engineer.
    Thanks
    Chris Price
    info@cinesonics.pt

    Replies
    1. Hi Chris,
      I can't tell from here where your problem comes from.
      I'd advise looking at the HBA's log to see if there's anything relevant there. Check the iDRAC's log too.
      It could be several things: the backplane slot, the cables to/from the HBA.
      One thing I know for sure is that the 9271-8i will most likely overheat in a tower server during the summer without additional cooling.
      For my Dell PE T410 server, it resulted in the HBA shutting itself down and dropping off the PCI-E bus during a RAID rebuild.
      At the very least you should make sure the fans are functioning properly and that the small heatsink on the 9271-8i gets enough airflow.
      There's a temperature sensor on the 9271-8i, how hot is it? And check the drives' temperatures too. Some of them in the cage could
      be running hotter than others, especially if this is a tower server.
      The PERC cards usually have a larger heatsink and don't require that much airflow.
      Good luck,

  7. Hi Vincent, thanks for your advice. I suspect it is overheating, although the room is at 18 deg C. I will visit the customer next week and open the computer, which I did not do this time. The computer was provided as an OEM unit by the manufacturer of the film scanning machine.
    We have lost two drives and the computer crashes unexpectedly, this last time losing the RAID config completely. I have seen battery backup modules for this RAID controller. Do you think it is worth having one?
    Thanks again for your help.

  8. Chris -

    I know this is a very late response, but I just saw your post. I had all kinds of strange trouble when I first put in my 9271-8i until I figured out my power supply was failing. New power supply and everything is great. It won't tolerate flakiness in power.

    David

  9. I had a custom heat sink made. It's longer than the bog-standard one that comes with the 9361-8i, and I had the mounting holes milled locally. I tested the original and got 48 degrees C with the side off the case and a Noctua CPU fan on its lowest setting three to four inches below it. Then I replaced it with my custom sink and knocked 14 degrees off the temperature on the same setup: I get between 34 and 38 degrees C. The sink is not yet small enough to allow use of the PCIe slot beneath the card, as it is 80Lx30Wx27H, and I am looking at trying an 80x30x18 crosscut to see if I can achieve the same sort of results. The nice thing is I can now touch the heat sink and not get burnt. I will also be testing it with the smaller Noctua 40mm fans to see if they are sufficient to cool it. If these work out I will be making them available for purchase. The current designs are only for standard desktop cases, NOT for mATX or server equipment; I will be working on those next. If anyone is interested in the current ones, let me know and I'll see what I can sort out. At the moment my milling runs are only small, so they are a bit pricey; the more of them I make in a run, the cheaper it gets. Oh, and you can reuse the existing mounting pins.

  10. Should have mentioned, too: the bases on these sinks are about 5mm thick, so they really pull the heat out.

