Many RAID controllers, like our Dell PERC cards, go through a battery learning cycle, which calibrates the measured capacity of the battery so that it does not fail unexpectedly. For us, this cycle occurs every 90 days. During a learning cycle the battery is fully charged, discharged, and then charged again, which re-establishes its true capacity. While the learning process is running, you cannot rely on the battery to sustain the cache in the event of a power failure.
To prevent any form of data loss, the RAID controller automatically switches its write policy from write-back to write-through, so that writes bypass the cache and go directly to disk. Write performance suffers because the drive has to service each request individually, usually as random writes, rather than receiving optimized or grouped writes.
Learn cycles are run periodically to fully discharge the battery and recharge it. When the cycle completes, the BBU determines the new charge capacity the battery can hold. Failure to run learn cycles at their recommended intervals may shorten the usable life of the battery, because the full charge capacity degrades more rapidly and the battery reaches end of service life prematurely. The measured capacity is reported by the “Full Charge Capacity” field in the MegaCLI BBU output and is updated after each learn cycle. Refer to the next section for an example.
When a learn cycle is initiated, the charging circuit automatically places any virtual drives that are in write-back (WB) mode into write-through (WT) mode for the duration of the cycle, which temporarily reduces write performance. Once the learn cycle completes, the virtual drives are automatically transitioned back to WB mode if the battery is still capable of holding the required charge. Learn cycle time varies depending on the BBU type.
For BBU07, the complete learn cycle, during which the cache stays in WT mode, is expected to take 6 to 8 hours.
For BBU08, the complete learn cycle, during which the cache stays in WT mode, is expected to take 2 to 3 hours.
Note, when a new BBU is installed into a system, it will have a depleted charge state. Any virtual drives attached will be forced into WT cache mode while a full learn cycle is performed. Usually a sufficient charge to maintain the cache is reached after this cycle is complete. This may take 24 hours or longer.
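The BBU07 and BBU08 durations quoted above can be turned into a rough estimate of when write-back mode should resume. The following is only a sketch: the function name `expected_wb_return` and the lookup table are illustrative and not part of any Oracle or LSI tool, and the hours are simply the figures given in this article.

```python
from datetime import datetime, timedelta

# Expected write-through duration in hours (min, max), taken from the text above.
LEARN_CYCLE_HOURS = {"BBU07": (6, 8), "BBU08": (2, 3)}

def expected_wb_return(bbu_type: str, start: datetime):
    """Return the (earliest, latest) time write-back mode should resume."""
    low, high = LEARN_CYCLE_HOURS[bbu_type]
    return start + timedelta(hours=low), start + timedelta(hours=high)

# Example: a learn cycle that starts at 02:00 on a BBU07.
earliest, latest = expected_wb_return("BBU07", datetime(2011, 1, 22, 2, 0))
print(f"WB expected back between {earliest:%H:%M} and {latest:%H:%M}")
```

This does not apply to a freshly installed BBU, which may need 24 hours or longer as noted above.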
To determine the Battery Type, run the following:
# ./MegaCli64 -AdpBbuCmd -a0 | grep BatteryType
To find out which LSI controller ASIC is installed:
# lspci | grep RAID
13:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
To check the firmware version:
# ./MegaCli64 -AdpAllInfo -a0 | grep "FW Package Build"
FW Package Build: 12.12.0-0178
By default, learn cycles on Exadata are configured as follows:
On Storage Cells with image 184.108.40.206.x, the learn cycle occurs monthly from first power on.
On Storage Cells with image 220.127.116.11.1 or later, the learn cycle is manually scheduled quarterly.
Database nodes are set for automatic scheduling, which occurs every 30 days from first power on.
To change the start time on Storage Cells for when the learn cycle occurs, use a command similar to the following. The time reverts to the default learn cycle time after the cycle completes:
CellCLI> ALTER CELL bbuLearnCycleTime="2011-01-22T02:00:00-08:00"
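The value in the command above is an ISO 8601 timestamp with a UTC offset. As a minimal sketch, the command string can be generated from a timezone-aware Python datetime. The helper name `learn_cycle_command` is hypothetical and is not part of CellCLI; it only builds the text you would paste at the CellCLI prompt.

```python
from datetime import datetime, timedelta, timezone

def learn_cycle_command(when: datetime) -> str:
    """Build an ALTER CELL command for a timezone-aware start time."""
    if when.tzinfo is None:
        raise ValueError("start time must carry a UTC offset")
    # isoformat() on an aware datetime yields e.g. 2011-01-22T02:00:00-08:00.
    stamp = when.replace(microsecond=0).isoformat()
    return f'ALTER CELL bbuLearnCycleTime="{stamp}"'

# Example: 02:00 local time in a UTC-8 zone, matching the command above.
utc_minus_8 = timezone(timedelta(hours=-8))
print(learn_cycle_command(datetime(2011, 1, 22, 2, 0, tzinfo=utc_minus_8)))
```

Rejecting naive datetimes avoids silently scheduling the cycle in the wrong timezone.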
To check the current bbuLearnCycleTime and learn cycle status:
CellCLI> list cell attributes bbuLearnCycleTime
# ./MegaCli64 -AdpBbuCmd -a0 | grep "Learn Cycle"
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
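For monitoring scripts, the "Name : Value" lines shown above can be parsed into a dictionary. This sketch works on captured text rather than calling MegaCli64 itself; the function names are illustrative, and reading "Learn Cycle Active : Yes" as "a cycle is currently running" is an assumption based on the field name.

```python
SAMPLE = """\
Learn Cycle Requested : No
Learn Cycle Active : No
Learn Cycle Status : OK
Learn Cycle Timeout : No
"""

def parse_fields(text: str) -> dict:
    """Split 'Name : Value' lines of MegaCli output into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[name.strip()] = value.strip()
    return fields

def learn_cycle_active(text: str) -> bool:
    """True when the output reports an active learn cycle (assumed meaning)."""
    return parse_fields(text).get("Learn Cycle Active") == "Yes"

print(parse_fields(SAMPLE))
print("learn cycle running:", learn_cycle_active(SAMPLE))
```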
Battery Charge Condition Requirements & Replacement Guidelines
The absolute minimum BBU07 charge required to meet the minimum 48 hours hold-up time is 600mAh.
When the BBU07 can no longer hold this much charge, MegaCli64 reports it by changing the “Remaining Capacity Low” field from the normal “No” to “Yes”. This may serve as an early warning to check whether the “Full Charge Capacity” is getting low.
The absolute minimum BBU08 charge required to meet the minimum 48 hours hold-up time is 674mAh.
Note that on BBU08, “Remaining Capacity Low” may be flagged prematurely because of a firmware bug (Sun CR 7018730) that sets the threshold higher, at 960mAh, based on incorrect operational assumptions. If the flag is raised because of this bug, ignore the alert as long as the “Full Charge Capacity” value is over 800mAh.
# ./MegaCli64 -AdpBbuCmd -a0 | grep "Remaining Capacity Low"
Remaining Capacity Low : Yes
# ./MegaCli64 -AdpBbuCmd -a0 | grep "Capacity: "
Remaining Capacity: 597 mAh
Full Charge Capacity: 612 mAh
Design Capacity: 1215 mAh
In this condition, the BBU can no longer support the cache for the required duration and needs replacement immediately. All virtual drives on a system with this flag set will be forced into WT mode to protect data until the BBU is replaced, reducing performance until then.
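The capacity lines and the BBU08 firmware-bug exception described above can be checked in a script. This is a sketch under stated assumptions: it parses captured output text, the function names are illustrative, and the caller is expected to pass the BBU type as the plain string "BBU07" or "BBU08" (the exact text printed in the BatteryType field is not shown in this article).

```python
import re

SAMPLE = """\
Remaining Capacity: 597 mAh
Full Charge Capacity: 612 mAh
Design Capacity: 1215 mAh
"""

def parse_capacities(text: str) -> dict:
    """Map each '<name>: <n> mAh' line to its integer mAh value."""
    pattern = r"^[ \t]*(.+?)[ \t]*:[ \t]*(\d+)[ \t]*mAh[ \t]*$"
    return {m.group(1): int(m.group(2))
            for m in re.finditer(pattern, text, re.MULTILINE)}

def low_flag_is_actionable(bbu_type: str, full_charge_mah: int) -> bool:
    """Decide whether a 'Remaining Capacity Low : Yes' flag should be acted on.

    On BBU08 the flag can be a false alarm (Sun CR 7018730); per the text,
    it is ignored when Full Charge Capacity is over 800 mAh.
    """
    if bbu_type == "BBU08" and full_charge_mah > 800:
        return False
    return True

caps = parse_capacities(SAMPLE)
print(caps)
print(low_flag_is_actionable("BBU07", caps["Full Charge Capacity"]))
```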
Another parameter to check is Max Error, a reading that indicates how accurate the reported battery condition is. A Max Error below 10% means the condition reading is valid. If it is 10% or greater, the battery condition cannot be reported reliably and the BBU is treated as failed.
Check the battery status and replace the battery if any of the following is true:
1. On BBU07, the full charge capacity after a learn cycle is less than 600 mAh.
2. On BBU08, the full charge capacity after a learn cycle is less than 674 mAh, regardless of any other BBU output field.
3. The Max Error reported is 10% or greater.
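The replacement criteria above can be combined into one check. The sketch below hard-codes the thresholds quoted in this article (600 mAh for BBU07, 674 mAh for BBU08, Max Error of 10% or greater); the function and constant names are illustrative, and the inputs are assumed to have been read from the MegaCli64 output after a learn cycle.

```python
# Minimum full charge capacity (mAh) for the 48-hour hold-up time, per the text.
MIN_FULL_CHARGE_MAH = {"BBU07": 600, "BBU08": 674}
MAX_ERROR_LIMIT_PCT = 10

def replacement_reasons(bbu_type: str, full_charge_mah: int, max_error_pct: int) -> list:
    """Return the reasons a BBU should be replaced; an empty list means none."""
    reasons = []
    minimum = MIN_FULL_CHARGE_MAH[bbu_type]
    if full_charge_mah < minimum:
        reasons.append(f"full charge capacity {full_charge_mah} mAh is below {minimum} mAh")
    if max_error_pct >= MAX_ERROR_LIMIT_PCT:
        reasons.append(f"max error {max_error_pct}% is at or above {MAX_ERROR_LIMIT_PCT}%")
    return reasons

# Example: the 612 mAh reading from the earlier output, evaluated as a BBU08.
for reason in replacement_reasons("BBU08", 612, 2):
    print("replace:", reason)
```

Returning the reasons rather than a bare boolean makes the result easier to include in an alert message.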
Guidelines for proactive battery replacement (recommended within the next 60 days) are as follows:
1. Replace the Battery Module after a 3-year service life, assuming the battery temperature does not exceed 45C. If the temperature exceeds 45C (battery temperature shall not exceed 49C), replace the battery every 2 years.
1. Replace the Battery Module after a 3-year service life, assuming the battery temperature does not exceed 45C. If the temperature exceeds 45C (battery temperature shall not exceed 55C), replace the battery every 2 years.
# ./MegaCli64 -AdpBbuCmd -a0 | grep "Temperature"
Temperature: 47 C
Temperature : High
Over Temperature : Yes
The virtual drive on this DB node is currently in WT and will remain so until the temperature drops and the BBU resumes charging.
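The temperature output above can be read programmatically and mapped to the 3-year or 2-year replacement guideline. This is a rough sketch: it parses captured text, the function name and dictionary keys are illustrative, and treating a single reading above 45C as grounds for the 2-year interval is a simplification of the guideline.

```python
import re

SAMPLE = """\
Temperature: 47 C
Temperature : High
Over Temperature : Yes
"""

def temperature_report(text: str) -> dict:
    """Extract the numeric reading and the over-temperature flag."""
    reading = re.search(r"^[ \t]*Temperature[ \t]*:[ \t]*(\d+)[ \t]*C[ \t]*$",
                        text, re.MULTILINE)
    over = re.search(r"^[ \t]*Over Temperature[ \t]*:[ \t]*(Yes|No)[ \t]*$",
                     text, re.MULTILINE)
    temp_c = int(reading.group(1)) if reading else None
    return {
        "temp_c": temp_c,
        "over_temperature": over is not None and over.group(1) == "Yes",
        # 3-year service life at or below 45C, 2 years above it (per the text).
        "replace_every_years": None if temp_c is None else (2 if temp_c > 45 else 3),
    }

print(temperature_report(SAMPLE))
```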
# ./MegaCli64 -LDInfo -LALL -aALL | grep "Cache Policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy: Disabled
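A virtual drive that has been forced into write-through shows WriteBack as its default policy but WriteThrough as its current policy, as in the output above. The sketch below detects that mismatch in captured text. The function name is illustrative, and the returned indexes count policy pairs in order of appearance, which is assumed (not guaranteed) to match the virtual drive order.

```python
SAMPLE = """\
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy: Disabled
"""

def drives_forced_to_wt(text: str) -> list:
    """Return indexes of Default/Current pairs that dropped from WB to WT."""
    pairs = []
    default_mode = None
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        mode = value.split(",")[0].strip()  # first item is the write policy
        if name.strip() == "Default Cache Policy":
            default_mode = mode
        elif name.strip() == "Current Cache Policy":
            pairs.append((default_mode, mode))
            default_mode = None
    return [i for i, (default, current) in enumerate(pairs)
            if default == "WriteBack" and current == "WriteThrough"]

print(drives_forced_to_wt(SAMPLE))
```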