High CPU Utilization on NSa 4600
We have an HA pair of NSa4600 firewalls that are seeing 75-95% CPU utilization on cores 1-7 (or 2-8, depending on which monitor you're looking at). Both units are running 6.5.4.8-86n--HFGEN6-2470-1n. We have disabled IPS/IDS and GAV/SAV on multiple zones to try to reduce load, but that doesn't seem to make a difference. DPI-SSL is not enabled on this firewall, and it is not handling any client VPN.
I pulled tracelogs, but I'm not seeing anything useful there. I checked the TSR for the byte buffer count, but it isn't displayed in that firewall's TSR, even with all boxes checked.
In the GUI I see Connections: Peak: 298269, Current: 35550, Max: 375000.
Even the peak is only about 80% of the maximum, so we don't appear to be hitting a connection limit.
I'm running diag show cpu and getting these results (snipped):
----------------
CPU Monitor:
Current 1s CPU Utilization: 3.08%
Current 10s CPU Utilization: 7.00%
Total Average CPU Utilization: 8.10%
Current MultiCore Utilization (%)
Core 0: 3
Core 1: 46
Core 2: 86
Core 3: 82
Core 4: 79
Core 5: 77
Core 6: 76
Core 7: 76
-------------
I'm seeing cores 1-7 hitting 46% to 86%, yet the 1s CPU utilization is 3.08%. We need to determine what services we can disable to reduce usage.
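To compare the per-core numbers with the headline figures, the pasted output can be split into core 0 and the remaining cores. Below is a minimal Python sketch. It assumes the per-core lines look exactly like "Core N: value", as in the snippet above, and it treats core 0 as the control plane, following the support reply later in this thread. The function names parse_cores and split_planes are hypothetical helpers, not part of any SonicWall tool.

```python
import re
from statistics import mean

# Sample text copied from the "diag show cpu" output posted above (snipped).
SAMPLE = """\
CPU Monitor:
Current 1s CPU Utilization: 3.08%
Current 10s CPU Utilization: 7.00%
Total Average CPU Utilization: 8.10%
Current MultiCore Utilization (%)
Core 0: 3
Core 1: 46
Core 2: 86
Core 3: 82
Core 4: 79
Core 5: 77
Core 6: 76
Core 7: 76
"""

# Assumption: each per-core line is exactly "Core N: value" at the start of a line.
CORE_LINE = re.compile(r"^Core (\d+):\s*(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def parse_cores(text: str) -> dict[int, float]:
    """Return {core_number: utilization_percent} from the pasted output."""
    return {int(n): float(v) for n, v in CORE_LINE.findall(text)}


def split_planes(cores: dict[int, float], control_cores=(0,)):
    """Split cores into control plane (assumed core 0) and data plane (the rest)."""
    control = {c: u for c, u in cores.items() if c in control_cores}
    data = {c: u for c, u in cores.items() if c not in control_cores}
    return control, data


if __name__ == "__main__":
    cores = parse_cores(SAMPLE)
    control, data = split_planes(cores)
    print(f"Control plane cores: {control}")
    print(f"Data plane average: {mean(data.values()):.1f}%")
    print(f"Busiest data plane core: {max(data, key=data.get)}")
```

On the sample above, cores 1-7 average about 74.6% (core 2 is the busiest at 86%), while core 0 sits at 3%, which is close to the 1s headline figure. That would be consistent with the headline numbers mostly reflecting the control plane, but that is an inference; nobody in this thread confirms it.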
Questions:
- Beyond the core monitors, how do we determine which processes are using resources on the data plane cores?
- Is there any way we can see the processes using cycles on the data plane cores?
- Where are resources for logging allocated from? Management or data plane?
- Any suggestions on additional things to look at to bring the CPU to a more reasonable level?
I do realize these units are undersized for the use case, and we are in the process of procuring replacements, but for now I need to keep these running.
Thanks
Scott
Answers
Any ideas here?
I'll take a few shots at this, but if you haven't opened a ticket, you should.
On the diag.html page:
Tracelog:
Enable logging of the busiest task while a CPU core is at 100% for 1 second.
Diagnostics settings:
Number of jobs executed by data plane task to be tracked: 50
Enable "Include priority 254 task CPU usage".
Watchdog settings:
Enable reporting what the current task is doing if the CPU is at 100% for 1 second, and enable printing the trace at interrupt.
These settings should put more information into the tracelog. I'd enable them only temporarily, while troubleshooting.
I'm guessing you've looked at processing times in the TSR? Per-core packet stats? Are you doing anything with AppControl? Real-time data collection/AppFlow to a local collector? Bandwidth management? Are your log settings anything other than default?
Hello @shultis,
Please go through the following KB article:
DP (data plane) core spikes are usually caused by real-time traffic passing through the firewall. Management traffic, including logging, is handled by core 0 (the control plane). If you have monitoring tools such as Analytics or GMS, you should be able to see the IPs and types of traffic that could be causing this issue.
Thanks!
Shipra Sahu
Technical Support Advisor, Premier Services
Hi @shultis
Please check whether your issue matches the one described in the article below:
High DataPlane cores utilization after upgrading sonicOS version to 6.5.4.8-89n | SonicWall
I installed the hotfix and am now getting random reboots on my NSA 4600 HA pair. Even the standby firewall reboots, though it isn't passing any traffic. On average, one of the two units reboots once a day. Twice, the crash/reboot cycle took several hours, but usually it takes only a couple of minutes.
Support has my logs but hasn't responded in several days.
At support's request, I installed SonicOS 6.5.4.9, and the rebooting issue has not been fixed. Starting another case.
Hi, thanks for the update. The more I read about firmware 6.5.4.9 and 6.5.4.8, the more inclined I am not to install them on any of our SonicWalls. :-(