SD-WAN malfunction
Dear friends, I have an NSa 3650 (SonicOS 6.5.4.11-97n, ROM: SonicROM 5.7.1.7) with 3 different WAN connections from 3 different providers and technologies (FTTH, FTTC, UMTS), connected directly to the firewall at X1, X3, X3. The lines work correctly, and the providers found no problems during 24 hours of monitoring.
If you look at the attached picture, all 3 WANs show the same abnormal values at the same time (e.g. latency jumps from 20 to 500 ms). This causes problems with the SD-WAN route policies. (I have also tried a configuration with 2 WANs in a group and one as backup; the situation is the same.) The spike repeats several times within a few minutes.
Has anyone had a similar problem, and if so, did you find a workaround?
Thanks
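Editor's sketch: the observation that all 3 WANs spike at the same moment is the key clue, since three independent providers rarely misbehave in the same second. A minimal way to check this from exported probe data could look like the following. The sample values, the `common_spikes` name, and the 200 ms threshold are assumptions for illustration only, not data from the screenshots.

```python
from typing import Dict, List, Tuple

# One sample is (timestamp in seconds, latency in milliseconds).
Sample = Tuple[int, float]


def common_spikes(samples: Dict[str, List[Sample]], threshold_ms: float) -> List[int]:
    """Return the timestamps at which every WAN exceeded threshold_ms."""
    if not samples:
        return []
    # Build one set of "spiking" timestamps per WAN.
    spiking_sets = [
        {ts for ts, latency in series if latency > threshold_ms}
        for series in samples.values()
    ]
    # Keep only the timestamps present in every set.
    return sorted(set.intersection(*spiking_sets))


# Hypothetical latency samples, one series per WAN link.
samples = {
    "FTTH": [(0, 20.0), (10, 510.0), (20, 22.0), (30, 480.0)],
    "FTTC": [(0, 25.0), (10, 495.0), (20, 300.0), (30, 505.0)],
    "UMTS": [(0, 60.0), (10, 520.0), (20, 65.0), (30, 470.0)],
}

print(common_spikes(samples, threshold_ms=200.0))  # [10, 30]
```

If most spikes show up in the common list, the cause is more likely on the measuring side (the firewall or the probe mechanism) than on any single line.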
Answers
@PietroCeribelli I have never experienced anything like this, but because the problem is recurring, could you check with the Core Monitor whether there is a CPU spike that could cause the SD-WAN probes to fail or act strangely?
It looks system-related to me, and CPU Core0 would be my first guess.
--Michael@BWC
Dear Michael, thanks for your reply. The core and CPU load is normal. I have tried changing the MTU value on the interfaces, but the problem is the same.
@PietroCeribelli I would check Core0, which is IMHO the core where these SD-WAN checks run. It's a bit confusing because the Multi-Core Monitor starts with Core 1, but the Core Monitor (which you screenshotted) starts with 0. You need to monitor the load on this core while SD-WAN freaks out. This might be hard to catch, I dunno how often it happens.
Probably Monitor -> Appliance Health -> Multi-Core Monitor is the best view for that. Or the Live Monitor (it's Core 1 in there).
--Michael@BWC
The firewall works correctly. I have made a screenshot:
during the spike the core load is normal.
@PietroCeribelli then I guess we can rule out temporary CPU spikes.
Do these interfaces have anything else in common, such as some form of WAN switch, as you would use in an HA setup?
Or did you try pinging a different destination, to rule out that it's 1.1.1.1?
--Michael@BWC
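Editor's sketch: the idea of ruling out the probe target can be made concrete by comparing round-trip times collected against several destinations. The function names, the factor of 5, and the RTT numbers below are assumptions for illustration; only the target names come from this thread.

```python
from statistics import median
from typing import Dict, List


def spiking_targets(rtts_ms: Dict[str, List[float]], factor: float = 5.0) -> List[str]:
    """Return targets whose worst RTT exceeds `factor` times their median RTT."""
    flagged = []
    for target, values in rtts_ms.items():
        if values and max(values) > factor * median(values):
            flagged.append(target)
    return sorted(flagged)


def verdict(rtts_ms: Dict[str, List[float]], factor: float = 5.0) -> str:
    """Summarize whether the spikes depend on the probe target."""
    flagged = spiking_targets(rtts_ms, factor)
    if not flagged:
        return "no spikes seen"
    if len(flagged) == len(rtts_ms):
        return "all targets spike: the probe target is unlikely to be the cause"
    return "only some targets spike: " + ", ".join(flagged)


# Hypothetical RTT samples in milliseconds, one list per probe target.
rtts = {
    "1.1.1.1": [20, 21, 500, 22, 19],
    "8.8.8.8": [18, 480, 20, 19, 21],
    "mix-it.net": [30, 31, 510, 29, 33],
}

print(verdict(rtts))
```

Comparing against each target's own median, rather than a fixed limit, keeps a slow but stable link such as UMTS from being flagged by mistake.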
Yes, there are no CPU spikes.
The 3 WANs are connected directly to the firewall at ports X1, X3, X3, without switches or an HA system.
If you look at the screenshot, I have 2 different probes: one is 1.1.1.1 (not used on the firewall) and the other is an FQDN (mix-it.net). I tried replacing 1.1.1.1 with 8.8.8.8, with the same result.
Pietro
@PietroCeribelli then I'm out of ideas for the moment. Maybe someone else has a bright idea; at least we covered the basics.
--Michael@BWC
Thanks Michael for your support.
Pietro
In the spirit of Columbo, "Just one more thing": if you ping from a system behind the firewall, do these response times spike as well?
--Michael@BWC
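Editor's sketch: to answer this from a LAN host, it helps to log timestamped ping results so they can be lined up with the SD-WAN graphs. The helper names below are hypothetical. The code assumes the system `ping` command is available and prints English `time=... ms` output; localized Windows output may not match.

```python
import platform
import re
import subprocess
import time
from typing import Optional

# Matches "time=20.4 ms" (Linux/macOS) and "time<1ms" (Windows).
_RTT = re.compile(r"time[=<]\s*([\d.]+)\s*ms")


def parse_rtt_ms(output: str) -> Optional[float]:
    """Extract the round-trip time in ms from ping output, or None if absent."""
    match = _RTT.search(output)
    return float(match.group(1)) if match else None


def ping_once(host: str) -> Optional[float]:
    """Send a single echo request with the system ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host], capture_output=True, text=True
    )
    return parse_rtt_ms(result.stdout)


def log_rtts(host: str, samples: int, interval_s: float = 1.0) -> None:
    """Print one timestamped RTT line per sample."""
    for _ in range(samples):
        rtt = ping_once(host)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        text = "timeout" if rtt is None else "%.1f ms" % rtt
        print(stamp, host, text)
        time.sleep(interval_s)
```

For example, `log_rtts("8.8.8.8", samples=600)` would record roughly ten minutes of data. If the LAN host sees no spikes while the firewall's probes do, the issue is more likely in the probing itself than in the forwarding path.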
I will be able to answer this question on Monday. Have a good weekend.
Pietro
Did this start after a recent firmware upgrade? Or is this a new install?
Your screenshot 'sd-wan-mulfunctin3' shows total usage on core 0 at ~63%. Do you have Control Plane flood protection on?
Have you opened a support case? You have a tough one.