TZ600 with multiple ISPs on multiple interfaces - one interface won't respond to ping
TZ600 HA pair, 4 ISPs. One AT&T dedicated fiber interface has stopped responding to ping. This interface (x2) can send a ping out successfully, and in packet monitor we can see the packets ingress on x2, but they egress out a different interface (x3). Working with SonicWall support we checked and double-checked the interfaces (ping is enabled there), NAT rules, access rules, and address objects, and all looks correct. We also checked the audit logs for any unknown changes. Pings to the other 3 interfaces all work correctly, i.e., they come in on an interface and go back out the same interface.
x3 is the current active interface in the failover group, but this should not be a factor - and isn't for pings sent to the other interfaces.
After almost 2 hours with support, we have no solution.
Any ideas are welcome.
@emmotto it's very hard to analyze remotely, but the first thing that comes to mind: are there any overlapping subnets on the WAN interfaces, or do all 4 have their own discrete subnet?
Did you check your routing table (is dynamic routing involved?), custom and default routes? Maybe it's caused by some routing priority. I would expect that here because of your X2:in / X3:out situation.
I have a couple of deployments with 4+ WAN connections and never experienced a general problem.
I have this problem at a client and to this day have not been able to discover the cause.
Actually, I stopped analyzing.
Yes, all interfaces have separate IPs; the only ones that even start with the same triplet are x1 and x5. With Sonic support we checked NATs, access rules, and routes 2 or 3 times and nothing leapt out at us. This x2 interface has been pingable since 2018. For this behavior to happen all of a sudden is puzzling. Tomorrow is the weekend and my plan is to try a couple of things I don't do during biz days: 1. promote the x2 interface to be primary in the failover group, 2. force failover from this physical unit (HA pair) to the other physical box, 3. in the diag.html page, disable IP Spoof protection. I will do them one at a time and check for ping after each.
@emmotto do I understand this correctly: when you ping the IP of X2 from the Internet, you can see the "echo request" arriving on X2 but the "echo reply" leaving the appliance on X3, according to the Packet Monitor?
Did you double-check your Network Routes (sorted by Priority, All Types)? It really sounds like there is a matching route that sends the reply via X3. FLB was always the last resort, and your SRC:X2-IP destined to X2-DefaultGW should definitely match before that.
Best of luck for your tests, this looks like a headscratcher.
sorry - I'm doing something wrong when I use "quote" so just pasting your comment here.
>>do I get this correct, when you ping the IP of X2 from the Internet you can see the "echo request" arriving on X2 but the "echo reply" leave the appliance on X3 according to the Packet Monitor?
YES. And with a decent SonicWall tech/support engineer in remote session with me we double and triple checked routes, NATs, and firewall rules - 2 sets of eyes and did not see anything that would cause ping on x2 to be replied to on x3.
You are correct - so far it is a real head scratcher - especially as it was working before with no changes. I am remote to the firewall location and I use 2 big 42" monitors so I have enough screen real estate to have a bunch of continuous pings running to each of the 4 interfaces whenever I am concerned about connectivity or power issues at the remote location - so this has been working for YEARS.
See below - the top 2 lines are the x2 interface getting a ping and replying on x3. The next 3 lines are a successful ping to x5. Not sure why you get 3 lines on success and 2 lines when it fails; I suppose that could be part of the issue.
x2 starts with 12, x5 starts with 50, and my address begins with 69.
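The "continuous pings to each of the 4 interfaces" check described above could also be scripted instead of watched by eye. The following is only a minimal sketch: the function names are made up for illustration, the addresses in the usage note are placeholders from the documentation ranges (not the real WAN IPs), and it assumes the operating system's own `ping` command is on the PATH. The exit code of `ping` is only a rough indicator of reachability and behaves slightly differently between platforms.

```python
import platform
import subprocess
from typing import Callable, Dict, List


def system_ping(host: str, timeout_s: float = 2.0) -> bool:
    """Send one echo request with the OS ping command; True if it exits with 0."""
    # Windows uses -n for the packet count, most other systems use -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping binary could not be started.
        return False
    return result.returncode == 0


def check_interfaces(
    targets: Dict[str, str],
    probe: Callable[[str], bool] = system_ping,
) -> Dict[str, bool]:
    """Probe every interface IP once and return {interface name: reachable}."""
    return {name: probe(ip) for name, ip in targets.items()}


def failing(results: Dict[str, bool]) -> List[str]:
    """Return the sorted names of the interfaces that did not answer."""
    return sorted(name for name, ok in results.items() if not ok)
```

Calling `failing(check_interfaces({"X1": "198.51.100.1", "X2": "192.0.2.10"}))` in a loop with a short sleep would give the same view as the side-by-side ping windows. Note that it only shows that a reply is missing, not which interface the firewall sent it out of; that still needs the packet monitor.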
Alright - have noticed an anomaly in the route policies
x5 has 3 policies
x3 has 4 policies
x2 has 3 policies but 2 are greyed out (not enabled)
x1 has 3 policies
All policies are auto added and I can't re-enable the 2 disabled policies and don't understand how they were disabled in the first place.
The greyed-out policies are SRC=any, DEST=x2subnet, GWY=0.0.0.0 and SRC=x2ip, DEST=any, GWY=x2defaultgwy.
@emmotto that's probably the explanation, and my gibberish might point you in the right direction. IMHO you need 3 default routes to get an interface working properly:
IMHO Route #2 might be negligible, but #1 and #3 are a must, and the priority of these routes has to be lower (which is better) than that of any other matching route. Route #3 is missing in your case, and it is the one responsible for the echo reply of the ping to the X2 IP.
In Route -> Settings, you did not change the priority handling and left this option unticked?
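To illustrate the reasoning, here is a deliberately simplified model of priority-based route selection. It is not how SonicOS actually implements route policies; the class, the policy names, the priorities, and the documentation-range IP addresses are all assumptions made up for this sketch. It only shows the idea that if the source-based default route for X2 is disabled, a reply sourced from the X2 IP falls through to the next matching route, here a catch-all default pointing at X3.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass
class RoutePolicy:
    name: str
    source: str        # CIDR; "0.0.0.0/0" stands for "any"
    destination: str   # CIDR; "0.0.0.0/0" stands for "any"
    interface: str     # egress interface
    priority: int      # lower number wins
    enabled: bool = True

    def matches(self, src: str, dst: str) -> bool:
        return (ip_address(src) in ip_network(self.source)
                and ip_address(dst) in ip_network(self.destination))


def pick_route(policies: List[RoutePolicy], src: str, dst: str) -> Optional[RoutePolicy]:
    """Return the enabled matching policy with the lowest priority number."""
    candidates = [p for p in policies if p.enabled and p.matches(src, dst)]
    return min(candidates, key=lambda p: p.priority, default=None)


# Hypothetical addresses, not the real WAN IPs from this thread.
X2_IP = "192.0.2.10"
REMOTE_PINGER = "203.0.113.50"


def build_policies(x2_source_route_enabled: bool) -> List[RoutePolicy]:
    return [
        # Source-based default: traffic sourced from the X2 IP leaves via X2.
        RoutePolicy("x2-source-default", "192.0.2.10/32", "0.0.0.0/0", "X2", 10,
                    enabled=x2_source_route_enabled),
        RoutePolicy("x3-source-default", "198.51.100.10/32", "0.0.0.0/0", "X3", 20),
        # Last-resort default via the active failover interface.
        RoutePolicy("flb-default", "0.0.0.0/0", "0.0.0.0/0", "X3", 100),
    ]


def reply_interface(x2_source_route_enabled: bool) -> Optional[str]:
    """Egress interface for an echo reply sourced from the X2 IP."""
    route = pick_route(build_policies(x2_source_route_enabled), X2_IP, REMOTE_PINGER)
    return route.interface if route else None


if __name__ == "__main__":
    print(reply_interface(True))   # source-based X2 route active
    print(reply_interface(False))  # source-based X2 route greyed out
```

In this toy model the reply leaves via X2 while the source-based route is enabled and via X3 once it is disabled, which matches the X2:in / X3:out pattern seen in the packet monitor.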
Prioritize routes by metric within route classes
You do not have this option enabled on Internal Settings by accident?
Never generate an interface-specific default route
Keep on hunting.
Prioritize routes by metric within route classes = NOT selected
Never generate an interface-specific default route = NOT selected
So yes, I think this is the problem as well - but I don't understand why the two x2 routes are greyed out.
Below, default (auto-added) routes 11 & 12 are greyed out.
All x2 routes have lower priority than x3 routes.
@emmotto is X3 configured in your Failover & Load Balancing group? If yes, did you try to remove it?
What about manually adding the "default" routes again? Would this work until SonicWall Support figures out why they are disabled?
Did you log in to the appliance via ssh and have a look at "show route-policies" to see if there is anything out of the ordinary?
It should look like this:
I can reproduce the disabled routes by putting an interface in the FLB group when its probe is not successful; then 2 of the 3 routes are disabled. So it's FLB related, I guess.
Thanks for the quick reply, Michael.
x3 is included in my failover/load balancing group and it is actually the primary. There has been no effort to remove it.
I just tried changing probing on x2 to physical only and it had no effect, so switched it back to logical. It says target alive.
I ssh'ed in and did a show route-policies and the x2 and x3 routes look exactly like yours. Nothing out of the ordinary.
I will try some further debugging tomorrow with Sonic support.
Thanks so much for your input.
@emmotto did you see my update which I squeezed in? Does X2 have an LB status of "Available"? If not, this might be the reason for your disabled routes.
I just saw that question - the status is "available"
@emmotto then I have to throw in the towel at this point. I'm fairly certain that it's FLB related. If possible, for testing purposes, remove X2 from FLB completely to see if this brings the routes back up.
It might be a bug. When all LB States are marked as Available for all Interfaces then no route policy should be disabled.
Have a great rest of the weekend despite this mess.
BTW - I left things alone and got distracted by other tasks - but today I thought about it again and just for grins I pinged the problem interface - AND IT WORKED!
No changes were made to the SW - somehow this appears to have been an ISP issue - and it fixed itself as mysteriously as it appeared.