Aggressive Mode VPNs aren't staying up
We have 10 TZ3xx series firewalls for remote workers. Each connects to a Palo Alto at one location to access company resources, and to a Palo Alto at a second location so the network team can manage the TZs remotely. We're not seeing issues on the first connection, because there is constant traffic between the remote workstation and the domain controllers.
The issue is with the link to the second location for management. There is no traffic between the PA and the remote SonicWalls on this VPN until we need to manage one of them.
-- All of the TZs are behind a NATed router (home internet connection)
-- All of the TZs are configured for Aggressive Mode VPN
-- All of the TZs have "keep alive" selected for the VPN
-- All of the TZs have the VPN configured as a "Tunnel Interface" with a static route configured
-- Phase 1 and Phase 2 IKE lifetimes are 28800 seconds on all of the TZs and on the PAs
-- Access to the remote sonicwall drops within minutes if there's no traffic between the firewalls
-- Generating traffic from the remote site (having the end user ping) will immediately reestablish the VPN
I submitted a ticket, and support said the TZ we tested on was one revision behind and needed to be on the latest firmware before we could continue troubleshooting. Fair enough... I upgraded 2 of the remote TZs, and we're seeing the same symptoms.
My understanding is that the firewall in aggressive mode is responsible for maintaining the VPN, and the "Keep Alive" setting is specifically designed to send a small amount of traffic to maintain the VPN.
Since I'm seeing the same symptoms on different TZs, across 3 versions (including the latest) of the firmware, it seems like a configuration issue or a bug, rather than an issue with a specific TZ.
The support rep said we should review the Palo Alto configuration for the VPNs. But, if the firewall in aggressive mode is responsible for maintaining the connection, and a simple ping will reestablish it, I don't see how it's possible that the Palo Alto (or any other firewall) could be responsible for the connection dropping.
The other questions I was asked were: can't you just generate some traffic when you need to access the remote firewall? Is there another way to get to them, through the other VPN?
The answer is yes to both questions, (or the TZs would have already been dumped in the bin), but that's not the point. The firewall is supposed to be doing something that it isn't doing.
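For anyone who wants to script the "just generate some traffic" workaround in the meantime, a minimal sketch might look like the following. It assumes it runs on a machine behind the TZ and that the target address is routed into the management tunnel. The function names, the interval, and the target are illustrative assumptions, not anything SonicWall or Palo Alto provides.

```python
import socket
import time
from typing import Callable


def send_udp_datagram(host: str, port: int) -> None:
    """Send one small UDP datagram; no reply is expected or awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"keepalive", (host, port))


def generate_traffic(
    send: Callable[[], None],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() `rounds` times, pausing interval_s between calls.

    Returns the number of send() calls made. `send` and `sleep` are
    injectable so the loop can be exercised without a network.
    """
    sent = 0
    for i in range(rounds):
        send()
        sent += 1
        if i < rounds - 1:  # no pause needed after the final send
            sleep(interval_s)
    return sent
```

Usage would be something like `generate_traffic(lambda: send_udp_datagram("192.0.2.10", 9), interval_s=20, rounds=180)`, where `192.0.2.10` is a documentation placeholder for an address on the management side of the tunnel and the 20-second interval is a guess to be tuned. This only papers over the problem; it does not explain why the built-in keep alive is not doing the same job.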
I'm hoping maybe someone here knows of a way to force aggressive mode with keep alive to actually keep the connection alive, or can tell me how I'm being stupid and not doing it right. Either would be welcome! :)
1) Make sure that inbound traffic to UDP ports 500 [IKE] and 4500 [NAT-T], and IP protocol 50 [ESP], is forwarded from the customer gateway modem to the SonicWall.
2) Try enabling DPD on the SonicWall unit and adjusting the interval (seconds).
3) Since your scenario involves double NAT, try disabling "Enable NAT Traversal" on the SonicWall TZ3xx.
Thanks for the response.
Is it necessary to have a NAT rule on the customer gateway? If the SonicWall is supposed to be initiating the connection, and does so when traffic is generated, why would the inbound UDP port be necessary? It's not an option for some of our end users. If the VPN connection can't stay up without making a change to the customer gateway, we need to find an alternate solution to the SonicWalls. My understanding is that aggressive mode with keep alive is the work-around to a change to the customer gateway. Is that not the case?
I will investigate the DPD time. Are there any recommended settings to resolve the issue?
I will test disabling "Enable NAT Traversal", although disabling it seems counter-intuitive.
Any clarification you have would be appreciated.
I have updated the configuration on one of the TZs (screenshot below):
-- DPD was already enabled
-- disabled the "enable NAT traversal" setting
-- reduced the DPD setting from 600 seconds to 100 seconds
I let the connection sit between each change and the connection is still dropping, and a ping from the remote machine always brings it back up instantly.
Additionally, the VPN is showing "up" (green dot) on the SonicWall, and the tunnel is showing "up" on the Palo Alto. If the VPN were actually dropping, the status should show "down".
I have adjusted the DPD setting from 100 to 60, which is the minimum setting, and the firewall is inaccessible after a few minutes, even though the VPN is showing "up".
I've also noticed there is no logging for this connection. The TZ in question has 3 VPN connections; 2 are showing up in the logs, and the one in question is not. Not sure if that's related, but it's odd.
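To pin down how long "a few minutes" of idle time actually is, one option is to probe the TZ from the management side after progressively longer quiet periods. The sketch below is only an illustration under stated assumptions: `tcp_reachable` and `find_idle_limit` are hypothetical helper names, and it assumes the TZ's management interface accepts TCP connections on the chosen port through the tunnel. Note that every successful probe is itself traffic, so the idle timer restarts after each one.

```python
import socket
import time
from typing import Callable, Iterable, Optional


def tcp_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers timeouts, refusals, and unreachable networks
        return False


def find_idle_limit(
    probe: Callable[[], bool],
    idle_steps_s: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return the first idle gap (seconds) after which probe() fails.

    Returns None if the path survived every gap in idle_steps_s.
    """
    if not probe():
        raise RuntimeError("path is already down; re-establish it before testing")
    for idle_s in idle_steps_s:
        sleep(idle_s)        # stay completely quiet for this long
        if not probe():      # first gap the path did not survive
            return idle_s
    return None
```

For example, `find_idle_limit(lambda: tcp_reachable("192.0.2.1", 443), [30, 60, 90, 120, 180, 240])` would report the shortest tested gap that breaks access, with `192.0.2.1` standing in for the TZ's management address. If the result lands near a round number, that may point at an idle timer somewhere in the path (possibly the home router's UDP NAT mapping), though this test alone cannot say which device is responsible. After a failure, traffic from the remote site is still needed to bring access back.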