NSv200 Azure VPN to NSa/TZ: traffic randomly stops passing over tunnel interface
Hi,
Just wondering if anyone else in the community has experienced this (I have a ticket logged with SonicWall :) ).
I now have this issue on 4 completely separate sites, each with an NSv200 in Azure and an NSa/TZ on prem.
NSv200 (azure) - NSa2650
NSv200 (azure) - NSa2600
NSv200 (azure) - TZ500
NSv200 (azure) - TZ300 (6.5.4.4-44n)
All other firewalls above are on the latest firmware: NSv_200__Azure_-6_5_4_4-44v-21-987 on the NSv200s and 6.5.4.7-83n on all NSa/TZ.
The issue was also present on NSv200 nsv_azure_6.5.4.4-44v-21-987 and NSv_200__Azure_-6_5_0_2-8v-37-628 with NSa_2650-6_5_4_5-53n.
The setup is a route-based (tunnel interface) IKEv2 VPN between the Azure NSv200 and the on-prem NSa/TZ.
The problem is that out of nowhere, maybe once a week, maybe every 2 weeks, maybe once a month, the tunnel stays up/green on both sides but traffic does not pass over it. The traffic logs show that the NSv200 is not trying to push traffic into the tunnel: no interesting traffic appears in the packet capture on the tunnel, even though I know for sure there is a lot of traffic trying to use it. I also cannot see any traffic inbound from the other site on the tunnel. The NSa/TZ side, however, is trying to push traffic into the tunnel (I can see this in its packet captures).
Once I bounce the tunnel it all works again.
The errors in the packet captures are:
On NSv:
[in:X1*(interface),out:--,DROPPED, Drop Code: 425(Octeon Decrypyion Failed Pad check), Module Id: 20(ipSec), (Ref.Id: _775_jqtfdPdufpoJoqvu),1:1)]
On NSa/TZ:
[in:X1*(interface),out:--,DROPPED, Drop Code: 680(Packet dropped - fails to handle IPSec pkt), Module Id: 20(ipSec), (Ref.Id: _2926_txGsIboemfJqQlu),1:1)]
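When comparing captures from several sites, it can help to pull the drop code and module out of each annotation line and count them. The following is only a rough sketch: the regular expression is inferred from the two lines quoted above, not from any SonicWall specification, and the names `parse_drop` and `count_drops` are made up for illustration.

```python
import re
from collections import Counter

# Field layout inferred from the two sample lines in this post only (assumption).
DROP_RE = re.compile(
    r"in:(?P<in_if>[^,]+),out:(?P<out_if>[^,]+),DROPPED,\s*"
    r"Drop Code:\s*(?P<code>\d+)\((?P<reason>.*?)\),\s*"
    r"Module Id:\s*(?P<module_id>\d+)\((?P<module>[^)]*)\)"
)


def parse_drop(line):
    """Return the drop fields of one capture annotation as a dict, or None."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    fields["module_id"] = int(fields["module_id"])
    return fields


def count_drops(lines):
    """Count occurrences of each (drop code, reason) pair across many lines."""
    counts = Counter()
    for line in lines:
        fields = parse_drop(line)
        if fields is not None:
            counts[(fields["code"], fields["reason"])] += 1
    return counts
```

Feeding exported capture text from both ends of the tunnel through `count_drops` should make it easier to see whether codes 425 and 680 are the only ones that show up when the tunnel stalls.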
On my first ticket, SonicWall suggested disabling keepalives on the NSa/TZ side and leaving them on for the NSv side. This did not work.
I have also changed the key lifetimes (though the issue is not triggered at the point when the key lifetimes expire anyway).
I have also tried different combinations of DPD.
Regardless of the changes above, there is no reason why this should happen. This is one SonicWall appliance talking to another SonicWall appliance. The tunnels work fine for weeks, sometimes months, and then fail at random: sometimes at 03:00 when all is quiet and only Active Directory traffic is passing with very low bandwidth consumption, sometimes at 15:00, sometimes at 10:00. There is no common trigger.
As I said, this is on all 4 sites, for different customers, where I have a tunnel interface IKEv2 VPN between an Azure NSv200 and an on-prem NSa/TZ.
I am currently testing one site by changing the VPN to policy-based (site to site) with IKEv2,
another site by changing the tunnel interface (route-based) VPN to IKEv1,
and another site by disabling IPsec anti-replay while keeping the same tunnel interface (route-based) VPN on IKEv2.
I have an NSv200 with a VPN to another vendor's firewall (route-based/tunnel interface and IKEv2) and absolutely no issue.
The latest suggestion from SonicWall support, after I asked them to escalate, was to rebuild the VPN and recreate the network objects...
My guess on this one is that the NSv is not activating the route for the tunnel though it is active in the config.
Best Answer
MasterRoshi Moderator
@RedNet, try disabling "Send IKEv2 Invalid SPI Notify" on both firewalls then restart the tunnel and monitor it. That is the only issue I know about that presents similar behavior. If this stops the issue from happening, then there are hotfixes available for you.
I would also suggest you set up some sort of syslog server and enable debug logs for IPsec/VPN to be sent there, so that the support team has more information when a trigger happens.
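If no syslog server is available, a throwaway collector is enough to get timestamped debug messages off the firewall. The sketch below is a minimal example and not a production syslog server: it assumes the firewall is pointed at this host over UDP, and the port `5514` and the file name `vpn-debug.log` are placeholders chosen for illustration (514 is the usual syslog port but normally needs elevated privileges).

```python
import socketserver
from datetime import datetime, timezone

LOG_PATH = "vpn-debug.log"   # hypothetical output file
LISTEN = ("0.0.0.0", 5514)   # hypothetical port, see note above


def format_record(payload, sender_ip, received_at):
    """Prefix one raw syslog datagram with the receive time and sender address."""
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    return f"{received_at.isoformat()} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        payload, _sock = self.request
        line = format_record(
            payload, self.client_address[0], datetime.now(timezone.utc)
        )
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def serve():
    """Block forever, appending every received datagram to LOG_PATH."""
    with socketserver.UDPServer(LISTEN, SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the collector. Stamping each message with the collector's own UTC receive time makes it easier to line the logs up against whatever detects the outage, independent of the firewall's clock.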
Answers
Just to add: if I don't bounce the VPN tunnel, it never starts working again (even when the phase 1 and 2 lifetimes expire). I can leave it broken for a week and it won't start working until I manually bounce the VPN tunnel.
Hello @RedNet ,
I'm sorry to hear about this inconvenience. Can you please try the steps suggested by MasterRoshi? If you are still facing the issue, please PM your case number so that we can follow up with our support team internally.
Regards
Karan
Knowledge Management Senior Analyst at SonicWall.
Thanks for that, it's a suggestion which makes sense and you have obviously taken the time to read and understand my issue. Having this issue (with my customers looking for answers) and not being able to speak to a vendor support member who can actually understand the problem is a really frustrating position to be in.
I have applied your suggestion to 2 of the 4 customer sites where I have this issue. On one of the other sites I have switched to policy-based (site to site) and on the last site I have changed to IKEv1 (main mode).
The issue sometimes takes a week or two before it appears again, so I will let you know.
I had the syslogs going to a collector but there was nothing interesting I could see in them, though I never know the exact time it happens, as I only know the issue is there when my probes over the tunnel report down. When I check the logs around the time the probe flags as down I don't really see anything of note.
Thanks again for taking the time to look into this and give educated feedback.
Thanks for this, you have obviously taken the time to read and understand my problem, which is a huge relief. Your suggestion makes sense. I have been thinking about this more since posting and do believe the issue must lie with IKEv2.
I have applied your suggestion to 2 of the 4 customer sites where I am having this issue. Of the other 2 sites, I have changed one tunnel to policy-based (site to site) mode instead of a tunnel VPN, and on the other I have left tunnel mode but changed to IKEv1 (main mode).
The issue sometimes takes a week or two to come back once I have bounced the VPN, so I will let you know.
It is incredibly disheartening, and a major turn-off from buying these appliances, when you hit issues and the vendor support responses give you no confidence that they even understand your problem, so your response is very much appreciated.
I had collected syslogs but I couldn't find any interesting messages in them that looked related at the time of the issue. It was hard to pinpoint the exact time, though, as I can only tell there is a problem when my probes report back as down, which can be a few minutes after the issue arises. Even so, I couldn't see any log messages of note; the packet captures were the only useful detail I could see.
Thanks again and I will let you know how it goes.
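Since the exact failure time is hard to pin down from a probe that only reports "down" some minutes later, one option is a tighter probe that records the last successful check and the first failed one. The sketch below is an assumption-laden example, not the probe actually used here: it uses a plain TCP connect to some service across the tunnel, and the host, port and interval passed to `watch` are hypothetical.

```python
import socket
import time
from datetime import datetime, timezone


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failure_windows(samples):
    """Given (timestamp, ok) samples in order, return (last_ok, first_fail)
    for every up-to-down transition. last_ok is None if no probe ever passed."""
    windows = []
    last_ok = None
    prev_ok = True
    for ts, ok in samples:
        if ok:
            last_ok = ts
        elif prev_ok:
            windows.append((last_ok, ts))
        prev_ok = ok
    return windows


def watch(host, port, interval=5.0):
    """Probe forever and print the window in which each outage began."""
    last_ok = None
    prev_ok = True
    while True:
        now = datetime.now(timezone.utc)
        ok = tcp_probe(host, port)
        if ok:
            last_ok = now
        elif prev_ok:
            print(f"tunnel traffic stopped between {last_ok} and {now}")
        prev_ok = ok
        time.sleep(interval)
```

For example, `watch("10.0.0.10", 445, interval=5.0)` against a hypothetical file server behind the on-prem firewall would narrow the failure to a window of a few seconds, which can then be matched against the syslog and packet capture timestamps.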
Thanks @KaranM and I will.
@MasterRoshi so far so good on all 4 sites, so it looks like "Send IKEv2 Invalid SPI Notify" is the culprit here.
Though I will leave it for another couple of weeks, as we have had times where all is well for 3 weeks.
@MasterRoshi This seems to be it, no issues since on these sites, even with the IKEv2 tunnel with "send IKEv2 Invalid SPI Notify" disabled. Thanks!
Would you have the hotfix or bug ID for this, please? Do you know if it's marked to be fixed in a later release?
Gen6-995 and 6.5.4.8 should have the fix. You can ask for a hotfix from the support team @RedNet.