Issue with TZ350: stopped routing NAT for all IPs other than the IP assigned to the WAN interface
I decided to do a performance test on our SQL database, currently hosted on GCP. I fired up mysqlslap and it was running, but slowly: it was pushing between 10-15MB of throughput through the firewall, and then I killed the test because GCP SQL is dog slow. Then all of a sudden everything at our site went down, except that I was still connected to the VPN that terminates on the outside interface of the SonicWall TZ350 we are using. Everything else has additional IPs associated with it for NAT and does not use the IP assigned directly to the outside interface of the FW.
So nothing was routing into our site, all because I put a 15MB load on our FW? I checked, and while connected to the VPN I was able to use a LB IP and hit our services on their private IPs with no problem... So I tried creating a new NAT xlate, thinking maybe it would work or get the others working. No go: the new NAT didn't work, and the TZ350 does not respond for any IP that isn't directly assigned to the interface.
Finally, after 15 painful minutes of downtime and running out of time to collect forensics and troubleshoot, we rebooted the firewall. After the reboot, everything came back online and was reachable.
This is obviously a horrible user experience, unusable from an enterprise perspective, and it has us very concerned about our choice of firewalls. We need to find the source and resolve it; surely a TZ350 that can handle 100s of MB of encrypted traffic wouldn't die under a 15MB load. I have also disabled all AppFlow/NetFlow and SSL inspection, and the load on the firewall is usually around 20%.
Has anyone else had this happen? Does anyone have any ideas what might be the cause, or how we can prevent it from happening again?
TZ350
Firmware 6.5.4.1-25n
safemode vers 6.2.3.12
rom 5.6.2.2
-s
Best Answers
Arkwright All-Knowing Sage ✭✭✭✭
No advice about your specific scenario beyond "it should work", but I would note that this is pretty old firmware [March 2019], so whatever this issue is may be fixed by now.
EDIT: Looks like 6.5.4.1 was the initial firmware version for TZ350, so sounds like yours has never been updated.
EncryptShawn Newbie ✭
Yeah, we are migrating to a couple of Fortinets. This firewall should not be sold to anyone as usable for production; it has been a nightmare for our production environment. Even though we bought it with a limited manufacturer's warranty, SonicWall wants us to pay for a support contract to have them tell us this FW is broken or that it sucks, and to try to sell us a more expensive one that maybe sucks less. Either way, we aren't paying for a service contract on a product that cannot perform its rudimentary function without failing, digging the hole of SonicWall losses deeper.
Guess this SonicWall will just become an overpriced VPN concentrator, since we already paid for a VPN license upgrade.
Definitely recommend that anyone looking at SonicWall firewalls for a production environment re-evaluate and reconsider their decision.
Answers
Actually, this is almost certainly it. I don't know how I was under the impression we were on the latest code. Your second set of eyes was helpful, thanks much.
-s
We are now on the latest firmware for the TZ350: 6.5.4.10-95n
Well, unfortunately the firmware update didn't fix it. Earlier I was doing some remote SQL dumps; SQL Workbench failed but kept using bandwidth and cranked it up to about 80mb on a 100mb internet connection, still way under this firewall's published capabilities. I ended up rebooting the machine running SQL Workbench that was downloading the data, and when I did, all of our sites went down except for the VPN to the firewall. Everything in the firewall reported that everything was fine: packet captures showed packets being forwarded and rule hit counts were incrementing, but nothing was reachable from the outside world other than the VPN on the IP assigned to the WAN interface. From inside, I could still make connections out to SQL and websites with no issue. The only problem appears to be with inbound traffic. Maybe NAT halfway died, maybe responses to inbound traffic are being dropped and not reported; I don't know.
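To catch this state the moment it starts, rather than after customers notice, something like the following could be run from a host outside the firewall. It is only a minimal sketch: the labels, ports, and the 203.0.113.x addresses (a documentation-only range) are placeholders I made up, not our real IPs, and a plain TCP connect is assumed to be a good enough test of "reachable".

```python
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

def probe_all(targets, timeout=3.0):
    """Probe each (label, host, port) tuple and return {label: reachable}."""
    return {label: tcp_reachable(host, port, timeout)
            for label, host, port in targets}

def classify(results, wan_label="wan"):
    """Summarise probe results.

    "nat-only-failure" is the symptom from this thread: the IP on the WAN
    interface still answers, but every additional NAT'd IP is dead.
    """
    wan_up = results.get(wan_label, False)
    others = [up for label, up in results.items() if label != wan_label]
    if wan_up and others and not any(others):
        return "nat-only-failure"
    if wan_up and all(others):
        return "ok"
    if not wan_up and not any(others):
        return "all-down"
    return "partial"

# Hypothetical targets: replace with the real WAN IP and NAT'd public IPs.
EXAMPLE_TARGETS = [
    ("wan", "203.0.113.10", 443),   # IP assigned to the WAN interface
    ("web", "203.0.113.11", 443),   # additional IP, NAT'd to a web server
    ("api", "203.0.113.12", 443),   # additional IP, NAT'd to an API server
]
```

Calling `classify(probe_all(EXAMPLE_TARGETS))` on a timer and alerting on `"nat-only-failure"` would at least timestamp when the NAT'd IPs stop answering, which can then be lined up against the firewall logs and packet captures.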
At this point I question whether anyone is able to use this SonicWall TZ350 pushing 50MB+ of bandwidth consistently without this happening.
Any recommendations on how to handle this? Is there a known issue and fix?
If not, I don't see how we have any choice but to buy another firewall ASAP and replace this SonicWall before it screws us again. I have worked with firewalls for at least 15 years and I don't think I have ever seen anything like this before, especially on something with 10 firmware revisions. It's not even just the $600 we lost buying this firewall and then replacing it with something else; it's the lost data, customer confidence, and the many hours of labor to change firewalls at this point.
Right now I can say buying and building on a SonicWall firewall might have been one of the worst engineering decisions of my IT career.
Any useful advice for troubleshooting or resolving this issue is appreciated.
I would start by disabling security services to see whether the extra load of processing this traffic with the services on is pushing it over the edge. However, in my experience, a firewall nearing its CPU limit doesn't cause everything to fall over; it merely degrades performance. So I think there might be something else going on here, but I am not sure what.