NSA 4600 hangs when multiple users log in

Espen Newbie ✭
edited October 2020 in Mid Range Firewalls

Hi, my company has about 550 users who need to log in through a SonicWall NSA 4600 to get internet access. The SonicWall syncs with our local Active Directory (AD).

But when more than, say, 20-30 users log in at the same time, the SonicWall hangs. The login page stops responding, and we can't log in as admins either. If users log in 10 at a time or fewer it works, but with more than 20 it stalls, and it hangs completely when more than 30 try at the same time. We have updated it and got some support earlier, but when the support technician restarted it in the middle of working hours and users tried to log back in, it stopped again, so we had to keep all users off for the rest of the day to get the SonicWall back up.

And when I say it hangs, it's only the login page. Users who are already logged in have internet access with no problem, and we can still log in to the SonicWall over SSH.

Isn't the NSA 4600 supposed to handle this many users? Is there anything we can do? We aren't high-end IT technicians, but we know our way around a network; we got help setting it up the first time.


Answers

  • Hello @Espen,

    Could you please let us know the firmware version of the firewall?

    Please make sure that you are on the latest version 6.5.4.7-83n.

    While you have SSH access, kindly run the following commands:

    diag show cpu

    diag show process <name_of_process>

    diag show cpu shows the CPU utilization, with the most heavily used process at the top. Use that process name in the second command.

    If the CPU is running high, this will show exactly what is causing it. Support should be able to dig in further and give you a complete root cause analysis (RCA).
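If you save the `diag show cpu` output to a text file, picking the busiest process out of the "CPU Utilization Per Process" table can be scripted. The Python sketch below assumes the column layout seen in the output pasted in this thread (rank, name, PC, priority, Total% (secs), Curr% (secs)); other SonicOS versions may format the table differently, and `parse_per_process` and `top_process` are hypothetical helpers, not part of any SonicWall tool.

```python
import re
from typing import List, NamedTuple


class ProcStat(NamedTuple):
    name: str
    total_pct: float
    curr_pct: float


# Assumed row format, taken from the output pasted in this thread:
#   "1. pass_to_stack 0x818c8ec8 50 0.48 (6173.15) 16.13 (0.17)"
# i.e. rank, name, PC, priority, Total% (secs), Curr% (secs).
ROW = re.compile(
    r"^\s*\d+\.\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+"
    r"([\d.]+)\s+\([\d.]+\)\s+([\d.]+)\s+\([\d.]+\)"
)

SAMPLE = """\
# Name PC PRI Total% (secs) Curr% (secs)
1. pass_to_stack 0x818c8ec8 50 0.48 (6173.15) 16.13 (0.17)
2. tWebRdrct03 0x8168e290 52 0.35 (4507.67) 14.52 (0.15)
3. tWebRdrct02 0x8168fcc8 52 0.35 (4505.63) 14.52 (0.15)
"""


def parse_per_process(text: str) -> List[ProcStat]:
    """Extract per-process rows; headers and other output lines are skipped."""
    stats = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            stats.append(ProcStat(m.group(1), float(m.group(2)), float(m.group(3))))
    return stats


def top_process(text: str) -> str:
    """Return the name of the process with the highest current (Curr%) load."""
    stats = parse_per_process(text)
    if not stats:
        raise ValueError("no per-process rows found")
    return max(stats, key=lambda s: s.curr_pct).name


if __name__ == "__main__":
    # The name printed here is what goes into "diag show process <name>".
    print(top_process(SAMPLE))
```

Run against the sample rows above, this prints `pass_to_stack`, matching the process that was used in the follow-up `diag show process` command.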

    Thanks!

    Shipra Sahu

    Technical Support Advisor, Premier Services

  • Espen Newbie ✭

    Thanks for responding.

    Yes, it's running the latest version; it was updated not long ago. However, this problem has been with us for the last 1.5 years, but we've managed to ride through it somehow.

    Below is the output of the commands you asked me to run. I don't know what it tells you, but hopefully you can read something into it...


    admin@C0EAE4F69E> diag show cpu

    CPU Monitor:

    Current 1s CPU Utilization: 100.00%


    Current 10s CPU Utilization: 100.00%

    Total Average CPU Utilization: 6.41%

    Current MultiCore Utilization (%)

    Core 0: 100

    Core 1: 8

    Core 2: 4

    Core 3: 21

    Core 4: 16

    Core 5: 11

    Core 6: 9

    Core 7: 8

    CPU Utilization Per Process:

    # Name PC PRI Total% (secs) Curr% (secs)

    --- ----------------- ---------- --- ------------- -------------

    1. pass_to_stack 0x818c8ec8 50 0.48 (6173.15) 16.13 (0.17)

    2. tWebRdrct03 0x8168e290 52 0.35 (4507.67) 14.52 (0.15)

    3. tWebRdrct02 0x8168fcc8 52 0.35 (4505.63) 14.52 (0.15)

    4. tWebRdrct01 0x8168fe6c 52 0.35 (4499.38) 11.29 (0.12)

    5. tWebRdrct05 0x818980f0 52 0.35 (4502.93) 9.68 (0.10)

    6. tWebListen 0x818c8ec8 50 0.16 (2030.15) 9.68 (0.10)

    7. tWebRdrct04 0x80f4d44c 52 0.35 (4504.43) 6.45 (0.07)

    --snip--

    Task Total 4.84 (62866.50) 100.00 (1.03)

    Idle 93.59 (1214736.25) 0.00 (0.00)

    System 1.57 (20313.93) 0.00 (0.00)

    CPU Utilization History for Last Minute (60 seconds ago --> now):

    100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,97,100,100,100,100,100,100,100,10

    CPU Utilization History for Last Hour (60 minutes ago --> now):

    27,2,7,20,23,37,23,35,28,23,100,100,100,100,100,60,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100

    CPU Utilization History for Last Day (24 hours ago --> now):

    100,100,100,100,100,100,100,100,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,100

    CPU Utilization History for Last Month (30 days ago --> now):

    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100,100,100,100,100,0,0,100




    admin@C0EAE4F69E> diag show process pass_to_stack

    Process pass_to_stack (0x8fe061f0):

    pass_to_st> 80142a10 8fe061f0 50 PEND 818c8ec8 8fe060f0 18 0

    $0 = 0 t0 = 8c00 s0 = 50008ca1

    at = fffffffffffffffe t1 = ffffffffffff00ff s1 = ffffffffffffffff

    v0 = 0 t2 = 2000000 s2 = ffffffff85bbd0c8

    v1 = 318d t3 = ffffffff8fe06230 s3 = 1

    a0 = 50008ca1 t4 = 190 s4 = ffffffff839a0000

    a1 = ffffffff87b090e0 t5 = 80 s5 = 2f836cb0

    a2 = 0 t6 = d740 s6 = ffffffff82312978

    a3 = 400 t7 = ffffffff9e385834 s7 = a

    s8 = 1 k0 = 0 intctrl = 0

    tlbhi = 20000000 gp = ffffffff840d25c0

    k1 = 0 t8 = 1 ra = ffffffff817ba564

    sp = ffffffff8fe060f0 t9 = ffffffff818bae50 divlo = cac083126e978d52

    divhi = 1 sr = 50008ca1 pc = 818c8ec8

    Stack trace of pass_to_stack:

    0x8182075c -> ($13)

    0x80142a54 -> 0x818ca548

    0x818c8ec8
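The CPU history lines in the `diag show cpu` output are comma-separated percentages. A small Python sketch like the following counts how many samples were pegged at 100% and the longest unbroken stretch, which can help line up CPU saturation with the times users were logging in. It assumes the one-line, comma-separated format shown above; `parse_history` and `summarize` are hypothetical helper names.

```python
from typing import List, Tuple

# Last-day history line copied from the output above (hourly samples, oldest first).
DAY = "100,100,100,100,100,100,100,100,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,100"


def parse_history(line: str) -> List[int]:
    """Turn '27,2,7,...' into a list of integer percentages."""
    return [int(v) for v in line.split(",") if v.strip()]


def summarize(line: str, threshold: int = 100) -> Tuple[int, int, int]:
    """Return (samples, samples at/above threshold, longest run at/above threshold)."""
    values = parse_history(line)
    pegged = run = longest = 0
    for v in values:
        if v >= threshold:
            pegged += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # a sample below the threshold breaks the streak
    return len(values), pegged, longest


if __name__ == "__main__":
    total, pegged, longest = summarize(DAY)
    print(f"{pegged}/{total} samples at 100%, longest run {longest}")
```

Lowering `threshold` (for example to 90) shows periods that were close to saturation rather than fully pegged.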

  • @Espen,

    6.5.4.7-83n was released just a few days ago, so please make sure you are on that version. I have seen something similar reported on version 6.5.4.4. Also, I think the problem could be with the tWebRdrctxx tasks.

    We would need the TSR (Tech Support Report) and trace logs taken during the hang to analyze it further. It would be best to open a support case for a thorough analysis.

    Thanks!

    Shipra Sahu

    Technical Support Advisor, Premier Services

  • Espen Newbie ✭

    I'm sorry, we aren't on the latest version. I was sure we were, but we are on 6.5.4.6-79n, and I see now that 6.5.4.7-83n is the latest. I will upgrade as soon as I can.

    I've been googling for answers and found this page: https://www.sonicwall.com/support/knowledge-base/troubleshooting-firewall-reboots-due-to-twebmain-process-in-gen-6-devices/170504822896540/

    I might try the tips there as well as upgrading. Any last pointers? And thanks for your help so far.

  • @Espen,

    No problem. In your case it doesn't look like the tWebMain process.

    I hope those commands were run while the issue was occurring. Anyway, it is best to be on version 6.5.4.7. Please take all necessary backups before the firmware upgrade.

    I hope this fixes your problem.

    Thanks!

    Shipra Sahu

    Technical Support Advisor, Premier Services

  • Ajishlal Community Legend ✭✭✭✭✭

    Hi @Espen

    I faced the same issue on a 4600, and I took the steps below to resolve the CPU spike. It was related to the firewall logs.

    1) Disabled logging for App Control globally, and enabled it per category only where monitoring is really needed.

    2) Changed the logging level to "Notice".

    After making the above changes, please wait a while to see the effect.


  • Espen Newbie ✭

    I did try this, and it didn't do much. We aren't using App Control anyway. The traffic went down a bit later in the week, so we lived through it, but now it is the same again. The top CPU processes are:

    CPU Utilization Per Process:


     # Name        PC      PRI  Total% (secs) Curr% (secs)

     --- ----------------- ----------  ---  ------------- -------------

     1.    tWebRdrct04 0x80f4d44c  52   0.52 (9539.60)  17.19 (0.18)

     2.    tWebRdrct01 0x8168fe7c  52   0.52 (9515.73)  17.19 (0.18)

     3.    tWebRdrct02 0x8168fe1c  52   0.52 (9535.87)  12.50 (0.13)

     4.   pass_to_stack 0x818c8ec8  50   0.56 (10192.43)  10.94 (0.12)

     5.    tWebRdrct05 0x8168fcf8  52   0.52 (9540.80)  10.94 (0.12)

     6.    tWebRdrct03 0x8168dfa8  52   0.52 (9538.80)  9.38 (0.10)

     7.  REAL_tDataPlane 0x818c8ec8  50   0.12 (2211.42)  9.38 (0.10)


    This time I couldn't connect over SSH at first either; it timed out.

    Any help?

  • Espen Newbie ✭

    It also seems to lose contact with our local DNS server when this happens. I tried pinging it over SSH while logged in to the SonicWall. (I've edited out the real IP.)

    admin@C0EAE4F69E> ping 10.xx.0.xx

    Unable to resolve 10.xx.0.xx

    admin@C0EAE4F69E> ping 10.xx.0.xx

    Unable to resolve 10.xx.0.xx

    admin@C0EAE4F69E> ping 10.xx.0.xx

    10.xx.0.xx [10.xx.0.xx] : is alive

    Ping time : 0 ms

    admin@C0EAE4F69E> ping 10.xx.0.xx

    Unable to resolve 10.xx.0.xx

    admin@C0EAE4F69E> ping 10.xx.0.xx

    Unable to resolve 10.xx.0.xx


    I get a response about 1 in 8 times. How can I check whether this is part of the problem?
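One way to check whether the DNS server itself answers reliably is to query it directly from a workstation on the same network and count the replies. The sketch below builds a minimal DNS query by hand in the RFC 1035 wire format, using only the Python standard library. It rests on assumptions: `build_query` and `probe` are hypothetical helpers, the server address and host name are placeholders, and it measures DNS reachability from the workstation, not from the SonicWall. If the workstation gets consistent answers while the firewall does not, that points back at the firewall rather than at the DNS server.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RFC 1035 wire format); qtype 1 = A record."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS 1 (IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, attempts: int = 8,
          timeout: float = 2.0, port: int = 53) -> float:
    """Send `attempts` UDP queries and return the fraction answered in time."""
    answered = 0
    for _ in range(attempts):
        query = build_query(name)
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(query, (server, port))
                reply, _ = sock.recvfrom(512)
                # Count the reply only if its transaction ID matches the query.
                if reply[:2] == query[:2]:
                    answered += 1
            except OSError:  # timeouts and ICMP errors both end up here
                pass
    return answered / attempts


if __name__ == "__main__":
    # Placeholders: replace with your DNS server's address and an internal name.
    print(probe("192.0.2.53", "intranet.example.local"))
```

A result close to 1.0 from the workstation while the firewall's pings keep reporting "Unable to resolve" would suggest the DNS server is healthy and the firewall is the bottleneck.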

  • Ajishlal Community Legend ✭✭✭✭✭

    Hi @Espen

    In this case, please do the firmware update as per @shiprasahu93 and let us know.

  • Espen Newbie ✭

    The problem is the same after the firmware update.

  • Ajishlal Community Legend ✭✭✭✭✭


    Hi @Espen

    Please follow the KB below and try it.

    Make sure your network doesn't have any switch loops.


  • Alberto Enthusiast ✭✭
    Check the ACL configuration. Check whether you have something that redirects all connections to the login page when users are not authenticated.
  • KyleL Newbie ✭

    I'm running into the same problem on the same device. Were you able to find a resolution?

  • Ajishlal Community Legend ✭✭✭✭✭

    Hi @KyleL ,

    Please make sure your LDAP referrals are configured properly, and while the 4600 is slow during user login, run the LDAP test connection at the same time.

    NB: If you don't have multiple subdomains or LDAP servers other than the primary, please disable the highlighted enabled field and try again.

    Then run the LDAP connectivity and user authentication tests.

    Make sure your LDAP server supports LDAP version 3; LDAP version 2 has issues.
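While the slowdown is happening, it can also help to measure whether the LDAP server accepts connections promptly. The Python sketch below only times a TCP connect to port 389, the standard LDAP port (adjust it if you use LDAPS on 636). It does not perform an LDAP bind or search, so it complements rather than replaces the LDAP test in the SonicWall interface. `connect_time` and `sample` are hypothetical helpers, and the host name is a placeholder.

```python
import socket
import time
from typing import List, Optional


def connect_time(host: str, port: int = 389, timeout: float = 3.0) -> Optional[float]:
    """Seconds needed to open a TCP connection, or None if it failed or timed out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def sample(host: str, port: int = 389, count: int = 10,
           interval: float = 1.0) -> List[Optional[float]]:
    """Take `count` connect-time samples, `interval` seconds apart."""
    results = []
    for i in range(count):
        results.append(connect_time(host, port))
        if i < count - 1:
            time.sleep(interval)
    return results


if __name__ == "__main__":
    # Placeholder host name: point this at your domain controller.
    for t in sample("dc01.example.local"):
        print("failed/timed out" if t is None else f"{t * 1000:.1f} ms")
```

Running it once during normal operation and again during a login storm gives a baseline to compare against; fast connects during the hang would suggest the directory server is not the bottleneck.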

  • Espen Newbie ✭
    edited December 2020

    No solution yet; SonicWall support has been working on the case for 2 weeks now.

  • Micah SonicWall Employee

    Hello @Espen and @KyleL. I'm sorry to hear about this inconvenience. If you PM your case number to me, I can work with our internal teams. Let me know.

    Kind Regards,

    @micah - SonicWall's Self-Service Sr. Manager