FreeSWITCH Calls Stuck in CS_ROUTING Under Load — Requires Restart #3013

@DhruvilInextrix

Hi Team.
We are running FreeSWITCH and facing a recurring issue where calls get stuck and stop routing. The only way to recover is a full FreeSWITCH restart. Looking for help to find a permanent fix.

Our Setup

FreeSWITCH media servers: freeswitch-media-1 and freeswitch-media-2
MySQL database on separate server
All servers on Hetzner Cloud: 12 cores, 16 GB RAM
Inbound call volume: ~50 calls/second at peak
sessions-per-second currently set to 100
max-sessions set to 2000
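
To put these figures in perspective, here is a rough back-of-the-envelope sketch (not a measurement) based on Little's law: concurrent sessions are roughly the arrival rate times the average time a call holds a session. The function names are made up for illustration; only the 50 calls/second and max-sessions 2000 values come from the setup above.

```python
def steady_state_sessions(calls_per_second: float, avg_hold_seconds: float) -> float:
    """Little's law: expected concurrent sessions for a given arrival rate and hold time."""
    return calls_per_second * avg_hold_seconds


def seconds_to_fill(max_sessions: int, calls_per_second: float, current_sessions: int = 0) -> float:
    """Time until max_sessions is reached if new calls arrive but none are ever released."""
    return (max_sessions - current_sessions) / calls_per_second


PEAK_CPS = 50        # inbound peak from the setup above
MAX_SESSIONS = 2000  # max-sessions from the setup above

# If routing stalls completely at peak, the session table fills in about 40 seconds.
print(seconds_to_fill(MAX_SESSIONS, PEAK_CPS))        # 40.0
# Equivalently, an average hold time of 40 s at 50 cps already needs 2000 sessions.
print(steady_state_sessions(PEAK_CPS, 40))            # 2000
```

In other words, with these settings there is very little headroom once calls stop progressing: a stall of well under a minute at peak would be enough to reach max-sessions.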

The Problem
After FreeSWITCH runs for 20–40 minutes under normal traffic, inbound calls start getting stuck in CS_ROUTING state and never progress to CS_EXECUTE. They just sit there in RINGING callstate with no outbound leg (b_uuid is empty).
Example from show calls:
uuid: db49de02...
direction: inbound
state: CS_ROUTING
callstate: RINGING
application: (empty)
b_uuid: (empty)
Working calls look like this (have application and b_uuid):
uuid: 64004433...
direction: inbound
state: CS_EXECUTE
application: bridge
callstate: ACTIVE
b_uuid: be8bfe23...
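
The difference between the two examples above can be expressed as a small filter. This is only a sketch: it assumes the show calls output has already been parsed into dictionaries keyed by the field names shown above (the parsing step is not included), and `find_stuck_calls` is a hypothetical helper name.

```python
from typing import Dict, List


def find_stuck_calls(rows: List[Dict[str, str]]) -> List[str]:
    """Return uuids of calls in CS_ROUTING with no application and no b_uuid."""
    stuck = []
    for row in rows:
        in_routing = row.get("state") == "CS_ROUTING"
        no_b_leg = not (row.get("b_uuid") or "").strip()
        no_app = not (row.get("application") or "").strip()
        if in_routing and no_b_leg and no_app:
            stuck.append(row.get("uuid", ""))
    return stuck


# The two rows below mirror the examples above (uuids are truncated in the report).
rows = [
    {"uuid": "db49de02...", "direction": "inbound", "state": "CS_ROUTING",
     "callstate": "RINGING", "application": "", "b_uuid": ""},
    {"uuid": "64004433...", "direction": "inbound", "state": "CS_EXECUTE",
     "callstate": "ACTIVE", "application": "bridge", "b_uuid": "be8bfe23..."},
]
print(find_stuck_calls(rows))  # ['db49de02...']
```

A healthy call also passes through CS_ROUTING briefly, so a real check would probably need to consider how long a call has been in that state as well.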

What We Have Checked
CPU — only 15–20% used, not the issue
RAM — 23GB total, only 4GB used, not the issue
MySQL — DB is healthy, no locked queries, queries execute in 0ms
Network — servers can reach each other fine
max-sessions — set to 2000, not being hit
Disk — SSD, no I/O issues

What We Found

  1. The percentage in FreeSWITCH logs climbs over time:
    94.27% [DEBUG] ... CS_ROUTING
    95.40% [DEBUG] ... CS_ROUTING
    97.03% [DEBUG] ... CS_ROUTING
    97.73% [DEBUG] ... CS_ROUTING
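    We read this as session usage % climbing toward max-sessions, meaning calls are piling up faster than they are being processed. Note, though, that the percentage FreeSWITCH prints in each log line is, as far as we know, the idle-CPU figure rather than session usage, so this reading may be wrong: a rising value would instead indicate the CPU doing less work while calls stall.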

  2. No routing timeout exists anywhere:
    Calls stuck in CS_ROUTING never get killed automatically. They sit there forever until restart. hupall and uuid_kill do NOT work on these calls because they have no channel handle yet — they are queued in the dialplan processor.

  3. Duplicate retries make it worse:
    Because calls get no answer, callers retry. We see the same caller hitting the system 3–4 times within 30 seconds, multiplying the load.
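
The retry pattern in point 3 can be quantified with a sliding-window count. The sketch below assumes call attempts are available as (timestamp in seconds, caller number) pairs, for example extracted from logs or CDRs; that input format and the name `find_retrying_callers` are assumptions for illustration, not something FreeSWITCH provides directly.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Tuple


def find_retrying_callers(
    attempts: Iterable[Tuple[float, str]],
    window_seconds: float = 30.0,
    min_attempts: int = 3,
) -> Dict[str, int]:
    """Map each caller to the largest number of attempts seen inside one window.

    Only callers that reach min_attempts within window_seconds are returned.
    """
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    worst: Dict[str, int] = {}
    for ts, caller in sorted(attempts):
        window = recent[caller]
        window.append(ts)
        # Drop attempts that fell out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= min_attempts:
            worst[caller] = max(worst.get(caller, 0), len(window))
    return worst


# Made-up sample data: caller "A" retries three times in 25 s, caller "B" does not.
sample = [(0, "A"), (10, "A"), (25, "A"), (0, "B"), (100, "B")]
print(find_retrying_callers(sample))  # {'A': 3}
```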

Our switch.conf.xml relevant params:

<param name="max-sessions" value="2000"/>
<param name="sessions-per-second" value="100"/>
<param name="max-db-handles" value="500"/>
<param name="db-handle-timeout" value="30"/>

What We Have Tried
hupall NORMAL_CLEARING — does not clear stuck calls
uuid_kill — does not work on CS_ROUTING calls
fsctl sps 20 — helps slightly but issue still recurs
Cron watchdog script — fs_cli hangs when FreeSWITCH is degraded, script never completes
systemctl restart freeswitch — fixes it temporarily, but issue returns in 20–40 mins
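
Regarding the watchdog hanging: one possible workaround is to run fs_cli under a hard deadline, so the script always finishes even when FreeSWITCH does not respond. This is a minimal sketch using Python's `subprocess` timeout; `run_with_deadline` is a hypothetical helper, and the fs_cli command in the comment is only an example of what might be passed in.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_deadline(cmd: List[str], timeout_seconds: float) -> Optional[str]:
    """Run cmd and return its stdout, or None if it fails, is missing, or times out.

    subprocess.run kills the child process when the timeout expires, so the
    caller is never blocked for longer than roughly timeout_seconds.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return None
    if done.returncode != 0:
        return None
    return done.stdout


# Demonstration with a stand-in child process that sleeps longer than the deadline.
# In a watchdog the command might instead be: ["fs_cli", "-x", "show calls count"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_deadline(slow, 0.5))  # None
```

A `None` result can then be treated as "FreeSWITCH is unresponsive" and trigger whatever recovery action is appropriate, instead of leaving the cron job stuck.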

We have also tried the patch below by Jakub Karolczyk:
#2619
