Hi Team.
We are running FreeSWITCH and facing a recurring issue where calls get stuck and stop routing. The only way to recover is a full FreeSWITCH restart. Looking for help to find a permanent fix.
Our Setup
FreeSWITCH media servers: freeswitch-media-1 and freeswitch-media-2
MySQL database on separate server
All servers on Hetzner Cloud: 12 cores, 16 GB RAM
Inbound call volume: ~50 calls/second at peak
sessions-per-second currently set to 100
max-sessions set to 2000
The Problem
After FreeSWITCH runs for 20–40 minutes under normal traffic, inbound calls start getting stuck in CS_ROUTING state and never progress to CS_EXECUTE. They just sit there in RINGING callstate with no outbound leg (b_uuid is empty).
Example from show calls:
uuid: db49de02...
direction: inbound
state: CS_ROUTING
callstate: RINGING
application: (empty)
b_uuid: (empty)
Working calls look like this (have application and b_uuid):
uuid: 64004433...
direction: inbound
state: CS_EXECUTE
application: bridge
callstate: ACTIVE
b_uuid: be8bfe23...
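To make the difference between the two cases concrete, here is a minimal sketch of how stuck calls could be picked out of the output of show calls as json. It assumes the JSON has the shape {"row_count": N, "rows": [...]} and that each row carries state, b_uuid and created_epoch fields. The function name and the 30-second threshold are hypothetical choices for illustration, not anything FreeSWITCH provides.

```python
import json


def find_stuck_calls(show_calls_json, now_epoch, max_age_s=30):
    """Return calls that sit in CS_ROUTING with no b_uuid for longer than max_age_s.

    Assumption: the input is the text printed by `show calls as json`,
    shaped like {"row_count": N, "rows": [{...}, ...]}.
    """
    data = json.loads(show_calls_json)
    stuck = []
    # "rows" may be missing when there are no calls, so default to an empty list.
    for row in data.get("rows", []):
        age = now_epoch - int(row.get("created_epoch", now_epoch))
        no_b_leg = not row.get("b_uuid")
        if row.get("state") == "CS_ROUTING" and no_b_leg and age > max_age_s:
            stuck.append({"uuid": row.get("uuid"), "age_s": age})
    return stuck
```

A call that has only just arrived also passes through CS_ROUTING briefly, which is why the sketch filters on age as well as state.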
What We Have Checked
CPU — only 15–20% used, not the issue
RAM — 23GB total, only 4GB used, not the issue
MySQL — DB is healthy, no locked queries, queries execute in 0ms
Network — servers can reach each other fine
max-sessions — set to 2000, not being hit
Disk — SSD, no I/O issues
What We Found
The percentage in FreeSWITCH logs climbs over time:
94.27% [DEBUG] ... CS_ROUTING
95.40% [DEBUG] ... CS_ROUTING
97.03% [DEBUG] ... CS_ROUTING
97.73% [DEBUG] ... CS_ROUTING
This is session usage % climbing toward max-sessions, meaning calls are piling up faster than they are being processed.
No routing timeout exists anywhere:
Calls stuck in CS_ROUTING never get killed automatically. They sit there forever until restart. hupall and uuid_kill do NOT work on these calls because they have no channel handle yet — they are queued in the dialplan processor.
Duplicate retries make it worse:
Because calls get no answer, callers retry. We see the same caller hitting the system 3–4 times within 30 seconds, multiplying the load.
Relevant params from our switch.conf.xml:
<param name="max-sessions" value="2000"/>
<param name="sessions-per-second" value="100"/>
<param name="max-db-handles" value="500"/>
<param name="db-handle-timeout" value="30"/>
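For context on how little headroom these values leave at peak, here is a rough back-of-the-envelope sketch. It uses only the numbers above (max-sessions 2000, ~50 calls/second) plus one assumption: sessions_per_call, a hypothetical parameter for how many sessions each call consumes (1 for an unbridged inbound leg, 2 if a bridged call counts both legs).

```python
def seconds_until_full(max_sessions, calls_per_second, sessions_per_call=1):
    """Seconds until max_sessions is reached if no session ever completes.

    By Little's law (concurrent = arrival rate x average duration), the same
    number is also the longest average session duration that keeps the
    steady-state session count at or below max_sessions.
    """
    return max_sessions / (calls_per_second * sessions_per_call)


# Values from this report: max-sessions 2000, ~50 calls/second at peak.
print(seconds_until_full(2000, 50))                        # 40.0
# Assumption: a bridged call holds two sessions (A-leg and B-leg).
print(seconds_until_full(2000, 50, sessions_per_call=2))   # 20.0
```

So if routing stalls completely at peak rate, the session limit would be reached in well under a minute. This is only arithmetic on the configured values, not a measurement.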
What We Have Tried
hupall NORMAL_CLEARING — does not clear stuck calls
uuid_kill — does not work on CS_ROUTING calls
fsctl sps 20 — helps slightly but issue still recurs
Cron watchdog script — fs_cli hangs when FreeSWITCH is degraded, script never completes
systemctl restart freeswitch — fixes it temporarily, but issue returns in 20–40 mins
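On the watchdog problem specifically, this is a minimal sketch of a health probe that cannot hang forever, assuming the probe is run through Python's subprocess module with a hard timeout. fs_cli -x "status" is the real command being wrapped. The helper names and the 5-second limit are hypothetical. A probe that times out is treated the same as a failed one.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (ok, output).

    ok is False when the command exits non-zero, cannot be found,
    or does not finish within timeout_s seconds.
    """
    try:
        # subprocess.run kills the child process if the timeout expires.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, "timeout"
    except FileNotFoundError:
        return False, "not found"
    return proc.returncode == 0, proc.stdout


def freeswitch_responsive(timeout_s=5):
    """Hypothetical probe: True only if fs_cli answers `status` in time."""
    ok, _ = run_with_timeout(["fs_cli", "-x", "status"], timeout_s)
    return ok
```

A cron job could call freeswitch_responsive() and trigger the restart only when it returns False, so a hung fs_cli no longer blocks the script.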
We have also tried the patch below by Jakub Karolczyk:
#2619