Avoid undefined behavior with ctype functions#69

Merged
QuLogic merged 2 commits into
pidgin:mainfrom
grimmy:ctype
Dec 2, 2025
Conversation

@grimmy

@grimmy grimmy commented Nov 15, 2025

Member

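The ctype(3) functions (isspace, isdigit, ...) take an int argument whose value must be representable as an unsigned char (0 to 255 on typical platforms) or equal to EOF. Passing a negative value, as happens when a signed char holds a byte above 127, results in undefined behaviour.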

Some implementations of libc, such as glibc as of 2018, attempt to avoid the worst of the undefined behavior by defining the functions to work for all integer inputs representable by either unsigned char or char.

On NetBSD, ctype(3) functions will crash with a SIGSEGV signal on invalid inputs as a diagnostic aid for applications.

This set of patches converts each char to unsigned char before it is passed to a ctype function, so that the value is within the valid range.
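As an illustration of the conversion described above, here is a minimal sketch. The helper name `count_leading_space` is hypothetical and does not come from the patch; it only shows the cast pattern applied at a ctype call site.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): count the leading whitespace
 * bytes of a NUL-terminated string. */
static size_t count_leading_space(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    /* Cast to unsigned char before calling isspace() so the argument is
     * never negative, even when plain char is signed and the byte is
     * 0x80 or above. */
    while (s[n] != '\0' && isspace((unsigned char)s[n])) {
        n++;
    }
    return n;
}
```

The cast sits at the call site rather than in the variable's type, so the surrounding code can keep using plain `char *` strings.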

Anthony Mallet and others added 2 commits November 15, 2025 01:09
…t must be in

the range [0-255] or EOF. Passing it a signed char results in undefined behaviour.

Some implementations of libc, such as glibc as of 2018, attempt to avoid
the worst of the undefined behavior by defining the functions to work for
all integer inputs representable by either unsigned char or char.

On NetBSD, ctype(3) functions will crash with a SIGSEGV signal on invalid inputs
as a diagnostic aid for applications.

This set of patches ensures the char is first converted to unsigned char to ensure
that the values are within the correct range.
@EionRobb

Contributor

I find this PR a little confusing, as the man page says the values should be signed ints? https://man.freebsd.org/cgi/man.cgi?query=ctype&sektion=3

Would it be safer to use the g_ascii_X() versions of these functions, e.g. https://docs.gtk.org/glib/func.ascii_isalnum.html (although that doc also mentions casting to unsigned char)? That would also avoid a SEGV if the value was outside the expected range.

@tho-

tho- commented Nov 15, 2025


Values of type char or signed char must first be cast to unsigned char, to ensure that the values are within the correct range. Casting a negative-valued char or signed char directly to int will produce a negative-valued int, which will be outside the range of allowed values (unless it happens to be equal to EOF, but even that would not give the desired result).
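The difference between the two conversions can be sketched as follows. All function names here are hypothetical, and the example uses `signed char` explicitly so that it does not depend on whether plain `char` is signed on a given platform. It assumes 8-bit bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical: what a ctype function receives when the char is
 * promoted to int directly. Negative values stay negative. */
static int as_int_directly(signed char c)
{
    return (int)c;
}

/* Hypothetical: what it receives when the value is first converted to
 * unsigned char. The result is always in 0..UCHAR_MAX. */
static int as_int_via_uchar(signed char c)
{
    return (int)(unsigned char)c;
}

/* Hypothetical check for the documented domain of the ctype functions:
 * EOF, or a value representable as unsigned char. */
static int ctype_arg_is_valid(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

static void demo(void)
{
    signed char c = -23; /* the byte 0xE9 viewed as a signed char */

    assert(!ctype_arg_is_valid(as_int_directly(c)));
    assert(ctype_arg_is_valid(as_int_via_uchar(c)));
}
```

A byte that happens to equal EOF when promoted directly (commonly 0xFF, since EOF is often -1) would pass the validity check but be treated as end-of-file rather than as a character, which is the "would not give the desired result" case mentioned above.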

> Would it be safer to use the g_ascii_X()

Those functions are documented to not take the locale into account, which may lead to subtle differences in behaviour. This might work, but should be assessed.

@QuLogic QuLogic merged commit 5b18060 into pidgin:main Dec 2, 2025
8 checks passed
@grimmy grimmy deleted the ctype branch December 2, 2025 07:46