Avoid undefined behavior with ctype functions #69
I find this PR a little confusing, as the man page says the values should be […] Would it be safer to use the […]
Values of type char or signed char must first be cast to unsigned char, to ensure that the values are within the correct range. Casting a negative-valued char or signed char directly to int will produce a negative-valued int, which will be outside the range of allowed values (unless it happens to be equal to EOF, but even that would not give the desired result).
Those functions are documented to not take the locale into account, which may lead to subtle differences. This might work, but should be assessed.
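As a rough illustration of the casting point in the reply above, here is a minimal sketch comparing the value a ctype function receives with and without the cast. The helper names (`in_ctype_domain`, `arg_without_cast`, `arg_with_cast`) are hypothetical and do not come from the patches. The sketch assumes 8-bit chars and an `int` wide enough to hold every `unsigned char` value.

```c
#include <assert.h>
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical helper: is v a value the ctype functions are defined for,
 * i.e. EOF or something representable as unsigned char? */
int in_ctype_domain(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}

/* The value a ctype function would receive if a signed char were passed
 * directly: integer promotion keeps the sign. */
int arg_without_cast(signed char c)
{
    return c;
}

/* The value it receives when the char goes through unsigned char first. */
int arg_with_cast(signed char c)
{
    int v = (unsigned char)c;
    assert(in_ctype_domain(v)); /* holds for every input under the stated assumptions */
    return v;
}
```

Under those assumptions, the byte 0xE9 stored in a signed char is -23. `arg_without_cast` returns -23, which is outside the allowed range, while `arg_with_cast` returns 233, which is inside it.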
The ctype(3) isspace, isdigit, ... functions take an int argument that must be either EOF or a value representable as unsigned char (typically [0-255]). Passing a negative char or signed char value results in undefined behavior.
Some implementations of libc, such as glibc as of 2018, attempt to avoid the worst of the undefined behavior by defining the functions to work for all integer inputs representable by either unsigned char or char.
On NetBSD, ctype(3) functions will crash with a SIGSEGV signal on invalid inputs as a diagnostic aid for applications.
This set of patches converts the char to unsigned char before it is passed to these functions, so that the values are within the correct range.
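For readers who have not seen the pattern, a call site with the conversion applied might look roughly like the sketch below. The function `count_leading_space` is a hypothetical example, not code from this PR; only the `(unsigned char)` cast before `isspace` reflects what the description says.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example, not taken from the patches: count the whitespace
 * characters at the start of a NUL-terminated string. */
size_t count_leading_space(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    /* Convert to unsigned char before the call, so bytes >= 0x80 stored in
     * a signed char are never passed to isspace() as negative ints. */
    while (isspace((unsigned char)s[n]))
        n++;

    return n;
}
```

The loop stops at the terminating NUL because `isspace(0)` is false. Putting the cast at the call site keeps the string type as plain `char *`, so callers do not need to change.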