I was just looking at how the is* functions treat characters with the
high bit set. If I set LC_CTYPE to en_US, isalpha(0xC0) is true, but it
understandably isn't in en_US.UTF-8, where 0xC0 on its own is not a
complete character.
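
A minimal sketch of how this can be checked, assuming a hosted C
environment. The helper name alpha_in_locale is made up for this
example, and whether the "en_US" and "en_US.UTF-8" locales exist depends
on what is installed on the machine.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper (not from dillo): returns 1 if `byte` is alphabetic
 * under the named LC_CTYPE locale, 0 if it is not, and -1 if setlocale()
 * rejects the name. Note that it leaves LC_CTYPE switched to that locale. */
int alpha_in_locale(const char *locale_name, unsigned char byte)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    /* The argument is an unsigned char value, so the call is well defined
     * even for bytes with the high bit set. */
    return isalpha(byte) != 0;
}
```

On a system like the one described here, alpha_in_locale("en_US", 0xC0)
would be expected to return 1 and alpha_in_locale("en_US.UTF-8", 0xC0)
to return 0. In the "C" locale the result for 0xC0 is always 0.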
So I followed the locale definitions in /usr/share/i18n/locales/
(en_US -> en_GB -> i18n) and found these classes (values in hex):
upper   41-5A, C0-D6, D8-DE
lower   61-7A, B5, DF-F6, F8-FF
alpha   41-5A, 61-7A, AA, B5, BA, C0-D6, D8-F6, F8-FF
digit   30-39
space   09-0D, 20
cntrl   00-1F, 7F-9F
punct   21-2F, 3A-40, 5B-60, 7B-7E, A0-A9, AB-B4, B6-B9, BB-BF, D7, F7
graph   21-7E, A0-FF
print   20-7E, A0-FF
xdigit  30-39, 41-46, 61-66
blank   09, 20
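
A list like this can also be produced at run time instead of reading the
locale sources. The following is only a sketch: print_class_ranges is a
hypothetical name, and it reports whatever the currently active LC_CTYPE
locale says, so the output depends on which locale was set beforehand.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: prints the byte ranges (hex) in 0x00-0xFF for which
 * `pred` is true under the current locale, and returns the number of
 * ranges printed. */
int print_class_ranges(const char *name, int (*pred)(int), FILE *out)
{
    int count = 0;
    int c = 0;

    fprintf(out, "%-7s", name);
    while (c <= 0xFF) {
        if (!pred(c)) {
            c++;
            continue;
        }
        int start = c;
        /* Extend the range while consecutive bytes are still in the class. */
        while (c <= 0xFF && pred(c))
            c++;
        if (c - 1 == start)
            fprintf(out, "%s%02X", count ? ", " : " ", start);
        else
            fprintf(out, "%s%02X-%02X", count ? ", " : " ", start, c - 1);
        count++;
    }
    fputc('\n', out);
    return count;
}
```

Calling setlocale(LC_CTYPE, "en_US") from <locale.h> first and then
print_class_ranges("alpha", isalpha, stdout) should give something close
to the alpha row above, assuming that locale is installed. Without any
setlocale() call the program runs in the "C" locale and only the ASCII
ranges appear.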
In dillo:
isupper   not used
islower   not used
isalpha   cssparser, html
isdigit   doesn't matter
isalnum   cssparser, html, datauri, downloads, dlib.h
dIsalnum  url
isspace   cssparser, html, keys, dlib.h, findtext.hh
dIsspace  colors, src/cookies, misc, dpi/cookies, file, ftp, dlib
iscntrl   keys, auth, misc
ispunct   textblock
isgraph   not used
isprint   dlib
isxdigit  doesn't matter
isblank   not used
isascii   not used
I'm not sure that any real harm is done in these cases, but it might be
a good idea to check that a character is ASCII wherever we only want
ASCII.
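
One possible shape for such checks, as a rough sketch only: the ascii_*
names below are hypothetical and are not existing dlib functions. They
use plain range comparisons rather than the <ctype.h> calls, so the
result does not depend on LC_CTYPE. This assumes an ASCII-compatible
execution character set.

```c
/* Hypothetical ASCII-only classifiers (not part of dlib). Each one takes
 * an int and returns nonzero only for values in the ASCII range, so bytes
 * with the high bit set (or negative values from a signed char) never
 * match, whatever the locale is. */

int ascii_isalpha(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

int ascii_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

int ascii_isalnum(int c)
{
    return ascii_isalpha(c) || ascii_isdigit(c);
}

int ascii_isspace(int c)
{
    /* 0x09-0x0D (tab, LF, VT, FF, CR) plus the space character. */
    return c == ' ' || (c >= '\t' && c <= '\r');
}
```

With these, ascii_isalpha(0xC0) is 0 in every locale, while
ascii_isalpha('A') is still nonzero. Whether each caller in the list
above really wants ASCII-only behaviour would have to be checked case by
case.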