I was just looking at how the is* functions treat characters with the high bit set. If I set LC_CTYPE to en_US, 0xC0 is alpha, but it understandably isn't in en_US.UTF-8.

So I found my way to /usr/share/i18n/locales/
en_US -> en_GB -> i18n

And found that:

upper    41-5A, C0-D6, D8-DE
lower    61-7A, B5, DF-F6, F8-FF
alpha    41-5A, 61-7A, AA, B5, BA, C0-D6, D8-F6, F8-FF
digit    30-39
space    09-0D, 20
cntrl    00-1F, 7F-9F
punct    21-2F, 3A-40, 5B-60, 7B-7E, A0-A9, AB-B4, B6-B9, BB-BF, D7, F7
graph    21-7E, A0-FF
print    20-7E, A0-FF
xdigit   30-39, 41-46, 61-66
blank    09, 20

In dillo:

isupper    not used
islower    not used
isalpha    cssparser, html
isdigit    doesn't matter
isalnum    cssparser, html, datauri, downloads, dlib.h
  dIsalnum   url
isspace    cssparser, html, keys, dlib.h, findtext.hh
  dIsspace   colors, src/cookies, misc, dpi/cookies, file, ftp, dlib
iscntrl    keys, auth, misc
ispunct    textblock
isgraph    not used
isprint    dlib
isxdigit   doesn't matter
isblank    not used
isascii    not used

I'm not sure that any real harm is done in these cases, but it might not be a bad idea to check that it's ASCII when we only want ASCII.
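As a rough sketch of what "check that it's ASCII" could look like, the helpers below test explicit ASCII ranges instead of calling the locale-dependent is* functions, so the result does not change with LC_CTYPE. The names ascii_isalpha, ascii_isalnum and ascii_isspace are hypothetical; they are not existing dillo or dlib functions, and this is not a proposed patch.

```c
#include <stdio.h>

/* Hypothetical helpers (not part of dillo or dlib).
 * Each one compares against fixed ASCII ranges, so bytes
 * 0x80-0xFF are never classified, whatever LC_CTYPE says. */

static int ascii_isalpha(int c)
{
   /* 41-5A and 61-7A only */
   return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int ascii_isalnum(int c)
{
   /* alpha plus 30-39 */
   return ascii_isalpha(c) || (c >= '0' && c <= '9');
}

static int ascii_isspace(int c)
{
   /* 09-0D and 20, the same set the table lists for "space" */
   return c == ' ' || (c >= 0x09 && c <= 0x0D);
}
```

With these, ascii_isalpha(0xC0) is 0 in every locale, whereas isalpha(0xC0) is nonzero under en_US as described above. A caller holding a plain char would still want to cast it to unsigned char first, since a negative value makes both the range checks and the standard is* functions awkward to reason about.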
On Wed, Nov 09, 2011 at 05:14:23AM +0000, corvid wrote:
I'm not sure that any real harm is done in these cases, but it might not be a bad idea to check that it's ascii when we only want ascii.
Sure. I could imagine you already regret that you started looking into this stuff :)
participants (2)
- corvid@lavabit.com
- Johannes.Hofmann@gmx.de