6 Jun
2009
8:14 a.m.
furaisanjin wrote:
I just wonder why this condition exists in the patch.
if (*s&0xe2 == 0xe2) {
There are Japanese characters whose UTF-8 encoding starts with 0xe5.
Ah yes, that is an error. Thank you. Any problems with "if (*s&0xe0 == 0xe0) {" instead? I believe that would force everything from U+0800 upward through the decoding, but oh well. (Unicode blocks for the curious: http://unicode.org/Public/UNIDATA/Blocks.txt)

Oh, by the way, I noticed that the line breaking document also lists

20000..2A6D6 CJK UNIFIED IDEOGRAPHS EXTENSION B
2F800..2FA1D CJK COMPATIBILITY IDEOGRAPHS SUPPLEMENT

but I wasn't sure whether anyone makes any real use of characters above U+FFFF.
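For anyone following along, here is a rough sketch of the two mask tests and of what decoding a lead byte of 0xe0 or higher involves. It assumes the patch is C and the input is UTF-8. The names passes_mask, utf8_seq_len and utf8_decode are made up for illustration and are not taken from the patch. One caveat under that assumption: in C, == binds tighter than &, so the mask test needs parentheses, as in (*s & 0xe0) == 0xe0. The sketch writes it that way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if byte b has every bit of mask set.
 * The parentheses are required: without them C parses
 * b & mask == mask as b & (mask == mask), i.e. b & 1. */
int passes_mask(unsigned char b, unsigned char mask)
{
    return (b & mask) == mask;
}

/* Hypothetical helper: length of the UTF-8 sequence announced by lead
 * byte b, or 0 if b cannot start a sequence. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII            */
    if ((b & 0xe0) == 0xc0) return 2;  /* 110xxxxx: U+0080..U+07FF   */
    if ((b & 0xf0) == 0xe0) return 3;  /* 1110xxxx: U+0800..U+FFFF   */
    if ((b & 0xf8) == 0xf0) return 4;  /* 11110xxx: U+10000 and up   */
    return 0;                          /* continuation or invalid    */
}

/* Hypothetical helper: decode the code point at the start of a
 * NUL-terminated string s. Returns -1 on a malformed sequence.
 * Simplified: overlong forms and surrogates are not rejected. */
long utf8_decode(const unsigned char *s)
{
    int len, i;
    long cp;

    assert(s != NULL);
    len = utf8_seq_len(s[0]);
    if (len == 0) return -1;
    if (len == 1) return s[0];

    /* Keep only the payload bits of the lead byte. */
    cp = s[0] & (0xff >> (len + 1));
    for (i = 1; i < len; i++) {
        /* Each continuation byte must look like 10xxxxxx. */
        if ((s[i] & 0xc0) != 0x80) return -1;
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    return cp;
}
```

With these, passes_mask(0xe5, 0xe2) is 0 while passes_mask(0xe5, 0xe0) is nonzero, which is the problem furaisanjin pointed out. The 0xe0 mask also accepts 0xf0, the lead byte of four-byte sequences, so a character such as U+20000 (bytes f0 a0 80 80) would reach the decoder as well.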