Hi,

On Sun, May 10, 2026 at 03:25:07AM +0100, 256@256-32.com wrote:
> On my personal website <https://256-32.com/>, Dillo reports the following
> bug in a link that uses special characters:
>
> HTML warning: line 64, URL has 8 illegal bytes in {00-1F, 7F-FF} range ('/computers/důvěřivý').

Yes, we follow RFC 3986 and the HTML 4.01 recommendation of marking illegal
characters outside the unreserved set:

https://www.w3.org/TR/html401/appendix/notes.html#h-B.2

In HTML by the WHATWG they added exceptions for UTF-8 URLs, but I don't think
that is a good idea. It breaks software that doesn't handle UTF-8 URLs (i.e.
anything that follows the RFC rather than what Google says).

My recommendation is to encode the URL with percent encoding:

https://256-32.com/computers/d%C5%AFv%C4%9B%C5%99iv%C3%BD

This is the URL that is used in HTTP, even though modern browsers don't
render it as-is in the URL location bar. One of the problems with rendering
the decoded form is the Unicode "confusables", characters that render very
similarly but are different, like this:

https://аpple.com

See: https://www.xudongz.com/blog/2017/idn-phishing/

Best,
Rodrigo.
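
P.S. For anyone wanting to reproduce the encoding above, here is a minimal
Python sketch. It is only an illustration using the standard library, not
Dillo's actual check: the helper name percent_encode_path and the byte-range
test are assumptions modelled on the wording of the warning message.

```python
from urllib.parse import quote, unquote


def percent_encode_path(path: str) -> str:
    """Percent-encode a URL path (hypothetical helper, not part of Dillo).

    quote() encodes the string as UTF-8 and escapes every byte that is not
    an ASCII letter, digit, or one of "_.-~"; "/" is kept as a separator.
    """
    return quote(path, safe="/")


path = "/computers/důvěřivý"

# Count the bytes that fall in the {00-1F, 7F-FF} range named in the warning.
flagged = sum(1 for b in path.encode("utf-8") if b <= 0x1F or b >= 0x7F)

encoded = percent_encode_path(path)

print(flagged)           # number of bytes in the flagged range
print(encoded)           # pure-ASCII form of the path
print(unquote(encoded))  # decodes back to the original path
```

Each of the four accented letters takes two bytes in UTF-8, which is where
the 8 in the warning comes from. The encoded form contains only ASCII, so it
can be placed in the href directly.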