On Sun, Jun 05, 2016 at 11:34:13PM +0200, Sebastian Geerken wrote:
Hi!
I've stripped down one testcase to a sequence of simple HTML snippets. Try
(for i in $(seq 1 20); do echo '<div style="float:left"><div></div><div style="display:table"><span></div></div>'; done) > tmp.html; dillo tmp.html
and the development version of dillo hangs for a while. You may vary the number (second argument of seq).
If you look at the file tmp.html, you'll notice that it is incorrect HTML. Interestingly, leaving out the <span> still results in incorrect and deeply nested HTML, but dillo is much faster:
(for i in $(seq 1 20); do echo '<div style="float:left"><div></div><div style="display:table"></div></div>'; done) > tmp.html; dillo tmp.html
Dillo 3.0.5 is fast in both cases.
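For repeated experiments, the two one-liners above could be wrapped in a small helper. This is only a sketch: the function name gen_testcase and its arguments (repetition count, variant, output file) are made up for illustration and are not part of dillo.

```shell
# Hypothetical helper, not part of dillo: writes N copies of the test
# snippet to a file.
#   $1 = number of repetitions (the second argument of seq above)
#   $2 = variant: "open" (unclosed <span>) or "none" (no <span> at all)
#   $3 = output file
gen_testcase() {
    n=$1
    variant=$2
    out=$3
    case "$variant" in
        open) inner='<span>' ;;   # the slow case
        none) inner='' ;;         # the fast case
        *) echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    for i in $(seq 1 "$n"); do
        printf '%s\n' "<div style=\"float:left\"><div></div><div style=\"display:table\">${inner}</div></div>"
    done > "$out"
}
```

With this, "gen_testcase 20 open tmp.html; dillo tmp.html" should reproduce the slow case, and "gen_testcase 20 none tmp.html; dillo tmp.html" the fast one.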
Does anyone have an idea what effect the <span> has?
I've already run gprof, but the result does not look very meaningful at first glance.
OK, after some experiments I see what happens. If you close the SPAN element, it's fast again.

The problem lies in how the parser handles bad HTML, and in how the rendering deals with it afterwards. In this case the span is left open, so we end up with an anomalous tree in which an inline container has 20 levels of block containers nested inside inline containers. You can surely imagine the mess textblock and OOFM get trapped in when trying to make sense of it all! :)

Good news: I already have a working patch. It needs some testing because it constitutes a big change in how we deal with malformed HTML, but so far it makes more sense than what we have now. If you need the patch quickly, just drop me a note.

HTH.

--
Cheers
Jorge.-