Strange performance behaviour related to <span>
Hi!
I've stripped down one testcase to a sequence of simple HTML snippets. Try
(for i in $(seq 1 20); do echo '<div style="float:left"><div></div><div style="display:table"><span></div></div>'; done) > tmp.html; dillo tmp.html
and the development version of dillo hangs for a while. You may vary the number (second argument of seq).
If you look at the file tmp.html, you'll notice that it is incorrect HTML. Interestingly, leaving out the <span> still results in incorrect and deeply nested HTML, but dillo is much faster:
(for i in $(seq 1 20); do echo '<div style="float:left"><div></div><div style="display:table"></div></div>'; done) > tmp.html; dillo tmp.html
Dillo 3.0.5 is fast in both cases.
Does anyone have an idea what effect the <span> has?
I've already run gprof, but the result does not look very meaningful at first glance.
Sebastian
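For anyone who wants to vary the repetition count, the two one-liners above can be wrapped in a small script. This is only a sketch: the function name gen_case and the output file names are made up for illustration and are not part of dillo or its test suite.

```shell
#!/bin/sh
# gen_case is a hypothetical helper: it writes N copies of a snippet to a file.
# $1 = number of repetitions, $2 = HTML snippet, $3 = output file
gen_case() {
    n=$1; snippet=$2; out=$3
    : > "$out"                      # start with an empty file
    i=0
    while [ "$i" -lt "$n" ]; do
        printf '%s\n' "$snippet" >> "$out"
        i=$((i + 1))
    done
}

# The two snippets from the message above, copied verbatim.
WITH_SPAN='<div style="float:left"><div></div><div style="display:table"><span></div></div>'
WITHOUT_SPAN='<div style="float:left"><div></div><div style="display:table"></div></div>'

N=20    # vary this, like the second argument of seq
gen_case "$N" "$WITH_SPAN" with-span.html
gen_case "$N" "$WITHOUT_SPAN" without-span.html
# Then open each file by hand, e.g.: dillo with-span.html
```

The script only generates the files; starting dillo is left as a manual step, since the browser does not exit on its own after rendering.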
On Sun, Jun 05, 2016 at 11:34:13PM +0200, Sebastian Geerken wrote:
Does anyone have an idea what effect the <span> has?
I've already run gprof, but the result does not look very meaningful at first glance.
OK, after some experiments I see what happens.
If you close the SPAN element, it's fast again.
The problem lies in how the parser handles bad HTML, and how the rendering deals with it afterwards.
In this case the span is left open, so we end up with an anomalous tree where an inline container has 20 levels of block containers inside inline containers.
You can surely imagine the mess textblock and OOFM get trapped in when trying to make sense of it all! :)
Good news: I already have a working patch.
It needs some testing because it constitutes a big change in how we deal with malformed HTML, but so far it makes more sense than what we have now.
If you need the patch quickly, just drop me a note.
HTH.
--
Cheers
Jorge.-
On Mo, Jun 06, 2016, Jorge Arellano Cid wrote:
OK, after some experiments I see what happens.
If you close the SPAN element, it's fast again.
The problem lies in how the parser handles bad HTML, and how the rendering deals with it afterwards.
In this case the span is left open, so we end up with an anomalous tree where an inline container has 20 levels of block containers inside inline containers.
I thought something like this.
You can surely imagine the mess textblock and OOFM get trapped in when trying to make sense of it all! :)
Still, Dw should handle this. Look at this example:
(for i in $(seq 1 20); do echo '<div style="float:left"><div></div><div style="display:table"></div>'; done) > tmp.html; src/dillo tmp.html
Here, only some <div>s at the end are left open (and it is simple to make the snippet correct HTML), but it still takes a long time. This still looks like a Dw problem, especially since it is much faster if you leave out the float definition.
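To check which elements this variant leaves open, the opening and closing tags can simply be counted. The following is a rough sketch; the helper name count and the file names are invented for illustration, and grep -o is assumed to be available (GNU and BSD grep both have it). It also writes a copy without the float declaration for the comparison mentioned above.

```shell
#!/bin/sh
# count is a hypothetical helper: number of occurrences of a fixed string in a file.
# grep -oF prints every match on its own line; wc -l counts those lines.
count() {
    grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}

N=20
# Snippet from the message above: the outer <div> is never closed.
UNCLOSED='<div style="float:left"><div></div><div style="display:table"></div>'
for i in $(seq 1 "$N"); do echo "$UNCLOSED"; done > unclosed.html

# Same markup without the float declaration, for comparison.
sed 's/ style="float:left"//' unclosed.html > unclosed-nofloat.html

opened=$(count '<div' unclosed.html)
closed=$(count '</div>' unclosed.html)
echo "opened=$opened closed=$closed unclosed=$((opened - closed))"
```

With N=20 each line opens three <div>s and closes two, so the script reports 60 opened, 40 closed and 20 left open, i.e. one more level of nesting per line.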
Good news, I already have a working patch.
It needs some testing because it constitutes a big change in how we deal with malformed HTML, but so far it makes more sense than what we have now.
If you need the patch quick just drop me a note.
Take your time.
Sebastian
Hi Sebastian,
Good news here, I have patches for both problems!
On Mon, Jun 06, 2016 at 10:13:15PM +0200, Sebastian Geerken wrote:
On Mo, Jun 06, 2016, Jorge Arellano Cid wrote:
OK, after some experiments I see what happens.
If you close the SPAN element, it's fast again.
The problem lies in how the parser handles bad HTML, and how the rendering deals with it afterwards.
In this case the span is left open, so we end up with an anomalous tree where an inline container has 20 levels of block containers inside inline containers.
I thought something like this.
I'm currently testing a patch for the nesting problem in the parser, and so far it works very well. I expect to commit it next week.
You can surely imagine the mess textblock and OOFM get trapped in when trying to make sense of it all! :)
Still, Dw should handle this. Look at this example:
(for i in $(seq 1 20); do echo '<div style="float:left"><div></div><div style="display:table"></div>'; done) > tmp.html; src/dillo tmp.html
Here, only some <div>s at the end are left open (and it is simple to make the snippet correct HTML), but it still takes a long time. This still looks like a Dw problem, especially since it is much faster if you leave out the float definition.
Yes, I see.
There's indeed a problem in Dw.
After some inspired reflections on the exponential nature of the time taken, and the corresponding tests and experiments, I got to a simple patch that solves all the cases we've seen so far in this thread!
It makes all CPU hog cases render as fast as expected, and even makes some "unrelated" sites I visit render about twice as fast.
I'm not postulating this to be *the* correct solution, but it is a strong hint as to what is wrong. I didn't want to mess with mustQueueResize subtleties, so I'm more than happy with this one-liner:
diff -r bcf30ff0896c dw/textblock.cc
--- a/dw/textblock.cc  Thu Jun 16 16:27:56 2016 -0400
+++ b/dw/textblock.cc  Thu Jun 16 23:21:40 2016 -0400
@@ -3034,7 +3034,8 @@ void Textblock::queueDrawRange (int inde
 void Textblock::updateReference (int ref)
 {
-   queueResize (ref, false);
+   if (lines->size ())
+      queueResize (ref, false);
 }
HTH.
Good news, I already have a working patch.
It needs some testing because it constitutes a big change in how we deal with malformed HTML, but so far it makes more sense than what we have now.
If you need the patch quick just drop me a note.
Take your time.
Big patch here, as stated above, probably committed next week.
--
Cheers
Jorge.-
Hi Jorge,
Committed, thanks for the good work!
I've looked at this problem, too, and found out that the interaction between queueResize and markSizeChange was not perfect. I've not yet examined how it works now, but the performance problems seem to be solved (including the original case), so I'll assign a lower priority to this.
Sebastian
On Fri, Jun 17, 2016 at 11:10:24AM +0200, Sebastian Geerken wrote:
Committed, thanks for the good work!
Great!
BTW, I've found it quite useful to include a brief description of the patch and test cases in the hg comment. That way, when I do an "hg blame" and try to find out why a certain line of code is there, it's really helpful, and much easier to find than in a mail thread!
As hg has no way to amend a published patch, adding a single whitespace char, together with the extended comment, as a new patch did the trick. This comment is easily readable with "hg log -v".
--
Cheers
Jorge.-
participants (2)
- jcid@dillo.org
- sgeerken@dillo.org