Implementing </div> closing <span>, and friends
Hello all, [Sorry if this is a dup; gmail may be playing tricks on me.] This page displays, with distortions, in dillo 3.0.4, but the image does not display after commit 317f010: https://medium.com/@verynicetweets/getting-out-of-the-house-is-so-hard-these... A partially cleaned-up and stripped version is attached, where the comment block shows a suppression of the problem. The page is illegal: two unclosed <span>s interfere with a </div> and the following <div>. There is code around html.cc:1355, which is intended to implement heuristics to close unclosed tags but, in the cases where it is triggered, no tags are closed. Here is a patch which I think implements the intent of the heuristics: ----------------- $ hg diff diff -r 26378f85c4ea src/html.cc --- a/src/html.cc Sun Sep 14 09:56:22 2014 +0000 +++ b/src/html.cc Mon Sep 15 18:13:21 2014 +1200 @@ -1355,7 +1355,9 @@ i_SELECT = a_Html_tag_index("select"), i_TEXTAREA = a_Html_tag_index("textarea"); int w3c_mode = !prefs.w3c_plus_heuristics; - int stack_idx, tag_idx, matched = 0, expected = 0; + // TODO: If there was a clean way of never having a tag index zero, + // (it's currently <a>), then expected_idx could be prettier. + int stack_idx, tag_idx, matched = 0, expected_idx = -1, skipped = 0; TagInfo new_tag = Tags[new_idx]; /* Look for the candidate tag to close */ @@ -1368,26 +1370,42 @@ break; } else if (Tags[tag_idx].EndTag == 'O') { /* skip an optional tag */ + ++skipped; continue; } else if ((new_idx == i_BUTTON && html->InFlags & IN_BUTTON) || (new_idx == i_SELECT && html->InFlags & IN_SELECT) || (new_idx == i_TEXTAREA && html->InFlags & IN_TEXTAREA)) { /* let these elements close tags inside them */ + ++skipped; continue; } else if (w3c_mode || Tags[tag_idx].TagLevel >= new_tag.TagLevel) { /* this is the tag that should have been closed */ - expected = 1; - break; + /* skip it as if it were optional, but complain */ + ++skipped; + if (-1 == expected_idx) { + // ie this is the first thing we're unwinding + expected_idx = tag_idx; + } + continue; } } + // Either or both matched and expected_idx could now be set + + // TODO: decide whether to carry line numbers in the parser state stack, + // in order to report them here. + if (matched) { + if (-1 != expected_idx) { + BUG_MSG("Unexpected closing tag: </%s> -- expected </%s>", + new_tag.name, Tags[expected_idx].name); + } + Html_tag_cleanup_to_idx(html, stack_idx); - } else if (expected) { - BUG_MSG("Unexpected closing tag: </%s> -- expected </%s>.", - new_tag.name, Tags[tag_idx].name); } else { - BUG_MSG("Unexpected closing tag: </%s>.", new_tag.name); + BUG_MSG("Unexpected closing tag: </%s>" + " -- explored %d layer(s) and not closed.", + new_tag.name, skipped); } } ----------------- Closing tags that match no opening tag (I used </area>, which is illegal) do not cause catastrophic failure. So far, it has not broken on my usual workload. Gmail is noisy, so I wonder if something's going wrong there, but it looks fairly normal. I have no good idea what a general test suite for this code looks like. If someone wants to point me toward related tests, then I will think about extending them. Regards, James.
participants (1)
-
james.from.wellington@gmail.com