-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi :) I was playing around with Michal Zalewskis html fuzzer 'mangle' trying to crash dillo and succeeded :). Furthermore I was able to reduce the ~100k page that was triggering a NULL pointer dereferencing to three tags (see the attached page). Some digging around (html.cc is a monster ^^) revealed that style_attrs.setBorderColor ( Color::createShaded(HT2LT(html), style_attrs.backgroundColor->getColor())); in Html_tag_open_table was failing due to the fact that style_attrs.backgroundColor was NULL. Html_tag_open_select sets the backgroundColor to NULL and this is inherited by the table tag since it's a child of the select tag. Greping through the code shows that there are at least two other functions that set backgroundColor (and color) to NULL (Html_tag_open_input and Html_tag_open_isindex) and twelve locations calling getColor() on either color or backgroundColor. I have no idea how to _properly_ fix this issue. Furthermore I'd suggest to break down html.cc into several files. It is about 6000 lines long (-> hard to dig into) and it takes gcc roughly five seconds to compile on my box. Justus - -- gpg key fingerprint: C82D 382A AB38 1A54 5290 19D6 A0F9 B035 686C 6996 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIOUmmoPmwNWhsaZYRAprIAJ9X9HSzc4Isx+NyFWxlfSFPrwj/VACdF8GD x67scPvD6FCrOezl69NbaHA= =z0iv -----END PGP SIGNATURE-----
On Sun, May 25, 2008 at 01:12:38PM +0200, Justus Winter wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hi :)
I was playing around with Michal Zalewskis html fuzzer 'mangle' trying to crash dillo and succeeded :).
Good. It's no surprise because dillo2 needs a careful review of behaviour when facing strange/malicious values. Obviously this phase was procrastinated until basic functionality was completed. For instance, most probably you'll get to crash dillo2 by passing it some negative values in attributes. The problem of being robust when parsing garbage or malicious code needs a general strategy. I was delighted to read DJB's paper: "Some thoughts on security after ten years of qmail 1.0" (Strongly suggested to everyone!). There're lots of good ideas there, tried and true. It's clear that implementing them all is a huge task, but just a couple of them can make wonders, and reading the paper makes you think in a different way when designing code. For instance, in dlib, I took the simple approach of checking each function's input before processing (not 100% complete, but you get the idea), and it has proven rock solid. The extra time spent on checks doesn't impact performance noticeably, but the simplicity and joy of relying on a strong ADT set is priceless. Another strategy that allowed me to simplify dillo a lot, was to avoid multiple execution paths. With multiple paths, code gets much more complex, and maintenance is even harder, minor changes have side effects to track and testing depends on the path taken. A single path is tested by common use all the time, has only one place to make changes, is simpler to understand/debug/modify, and the simplicity leverage clearly overcomes the possible performace tradeoff. Beautifuly expressed: "The key to performance is elegance, not battalions of special cases." Jon Bentley and Doug McIlroy e.g. The CCC used to have one-branch-per-protocol. When it was switched to a single "universal" branch, CCC code size was reduced importantly, a lot of the complexity vanished, fixing a bug benefited all protocols, there were no multiple paths to track, etc. e.g.2 Memory handling in dillo. The general idea of trying to make each function deallocate what it allocated, made a hell of memory traceback debugging vanish. And the performance impact was unnoticed!
Furthermore I was able to reduce the ~100k page that was triggering a NULL pointer dereferencing to three tags (see the attached page).
Some digging around (html.cc is a monster ^^) revealed that
style_attrs.setBorderColor ( Color::createShaded(HT2LT(html), style_attrs.backgroundColor->getColor()));
in Html_tag_open_table was failing due to the fact that style_attrs.backgroundColor was NULL.
Html_tag_open_select sets the backgroundColor to NULL and this is inherited by the table tag since it's a child of the select tag.
Greping through the code shows that there are at least two other functions that set backgroundColor (and color) to NULL (Html_tag_open_input and Html_tag_open_isindex) and twelve locations calling getColor() on either color or backgroundColor.
I have no idea how to _properly_ fix this issue.
Not me either. I'd check what NULL means. A quick look hints towards a non inherited value: /** * \brief Reset those style attributes to their standard values, which are * not inherited, according to CSS. */ void StyleAttrs::resetValues () A default backgroundColor should be a good place to start. HTH.
Furthermore I'd suggest to break down html.cc into several files. It is about 6000 lines long (-> hard to dig into) and it takes gcc roughly five seconds to compile on my box.
I agree. The hard part is a good choice on what to separate. ;) -- Cheers Jorge.-
On Sun, May 25, 2008 at 10:31:23AM -0400, Jorge Arellano Cid wrote:
On Sun, May 25, 2008 at 01:12:38PM +0200, Justus Winter wrote:
Furthermore I'd suggest to break down html.cc into several files.
The hard part is a good choice on what to separate. ;)
I suggest splitting out everything that handles forms and form elements. That's a fair amount of code that represents a reasonably self-contained chunk. The forms code can also be be cleaned up in a few ways. There's a lot of cut-and-paste code like "foo->get (foo->size () - 1)". Adding getLast() methods to some lout classes and using them wherever possible would remove a lot of clutter. html.cc looks like a C file that was subsequently converted to C++ because it makes little use of C++. If DilloHtmlForm and DilloHtmlInput were given constructors and destructors and managed with new and delete then lots of allocation/deallocation code would be neatly factored out instead of sitting inline with all the other logic. The type element of DilloHtmlInput leads to much horrible case code. If DilloHtmlInput were an abstract class and the different Input types were subclasses then all the type-dependent code could go into virtual methods. This would break up the huge case statements that make much of this code hard to follow. I'll propose patches if people are positive about these suggestions. Jeremy Henty
Jeremy wrote:
html.cc looks like a C file that was subsequently converted to C++ because it makes little use of C++. If DilloHtmlForm and DilloHtmlInput were given constructors and destructors and managed with new and delete then lots of allocation/deallocation code would be neatly factored out instead of sitting inline with all the other logic.
The type element of DilloHtmlInput leads to much horrible case code. If DilloHtmlInput were an abstract class and the different Input types were subclasses then all the type-dependent code could go into virtual methods. This would break up the huge case statements that make much of this code hard to follow.
It's at least definitely worth trying, I think.
On Sun, May 25, 2008 at 05:13:13PM +0100, Jeremy Henty wrote:
On Sun, May 25, 2008 at 10:31:23AM -0400, Jorge Arellano Cid wrote:
On Sun, May 25, 2008 at 01:12:38PM +0200, Justus Winter wrote:
Furthermore I'd suggest to break down html.cc into several files.
The hard part is a good choice on what to separate. ;)
I suggest splitting out everything that handles forms and form elements. That's a fair amount of code that represents a reasonably self-contained chunk.
The forms code can also be be cleaned up in a few ways.
It looks like you have a good set of ideas to start. Corvid has been working around forms lately so please try to coordinate ideas with him
There's a lot of cut-and-paste code like "foo->get (foo->size () - 1)". Adding getLast() methods to some lout classes and using them wherever possible would remove a lot of clutter.
OK.
html.cc looks like a C file that was subsequently converted to C++ because it makes little use of C++.
html.cc is a C file that was subsequently converted to C++! :-)
If DilloHtmlForm and > DilloHtmlInput were given constructors and destructors and managed with new and delete then lots of allocation/deallocation code would be neatly factored out instead of sitting inline with all the other logic.
OK 2.
The type element of DilloHtmlInput leads to much horrible case code. If DilloHtmlInput were an abstract class and the different Input types were subclasses then all the type-dependent code could go into virtual methods. This would break up the huge case statements that make much of this code hard to follow.
I'll propose patches if people are positive about these suggestions.
BTW:
[corvid wrote:] It's at least definitely worth trying, I think.
Just beware of not converting the "horrible case code" into "horrible virtual method subclassing". :-) FLTK2 does this quite nicely with the virtual handle() method. This is IMHO remarkable because the underlying event handling model is certainly complex, but the abstraction for it attains simplicity at the widget level. -- Cheers Jorge.-
On Mon, May 26, 2008 at 08:49:09AM -0400, Jorge Arellano Cid wrote:
Corvid has been working around forms lately so please try to coordinate ideas with him
OK.
There's a lot of cut-and-paste code like "foo->get (foo->size () - 1)". Adding getLast() methods to some lout classes and using them wherever possible would remove a lot of clutter.
OK.
OK, that's next on the TODO.
Just beware of not converting the "horrible case code" into "horrible virtual method subclassing". :-)
To be honest I don't think I can promise that it will be worth doing until I have a go, I am just optimistic enough to want to have a go.
FLTK2 does this quite nicely with the virtual handle() method.
Yes, that's a good model. I guess the analogy would be for the Input classes to have virtual update_form_from_html() and form_values() methods. There problem with creating Input objects as there is no such thing as a virtual constructor or virtual class method. So some case code might remain, but it would be hidden in the class API and the switch would simply select which derived class constructor to call. That's not so bad. Regards, Jeremy Henty
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jorge Arellano Cid wrote:
On Sun, May 25, 2008 at 01:12:38PM +0200, Justus Winter wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hi :)
I was playing around with Michal Zalewskis html fuzzer 'mangle' trying to crash dillo and succeeded :).
Good.
It's no surprise because dillo2 needs a careful review of behaviour when facing strange/malicious values. Obviously this phase was procrastinated until basic functionality was completed.
For instance, most probably you'll get to crash dillo2 by passing it some negative values in attributes. The problem of being robust when parsing garbage or malicious code needs a general strategy.
Are you saying that it is too early to locate problems this way? The process of finding problems using a fuzzer and generating minimal testcases that trigger the problem is mostly automatic and I could script the last bits that do require manual intervention with ease. In case these bug reports are useful at this point, here is another one: In file html.cc, function Html_tag_close_select: int size = input->select->options->size (); fails since input->select is NULL. The html fragment that triggers this fault is attached. Justus - -- gpg key fingerprint: C82D 382A AB38 1A54 5290 19D6 A0F9 B035 686C 6996 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFIOayFoPmwNWhsaZYRArZgAJ4hdzV2Gs2fVNRgblDASKmEmaF0gACfaKBT XpPVTVS8l4RITppTqprbAlM= =/sCO -----END PGP SIGNATURE-----
Justus wrote:
Jorge Arellano Cid wrote:
The problem of being robust when parsing garbage or malicious code needs a general strategy.
Certainly testing flags all the time is very inadequate when the spec has it as, for example, <!ELEMENT SELECT - - (OPTGROUP|OPTION)+ -- option selector --> (i.e., don't put anything in a select except optgroup or option)...
In case these bug reports are useful at this point, here is another one:
In file html.cc, function Html_tag_close_select:
int size = input->select->options->size ();
fails since input->select is NULL. The html fragment that triggers this fault is attached.
..but since flags are what there is at the moment, here's a patch.
On Sun, May 25, 2008 at 08:14:29PM +0200, Justus Winter wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Jorge Arellano Cid wrote:
On Sun, May 25, 2008 at 01:12:38PM +0200, Justus Winter wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Hi :)
I was playing around with Michal Zalewskis html fuzzer 'mangle' trying to crash dillo and succeeded :).
Good.
It's no surprise because dillo2 needs a careful review of behaviour when facing strange/malicious values. Obviously this phase was procrastinated until basic functionality was completed.
For instance, most probably you'll get to crash dillo2 by passing it some negative values in attributes. The problem of being robust when parsing garbage or malicious code needs a general strategy.
Are you saying that it is too early to locate problems this way?
No! It was too early, now dillo2 has almost all dillo1 had and more. Now it's a good time. What I meant is that a general strategy for these cases is better than a case by case approach.
The process of finding problems using a fuzzer and generating minimal testcases that trigger the problem is mostly automatic and I could script the last bits that do require manual intervention with ease.
In case these bug reports are useful at this point, here is another one:
In file html.cc, function Html_tag_close_select:
int size = input->select->options->size ();
fails since input->select is NULL. The html fragment that triggers this fault is attached.
<HEAD><FORM><SELECT><TEXTAREA>
This is the same non-authorized element inside SELECT. It can be catched with the solution described in my last post. It'd be good to have the test cases. Go ahead. -- Cheers Jorge.-
Justus wrote:
I was playing around with Michal Zalewskis html fuzzer 'mangle' trying to crash dillo and succeeded :). Furthermore I was able to reduce the ~100k page that was triggering a NULL pointer dereferencing to three tags (see the attached page).
Some digging around (html.cc is a monster ^^) revealed that
style_attrs.setBorderColor ( Color::createShaded(HT2LT(html), style_attrs.backgroundColor->getColor()));
in Html_tag_open_table was failing due to the fact that style_attrs.backgroundColor was NULL.
Html_tag_open_select sets the backgroundColor to NULL and this is inherited by the table tag since it's a child of the select tag.
I doubt I realized you can try to put things like tables inside selects back when I wrote that. Maybe Jorge will know something easy and general that can be done based on parse modes to keep dillo from wasting its time calling things like Html_tag_open_table that aren't going to be displayed anyway, but the attached patch at least fixes the crash.
Greping through the code shows that there are at least two other functions that set backgroundColor (and color) to NULL (Html_tag_open_input and Html_tag_open_isindex) and twelve locations calling getColor() on either color or backgroundColor.
I think those are all right because they aren't allowed to contain anything, although, as I hinted above, my knowledge of the parsing code is not really detailed.
Furthermore I'd suggest to break down html.cc into several files. It is about 6000 lines long (-> hard to dig into) and it takes gcc roughly five seconds to compile on my box.
I'd like this as well.
On Sun, May 25, 2008 at 02:41:32PM +0000, corvid wrote:
Justus wrote:
I was playing around with Michal Zalewskis html fuzzer 'mangle' trying to crash dillo and succeeded :). Furthermore I was able to reduce the ~100k page that was triggering a NULL pointer dereferencing to three tags (see the attached page).
Some digging around (html.cc is a monster ^^) revealed that
style_attrs.setBorderColor ( Color::createShaded(HT2LT(html), style_attrs.backgroundColor->getColor()));
in Html_tag_open_table was failing due to the fact that style_attrs.backgroundColor was NULL.
Html_tag_open_select sets the backgroundColor to NULL and this is inherited by the table tag since it's a child of the select tag.
I doubt I realized you can try to put things like tables inside selects back when I wrote that.
Maybe Jorge will know something easy and general that can be done based on parse modes to keep dillo from wasting its time calling things like Html_tag_open_table that aren't going to be displayed anyway, but the attached patch at least fixes the crash.
SELECT can be closed in Html_stack_cleanup_at_open(). SELECT can contain OPTGROUP and OPTION. It would be good to consider the other cases before making a patch. -- Cheers Jorge.-
participants (4)
-
4winter@informatik.uni-hamburg.de
-
corvid@lavabit.com
-
jcid@dillo.org
-
onepoint@starurchin.org