Re: css parsing numbers

Johannes.Hofmann＠gmx.de

Feb. 27, 2009

7:14 p.m.

On Fri, Feb 27, 2009 at 03:29:55PM +0000, corvid wrote:

...

...
On Fri, Feb 27, 2009 at 02:26:03PM +0000, corvid wrote:

...
...
Committed. I hoped to be able to make that code a bit shorter, but could not find a reasonable solution...

Yeah. In any case, it's just going to get worse/rearranged again for negative numbers...

Exactly. I started my simplification attempts, added negative number support - and blew it all up :) I'm really considering a flex based scanner now. What do you think?

I only touched flex very briefly for a class at school long ago, and I don't remember anything from the experience, but the idea interests me since it has to be less trouble than trying to do it by hand...

I think we just need to try it. Attached is a minimal standalone example based on the "Lexical scanner" from the appendix of the CSS 2.1 spec. To compile it use:

    flex cssscanner.l
    gcc -o cssscanner lex.yy.c -lfl

It reads from stdin.

Cheers,
Johannes

Johannes.Hofmann＠gmx.de

February 2009

12:30 a.m.


This turned out to be easier than expected. The attached patch adds a flex-based scanner for CSS data. It's not optimized or polished, but it seems to work here.

Cheers,
Johannes

Johannes.Hofmann＠gmx.de

March 2009

12:39 p.m.


To make reviewing easier I created a repo for the flex experiment: http://freehg.org/u/dillo/flex/

I think the use of flex makes the code simpler and more maintainable. For some reason rendering is a bit different, e.g. on wikipedia.org, maybe because negative numbers are now supported. It would be nice if people could test it to check the performance impact.

What do you think? Is it worth adding the flex dependency?

Cheers,
Johannes

jcid＠dillo.org

5:06 p.m.

On Sun, Mar 01, 2009 at 12:31:18PM +0100, Hofmann Johannes wrote:

> What do you think? Is it worth adding the flex dependency?

I'm a bit worried about the flex dependency.

It's been a long time since I saw/used flex/bison/yacc & friends. AFAIR, there was a myriad of slightly incompatible flavours.

It may be safer to generate a C-source parser with the tool, and to include it as a source file.

AFAIU the patch feeds a CSS grammar to libflex, and it acts as an interpreter that parses.

You should consider the performance, the complexity (against tuning our current parser), and whether the dependency can be avoided.

--
Cheers
Jorge.-

Johannes.Hofmann＠gmx.de

5:28 p.m.

On Mon, Mar 02, 2009 at 01:06:52PM -0300, Jorge Arellano Cid wrote:

> I'm a bit worried about the flex dependency.
>
> It's been a long time since I saw/used flex/bison/yacc & friends. AFAIR, there was a myriad of slightly incompatible flavours.

I think what the patch uses is pretty standard and automake takes care of the detection and correct parameters.

> It may be safer to generate a C-source parser with the tool, and to include it as a source file.

The problem is that we also need to (statically) link libfl.a, so we actually need lex/flex installed on the build system.

> AFAIU the patch feeds a CSS grammar to libflex, and it acts as an interpreter that parses.

In its current form only the scanner/tokenizer is replaced by a flex-generated one. The parsing is still done using the hand-written code.
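
The scanner/parser split described here can be sketched in a few lines of C. This is only an illustration under stated assumptions: the token kinds, the Scanner struct, next_token() and count_declarations() are hypothetical names, not Dillo's actual code. The point is that the parser layer sees token types only, so the scanner layer could be swapped for a generated one without touching the parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; Dillo's real token set is not shown here. */
typedef enum { TOK_IDENT, TOK_COLON, TOK_SEMICOLON, TOK_OTHER, TOK_END } TokType;

typedef struct {
   const char *pos;   /* current read position in the CSS text */
   TokType type;      /* type of the most recently scanned token */
} Scanner;

/* Scanner layer: the only part a generated scanner would replace. */
static void next_token(Scanner *sc)
{
   while (isspace((unsigned char)*sc->pos))
      sc->pos++;
   if (*sc->pos == '\0') {
      sc->type = TOK_END;
   } else if (isalpha((unsigned char)*sc->pos) || *sc->pos == '-') {
      while (isalnum((unsigned char)*sc->pos) || *sc->pos == '-')
         sc->pos++;
      sc->type = TOK_IDENT;
   } else {
      char c = *sc->pos++;
      sc->type = (c == ':') ? TOK_COLON : (c == ';') ? TOK_SEMICOLON : TOK_OTHER;
   }
}

/* Parser layer: hand-written, works on token types only.
 * Counts well-formed "ident : ident ;" declarations. */
int count_declarations(const char *css)
{
   Scanner sc;
   int count = 0;

   assert(css != NULL);
   sc.pos = css;
   next_token(&sc);
   while (sc.type != TOK_END) {
      if (sc.type == TOK_IDENT) {
         next_token(&sc);
         if (sc.type == TOK_COLON) {
            next_token(&sc);
            if (sc.type == TOK_IDENT) {
               next_token(&sc);
               if (sc.type == TOK_SEMICOLON || sc.type == TOK_END)
                  count++;
            }
         }
      }
      /* Error recovery: skip to the end of the current declaration. */
      while (sc.type != TOK_SEMICOLON && sc.type != TOK_END)
         next_token(&sc);
      if (sc.type == TOK_SEMICOLON)
         next_token(&sc);
   }
   return count;
}
```

With this sketch, count_declarations("color: red; display: none") returns 2, while a malformed declaration such as "color red;" is skipped and not counted.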

> You should consider the performance, the complexity (against tuning our current parser), and if the dependency can be avoided.

Until recently Css_next_token() was small and simple enough that it was certainly not worth the additional dependency. However, even adding support for floats that start with '.' increased its complexity. Next are negative numbers, URLs and maybe more.

Does anyone know whether there are still compatibility/integration issues with flex on exotic platforms?

Cheers,
Johannes
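
To make the complexity argument concrete, here is a minimal sketch of hand-written number scanning with the special cases mentioned above (leading '.', optional sign). It is not the code from Css_next_token(); css_scan_number() is a hypothetical helper, and folding the sign into the number token is an assumption made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not Dillo's Css_next_token(): scan a number at
 * the start of s. Accepts an optional sign, digits, and an optional
 * fraction; the integer part may be empty (".5").
 * Returns the number of characters consumed, or 0 if s does not start
 * with a number. On success *value holds the parsed value. */
size_t css_scan_number(const char *s, double *value)
{
   size_t i = 0;
   int negative = 0, digits = 0;
   double v = 0.0, scale = 0.1;

   assert(s != NULL && value != NULL);

   /* Special case 1: optional sign. */
   if (s[i] == '-' || s[i] == '+') {
      negative = (s[i] == '-');
      i++;
   }
   /* Integer part (may be empty). */
   while (isdigit((unsigned char)s[i])) {
      v = v * 10.0 + (s[i] - '0');
      digits++;
      i++;
   }
   /* Special case 2: '.' belongs to the number only if a digit follows. */
   if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
      i++;
      while (isdigit((unsigned char)s[i])) {
         v += (s[i] - '0') * scale;
         scale /= 10.0;
         digits++;
         i++;
      }
   }
   /* Special case 3: a lone sign or a lone '.' is not a number. */
   if (digits == 0)
      return 0;

   *value = negative ? -v : v;
   return i;
}
```

For example, css_scan_number("-1.5em", &v) consumes 4 characters and stores -1.5 in v, while a lone "-" or "." yields 0. Each additional token form (URLs, escapes) would add further branches of this kind.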

Johannes.Hofmann＠gmx.de

10:02 p.m.

On Mon, Mar 02, 2009 at 01:06:52PM -0300, Jorge Arellano Cid wrote:

> It may be safer to generate a C-source parser with the tool, and to include it as a source file.

I'm currently checking re2c (http://re2c.org/) as a flex alternative and it looks pretty cool. It does not need a lib and produces pretty clean C code. So we might ship the generated code in the release tarballs.

I'll post a prototype when I'm ready.

Cheers,
Johannes
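
As a rough idea of what library-free scanner code in plain C can look like, here is a hand-written state machine for an unsigned number pattern. This is NOT actual re2c output, only an approximation of the general style; match_num() and the state names are hypothetical. It returns the length of the longest match, falling back to the last accepting position when the input stops matching.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written approximation of a direct-coded scanner (not re2c output).
 * Matches the unsigned number pattern
 *    [0-9]+ | [0-9]* "." [0-9]+
 * and returns the length of the longest match at the start of s,
 * or 0 if there is none. */
size_t match_num(const char *s)
{
   enum { START, INT, DOT, FRAC } state = START;
   size_t i = 0, accepted = 0;

   assert(s != NULL);
   for (;;) {
      char c = s[i];
      int is_digit = (c >= '0' && c <= '9');

      switch (state) {
      case START:
         if (is_digit)      state = INT;
         else if (c == '.') state = DOT;
         else               return accepted;
         break;
      case INT:             /* accepting: seen [0-9]+ */
         if (is_digit)      state = INT;
         else if (c == '.') state = DOT;
         else               return accepted;
         break;
      case DOT:             /* not accepting: a digit must follow '.' */
         if (is_digit)      state = FRAC;
         else               return accepted;
         break;
      case FRAC:            /* accepting: seen [0-9]* "." [0-9]+ */
         if (is_digit)      state = FRAC;
         else               return accepted;
         break;
      }
      i++;
      /* Remember the last position where the input so far fully matched. */
      if (state == INT || state == FRAC)
         accepted = i;
   }
}
```

Here match_num("12.5px") returns 4, and match_num("12.") returns 2 because the trailing '.' is not followed by a digit. Code of this shape has no runtime library dependency, which is what makes shipping generated sources in a tarball practical.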

jcid＠dillo.org

12:33 p.m.

On Mon, Mar 02, 2009 at 09:54:43PM +0100, Hofmann Johannes wrote:

> I'm currently checking re2c (http://re2c.org/) as a flex alternative and it looks pretty cool. It does not need a lib and produces pretty clean C code. So we might ship the generated code in the release tarballs.
>
> I'll post a prototype when I'm ready.

Great. C code looks like the more portable way to go.

BTW, quoting from the description of Flex ('aptitude show flex'):

<q> [...] The behaviour of Flex has undergone a major change since version 2.5.4a. Flex scanners are now reentrant, and it is now possible to have multiple scanners in the same program with differing sets of defaults, and the scanners play nicer with modern C and C++ compilers. The Flip side is that Flex no longer conforms to the POSIX lex behaviour, and the scanners require conforming implementations when flex is used in ANSI C mode. The package flex-old provides the older behaviour. </q>

OTOH, re2c looks like a suitable tool we may use in other areas too. Go ahead!

--
Cheers
Jorge.-

Johannes.Hofmann＠gmx.de

2:13 p.m.

On Tue, Mar 03, 2009 at 08:34:09AM -0300, Jorge Arellano Cid wrote:

> Great. C code looks like the more portable way to go.
>
> BTW, quoting from the description of Flex ('aptitude show flex'):
>
> <q> [...] The behaviour of Flex has undergone a major change since version 2.5.4a. Flex scanners are now reentrant, and it is now possible to have multiple scanners in the same program with differing sets of defaults, and the scanners play nicer with modern C and C++ compilers. The Flip side is that Flex no longer conforms to the POSIX lex behaviour, and the scanners require conforming implementations when flex is used in ANSI C mode. The package flex-old provides the older behaviour. </q>
>
> OTOH, re2c looks like a suitable tool we may use in other areas too. Go ahead!

The re2c version now more or less works (escaped characters are not yet supported). You can find the code at http://freehg.org/u/dillo/flex/

Cheers,
Johannes
