Extensions for hyphenation

sgeerken＠dillo.org

Nov. 18, 2012

1:29 p.m.

Hi!

At http://flpsed.org/hgweb/dillo_hyphen, you'll find some extensions for hyphenation I've not yet merged into the main repository. Still needs some documentation, but here is an overview:

There are now configuration variables for dillorc (see source): penalties for hyphens, as well as for the left and right side of an em-dash. The suffix "_2" means that this value is used for lines following a line which already ends with a hyphen. When this value is larger, two adjacent lines ending with a hyphen are avoided.

For values, see the definition of the "badness". Typical values:

0 = Penalty used for normal spaces.
1 = A justified line with spaces having 150% or 67% of the ideal space width has this as badness.
8 = A justified line with spaces twice as wide as the ideal width has this as badness.

"inf" may be used (preventing a break in any case); also "-inf" (forcing a break), although the latter makes no sense and may lead to strange results.

There is a test page, test/hyphens-etc.html, to play around with.

Sebastian
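A minimal C++ sketch of the scale described above, under stated assumptions. The cubic `toyBadness` formula is hypothetical: it was picked only because it reproduces the quoted reference points (badness 1 at 150% or 67% of the ideal space width, 8 at 200%). The struct and function names are invented, and adding badness and penalty into one cost is a simplification, not dillo's actual line-breaking code.

```cpp
#include <cassert>
#include <limits>

// Sketch only: names and formulas are hypothetical, not dillo's code.
constexpr double INF = std::numeric_limits<double>::infinity();

// Toy badness of a justified line; ratio = actual / ideal space width.
// Matches the quoted points: 1.0 -> 0, 1.5 (or about 0.67) -> 1, 2.0 -> 8.
double toyBadness(double ratio) {
   assert(ratio > 0.0);
   double r = ratio >= 1.0 ? ratio : 1.0 / ratio;  // shrinking to 2/3 counts like stretching to 3/2
   double x = 2.0 * (r - 1.0);                     // 1.5 -> 1, 2.0 -> 2
   return x * x * x;                               // cubic growth
}

// Hypothetical mirror of a pair such as penalty_hyphen / penalty_hyphen_2.
struct PenaltyPair {
   double normal;       // used after an ordinary line
   double afterHyphen;  // "_2" value: previous line already ends with a hyphen
};

double choosePenalty(const PenaltyPair &p, bool prevLineEndsWithHyphen) {
   return prevLineEndsWithHyphen ? p.afterHyphen : p.normal;
}

// Simplified cost of breaking here: "inf" forbids the break, "-inf" forces it,
// otherwise badness and penalty are simply added (an assumption).
double breakCost(double badness, double penalty) {
   if (penalty == INF) return INF;
   if (penalty == -INF) return -INF;
   return badness + penalty;
}
```

With a larger "_2" value, `choosePenalty` makes a second hyphen directly after a hyphenated line more expensive, which is how adjacent hyphenated lines get discouraged.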


jcid＠dillo.org

November 2012

2:36 p.m.

Hi Sebastian, On Sun, Nov 18, 2012 at 02:29:32PM +0100, Sebastian Geerken wrote:

...

Hi!

At http://flpsed.org/hgweb/dillo_hyphen, you'll find some extensions for hyphenation I've not yet merged into the main repository. Still needs some documentation, but here is an overview:

There are now configuration variables for dillorc (see source): penalties for hyphens, as well as for the left and right side of an em-dash. The suffix "_2" means that this value is used for lines following a line which already ends with a hyphen. When this value is larger, two adjacent lines ending with a hyphen are avoided.

For values, see the definition of the "badness". Typical values:

0 = Penalty used for normal spaces.
1 = A justified line with spaces having 150% or 67% of the ideal space width has this as badness.
8 = A justified line with spaces twice as wide as the ideal width has this as badness.

"inf" may be used (preventing a break in any case); also "-inf" (forcing a break), although the latter makes no sense and may lead to strange results.

There is a test page, test/hyphens-etc.html, to play around with.

I wonder how breaking a single word in a line can be penalized with these controls.

For instance [1], with both main dillo and dillo_hyphen the word "hyphenation" is broken twice:

hy- phen- ation

With the new controls, it could become:

hyphen- ation

but, in this particular case it should have been:

hyphenation

On the same page, there are several cases of the same problem (one line above, a 1 row x 8 col table) where words are broken up to 4 times!

In the web case, it's common to use the longest word in a line as the minimal width. OTOH there's also the problem of overly long "word" strings.

In [1] the browser clearly tries to optimize for a minimal page width. That is not the case here, but it could perfectly well have been an external constraint on the algorithm (by means of screen size, TABLE element directives, floats, etc.). So it is non-trivial.

I've worked enough on table rendering to know that making a decision based on the current textblock's min/max width would introduce too much complexity. E.g. in [1], just imagine the problem of deciding which words to break, and where, for a dynamic optimum of the table width. :-P

A much simpler approach would be to introduce a penalty for breaking single words in a line, above a certain threshold that could be relative to the browser window's width.

For instance:

penalty_one_word_line=5 /* Penalty = (word_length > 1/4 window width) ? 0 : 5 */

or even simpler, in characters:

penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */

The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.

These are just ideas, not meant to be *the* solution. They have relatively simple implementations that could be field tested.

HTH.

[1] http://www.thefreedictionary.com/hyphenation

--
Cheers
Jorge.-
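The two proposed settings can be read as simple predicates. Here is a minimal C++ sketch, assuming pixel widths for the first variant and a character count for the second. The function names are hypothetical, and neither option exists in dillo; at this point they are only proposals.

```cpp
#include <cassert>

// Hypothetical sketch of the two proposed penalty_one_word_line variants.

// Variant 1 (penalty_one_word_line=5):
//   Penalty = (word_length > 1/4 window width) ? 0 : 5
// Comparing 4 * width against the window width avoids integer division.
int oneWordLinePenalty(int wordWidthPx, int windowWidthPx, int penalty) {
   assert(windowWidthPx > 0);
   return (4 * wordWidthPx > windowWidthPx) ? 0 : penalty;
}

// Variant 2 (penalty_one_word_line=18, in characters):
// a word alone in its line is not broken if it is shorter than minChars.
bool mayBreakLoneWord(int wordLengthChars, int minChars) {
   return wordLengthChars >= minChars;
}
```

For "hyphenation" (11 characters) and a threshold of 18, `mayBreakLoneWord(11, 18)` is false, so the word would stay whole.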

sgeerken＠dillo.org

12:05 p.m.

Hi Jorge, On Wed, Nov 21, Jorge Arellano Cid wrote:

...

On Sun, Nov 18, 2012 at 02:29:32PM +0100, Sebastian Geerken wrote:

...
At http://flpsed.org/hgweb/dillo_hyphen, you'll find some extensions for hyphenation I've not yet merged into the main repository. Still needs some documentation, but here is an overview:

There are now configuration variables for dillorc (see source): penalties for hyphens, as well as for the left and right side of an em-dash. The suffix "_2" means that this value is used for lines following a line which already ends with a hyphen. When this value is larger, two adjacent lines ending with a hyphen are avoided.

For values, see the definition of the "badness". Typical values:

0 = Penalty used for normal spaces.
1 = A justified line with spaces having 150% or 67% of the ideal space width has this as badness.
8 = A justified line with spaces twice as wide as the ideal width has this as badness.

"inf" may be used (preventing a break in any case); also "-inf" (forcing a break), although the latter makes no sense and may lead to strange results.

There is a test page, test/hyphens-etc.html, to play around with.

I wonder how breaking a single word in a line can be penalized with these controls.

See below ...

...

For instance [1], with both main dillo and dillo_hyphen the word "hyphenation" is broken twice:

hy- phen- ation

With the new controls, it could become:

hyphen- ation

but, in this particular case it should have been:

hyphenation

That there is a difference between dillo and dillo_hyphen is probably just accidental. Calculating width extremes, and so table rendering, is independent of the actual penalty values (except for the value "inf"; see below).

...

On the same page, there are several cases of the same problem (one line above, a 1 row x 8 col table) where words are broken up to 4 times!

In the web case, it's common to use the longest word in a line as the minimal width. OTOH there's also the problem of overly long "word" strings.

I remember your other post some months ago.

...

In [1] the browser clearly tries to optimize for a minimal page width. That is not the case here, but it could perfectly well have been an external constraint on the algorithm (by means of screen size, TABLE element directives, floats, etc.). So it is non-trivial.

I've worked enough on table rendering to know that making a decision based on the current textblock's min/max width would introduce too much complexity. e.g. in [1], just imagine the problem of deciding which words to break and where for a dynamic optimum of the table width. :-P

A much simpler approach would be to introduce a penalty for breaking single words in a line, above a certain threshold that could be relative to the browser window's width.

For instance:

penalty_one_word_line=5 /* Penalty = (word_length > 1/4 window width) ? 0 : 5 */

Penalty values cannot actually influence table rendering:

1. Column widths depend on column min/max widths and available width (window width at the top).
2. Column min/max widths depend on cell min/max widths (maximum of the respective values).
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account; only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible).

So changing penalties (say, to 5 instead of 1) won't make a difference.
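Point 3 can be illustrated with a minimal C++ model. The types are hypothetical and the pixel widths are made up; this is not dw::Textblock::getExtremesImpl itself. In this model, the minimal width is the widest run of text that contains no possible break, so every finite penalty behaves the same.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

constexpr double INF = std::numeric_limits<double>::infinity();

// One unbreakable piece of text (a word or a hyphenation fragment) and the
// penalty for breaking after it. Simplified model, not dillo's data layout.
struct Piece {
   int width;       // made-up width of the piece as drawn
   double penalty;  // +INF: no break; -INF: forced break; otherwise: possible
};

// Minimal width = widest run of pieces that can never be separated.
// Only "is a break possible here?" is checked, so the size of a finite
// penalty (1, 5, ...) has no effect on the result.
int minWidth(const std::vector<Piece> &pieces) {
   int run = 0, widest = 0;
   for (const Piece &p : pieces) {
      run += p.width;
      if (p.penalty != INF) {  // a possible or forced break ends the run
         widest = std::max(widest, run);
         run = 0;
      }
   }
   return std::max(widest, run);  // text after the last break
}
```

With the pieces "hy-", "phen-", "ation" and penalties of either 1 or 5 between them, the result is the width of the widest fragment. Only `inf` changes it.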

...

or even simpler, in characters:

penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */

The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.

Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.

...

These are just ideas, not meant to be *the* solution. They have relatively simple implementations that could be field tested.

I've thought about another approach, which unfortunately turned out to be more complicated than I first thought: calculating width extremes without considering hyphenation. In this case, "hyphenation" is still hyphenated, but the minimal width of this cell is based on the whole word "hyphenation". This would have the following results:

1. table rendering is done the same way as by browsers not supporting hyphenation, and,
2. OTOH, hyphenation still works as desired.

As I said, I stumbled upon a difficult detail; I'd like to give it another thought.

Sebastian
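The effect of calculating width extremes without hyphenation points can be shown with a small C++ model. The `Word` type and the widths are invented for illustration. In this sketch, only the extremes ignore the fragments; actual line breaking could still use them.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A word plus the widths of its fragments at hyphenation points
// (e.g. "hy-", "phen-", "ation"). Widths are invented for illustration.
struct Word {
   std::string text;
   int wholeWidth;                   // width of the unhyphenated word
   std::vector<int> fragmentWidths;  // widths when broken at every hyphenation point
};

// Minimal width of text whose words are separated by ordinary spaces.
// With considerHyphenation == false, hyphenation points are ignored for the
// extremes, so each word counts as a whole.
int minWidth(const std::vector<Word> &words, bool considerHyphenation) {
   int widest = 0;
   for (const Word &w : words) {
      if (considerHyphenation && !w.fragmentWidths.empty()) {
         for (int f : w.fragmentWidths)
            widest = std::max(widest, f);
      } else {
         widest = std::max(widest, w.wholeWidth);
      }
   }
   return widest;
}
```

For a cell containing only "hyphenation", the proposal yields the width of the whole word rather than that of its widest fragment, as in browsers without hyphenation.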

jcid＠dillo.org

12:32 p.m.

On Fri, Nov 23, 2012 at 01:05:39PM +0100, Sebastian Geerken wrote:

...

Hi Jorge,

On Wed, Nov 21, Jorge Arellano Cid wrote:

...
On Sun, Nov 18, 2012 at 02:29:32PM +0100, Sebastian Geerken wrote:

...

...
[...] I wonder how breaking a single word in a line can be penalized with these controls.

See below ...

...
For instance [1], with both main dillo and dillo_hyphen the word "hyphenation" is broken twice:

hy- phen- ation

With the new controls, it could become:

hyphen- ation

but, in this particular case it should have been:

hyphenation

That there is a difference between dillo and dillo_hyphen is probably just accidental.

Ack.

...

Calculating width extremes, and so table rendering, is independent of the actual penalty values (except for the value "inf"; see below).

I beg to differ.

...

[...] Penalty values cannot actually influence table rendering:

1. Column widths depend on column min/max widths and available width (window width at the top).

Agreed.

...

2. Column min/max widths depend on cell min/max widths (maximum of the respective values).

Agreed.

...

3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account; only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible).

Agreed.

...

So changing penalties (say, to 5 instead of 1) won't make a difference.

Here I differ. Please let me know what I'm missing. If a penalty can increase the badness of breaking a certain word to the point of it not being broken (or at least being broken fewer times), e.g. "hy- phen- ation" becomes "hyphenation", or "hyphen- ation", and that word is the only word in a cell, and that cell is the widest in its column (as in the URL example [1]), then the cell max is affected (as you explain in point 2), and so is table rendering.

...

...
[...] or even simpler, in characters:

penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */

The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.

Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.

It sounds a bit strange, but the problem is also this: if we have to format text into a narrow column, breaking a long word to make it fit is clearly the way to go. Now, if breaking that word also makes the column narrower for the whole text (as narrow as the broken word's widest part), then we have to choose which of the two cases is better. The above dillorc setting is just a very simple (in the sense of: no need to calculate) heuristic to try to answer the question.

...

...
These are just ideas, not meant to be *the* solution. They have relatively simple implementations that could be field tested.

I've thought about another approach, which unfortunately turned out to be more complicated than I first thought: calculating width extremes without considering hyphenation. In this case, "hyphenation" is still hyphenated, but the minimal width of this cell is based on the whole word "hyphenation". This would have the following results:

1. table rendering is done the same way as by browsers not supporting hyphenation, and,
2. OTOH, hyphenation still works as desired.

As I said, I stumbled upon a difficult detail; I'd like to give it another thought.

Yes, it would solve the problem, but it would give up the advantage in cases where the answer to the question stated above favors a hyphenated solution. I'd like to have the best choice rendered! ;-)

--
Cheers
Jorge.-

sgeerken＠dillo.org

December 2012

10:59 a.m.

On Sat, Nov 24, Jorge Arellano Cid wrote:

...

...
Calculating width extremes, and so table rendering, is independent of the actual penalty values (except for the value "inf"; see below).

I beg to differ.

...
[...] Penalty values cannot actually influence table rendering:

1. Column widths depend on column min/max widths and available width (window width at the top).

Agreed.

...
2. Column min/max widths depend on cell min/max widths (maximum of the respective values).

Agreed.

...
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account; only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible).

Agreed.

...
So changing penalties (say, to 5 instead of 1) won't make a difference.

Here I differ. Please let me know what I'm missing.

If a penalty can increase the badness of breaking a certain word to the point of it not being broken (or at least being broken fewer times),

e.g. "hy- phen- ation" becomes "hyphenation", or "hyphen- ation",

and that word is the only word in a cell, and that cell is the widest in its column (as in the URL example [1]), then the cell max is affected (as you explain in point 2), and so is table rendering.

Penalties are useful for decisions regarding actual line breaking, but calculating the minimal width must regard all possible breaks; so there is only a decision whether a break is possible at all, or *not* possible at all. Both breaks, with penalties of 1 or 5, respectively, are possible breaks. So, as long as table calculation is kept this simple way (and not replaced by something more complex), the only possibility is to regard some breaks as not possible at all, when the minimal width is calculated.

...

...
...
[...] or even simpler, in characters:

penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */

The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.

Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.

It sounds a bit strange, but the problem is also this: if we have to format text into a narrow column, breaking a long word to make it fit is clearly the way to go. Now, if breaking that word also makes the column narrower for the whole text (as narrow as the broken word's widest part), then we have to choose which of the two cases is better.

Generally, yes. But notice the order:

1. min/max width of the cell, and so min/max width of the column,
2. *actual* width of the column,
3. line breaking, based on the actual width.

Any change must be made in #1, otherwise it is too late. So, the decision has to be simplified: regarding a given point as a possible break or not. The actual column width (as you describe it) cannot be taken into account.
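This order can be sketched as a small C++ pipeline. The column extremes, taken as the maxima of the cell extremes, follow points 1 and 2 from earlier in the thread. The proportional distribution of the available width in `columnWidths` is a hypothetical stand-in, not dillo's table algorithm. Penalties would only come into play afterwards, when lines are broken inside widths that have already been fixed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Extremes {
   int minWidth;
   int maxWidth;
};

// Step 1: column extremes are the maxima of the cells' extremes.
Extremes columnExtremes(const std::vector<Extremes> &cells) {
   Extremes col{0, 0};
   for (const Extremes &c : cells) {
      col.minWidth = std::max(col.minWidth, c.minWidth);
      col.maxWidth = std::max(col.maxWidth, c.maxWidth);
   }
   return col;
}

// Step 2: actual column widths from the column extremes and the available
// width. Hypothetical rule: every column is interpolated between its min and
// max by the same fraction. Step 3 (line breaking) happens later.
std::vector<int> columnWidths(const std::vector<Extremes> &cols, int available) {
   long sumMin = 0, sumMax = 0;
   for (const Extremes &c : cols) {
      sumMin += c.minWidth;
      sumMax += c.maxWidth;
   }
   std::vector<int> widths;
   for (const Extremes &c : cols) {
      if (available <= sumMin) {
         widths.push_back(c.minWidth);  // too narrow: everything at its minimum
      } else if (available >= sumMax) {
         widths.push_back(c.maxWidth);  // enough room: everything at its maximum
      } else {                          // here sumMax > sumMin, so no division by zero
         long extra = static_cast<long>(c.maxWidth - c.minWidth) * (available - sumMin)
                      / (sumMax - sumMin);
         widths.push_back(c.minWidth + static_cast<int>(extra));
      }
   }
   return widths;
}
```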

...

The above dillorc setting is just a very simple (in the sense of: no need to calculate) heuristic to try to answer the question.

...
...
These are just ideas, not meant to be *the* solution. They have relatively simple implementations that could be field tested.

I've thought about another approach, which unfortunately turned out to be more complicated than I first thought: calculating width extremes without considering hyphenation. In this case, "hyphenation" is still hyphenated, but the minimal width of this cell is based on the whole word "hyphenation". This would have the following results:

1. table rendering is done the same way as by browsers not supporting hyphenation, and,
2. OTOH, hyphenation still works as desired.

As I said, I stumbled upon a difficult detail; I'd like to give it another thought.

Yes, it would solve the problem, but it would give up the advantage in cases where the answer to the question stated above favors a hyphenated solution.

I'd like to have the best choice rendered! ;-)

Independent of the approach actually implemented (perhaps both should be tested), the calculation of extremes has to be re-implemented, to decouple it from line breaking. This is some more work, and rather risky (now half finished, but unstable). Is this something for the next release, or should we ignore it in the short term? I can push my changes to another repository, if you want to take a look at it.

Sebastian

jcid＠dillo.org

2:55 p.m.

On Wed, Dec 05, 2012 at 11:59:01AM +0100, Sebastian Geerken wrote:

...

On Sat, Nov 24, Jorge Arellano Cid wrote:

...
...
Calculating width extremes, and so table rendering, is independent of the actual penalty values (except for the value "inf"; see below).

I beg to differ.

...
[...] Penalty values cannot actually influence table rendering:

1. Column widths depend on column min/max widths and available width (window width at the top).

Agreed.

...
2. Column min/max widths depend on cell min/max widths (maximum of the respective values).

Agreed.

...
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account; only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible).

Agreed.

...
So changing penalties (say, to 5 instead of 1) won't make a difference.

Here I differ. Please let me know what I'm missing.

If a penalty can increase the badness of breaking a certain word to the point of it not being broken (or at least being broken fewer times),

e.g. "hy- phen- ation" becomes "hyphenation", or "hyphen- ation",

and that word is the only word in a cell, and that cell is the widest in its column (as in the URL example [1]), then the cell max is affected (as you explain in point 2), and so is table rendering.

Penalties are useful for decisions regarding actual line breaking, but calculating the minimal width must regard all possible breaks; so there is only a decision whether a break is possible at all, or *not* possible at all. Both breaks, with penalties of 1 or 5, respectively, are possible breaks.

Ack!

...

So, as long as table calculation is kept this simple way (and not replaced by something more complex), the only possibility is to regard some breaks as not possible at all, when the minimal width is calculated.

In that case, AFAIU, a dillorc option can be formulated as: breaking a single word in a line is not possible unless it is wider than XX chars (e.g. single_word_max_length). (I assume that whether a word is alone in a line can be determined by comparing the start and end indexes of the array, but I haven't checked it...)

...

...
...
...
[...] or even simpler, in characters:

penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */

The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.

Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.

It sounds a bit strange, but the problem is also this: if we have to format text into a narrow column, breaking a long word to make it fit is clearly the way to go. Now, if breaking that word also makes the column narrower for the whole text (as narrow as the broken word's widest part), then we have to choose which of the two cases is better.

Generally, yes. But notice the order:

1. min/max width of the cell, and so min/max width of the column,
2. *actual* width of the column,
3. line breaking, based on the actual width.

Any change must be made in #1, otherwise it is too late.

So, the decision has to be simplified: regarding a given point as a possible break or not. The actual column width (as you describe it) cannot be taken into account.

I understand the point. But can it be decided beforehand (i.e. without knowing the actual width of the column), as stated above? (E.g. declaring via a dillorc option that columns narrower than 18 characters are "bad" when a single word in a line needs hyphenation.)

...

...
[...] I'd like to have the best choice rendered! ;-)

Independent of the approach actually implemented (perhaps both should be tested), the calculation of extremes has to be re-implemented, to decouple it from line breaking. This is some more work, and rather risky (now half finished, but unstable). Is this something for the next release, or should we ignore it in the short term?

I don't know the complexity of this task; it certainly sounds daunting to me. I'd *like* to have full hyphenation in our next release, but I'd also prioritize floating objects over it. Certainly you're in a better position to evaluate the effort and its side effects, and also to decide.

...

I can push my changes to another repository, if you want to take a look at it.

If we were working in the same physical place, perhaps I could be of more help. My feeling is that, since you would have to explain the whole problem to me by email, I would be more of a burden than a help. If you feel otherwise, we may try.

--
Cheers
Jorge.-

sgeerken＠dillo.org

8:38 p.m.

Hi!

Sorry for cutting this a bit short, but I'd like to suggest the following steps:

1. Concentrate on the next release, with hyphenation support. Which issues are open? I recall these: (a) how to deal with the pattern files, and (b) tables. Did I forget something? Are there some pages not rendering which were rendering before? For dealing with tables, see below, step 2 (after step 3).

3. Floats: will not be part of the next release, but I'd like to finish this rather soon; this has been delayed long enough already. (There has been some progress, BTW, see <http://flpsed.org/hgweb/dillo_floats_geerken>.)

2. How to deal with tables?

a) I've pushed my changes regarding extremes calculation to <http://flpsed.org/hgweb/dillo_hyphen>, but I'd like to push it into the main repository as soon as possible. The current state has some bugs, but I hope it will be stable soon. This is, BTW, generally some cleanup and so useful independent of this special purpose. (It might even be somewhat faster than the old implementation.)

b) I'd like to use it for my approach: not using hyphenation for calculating the extremes, but of course using hyphenation when breaking lines (also in table cells). This gives us:

(i) full advantages of hyphenation;

(ii) no risk regarding table rendering: table columns will be the same as before, and similar to other browsers (without hyphenation), so the problems Jorge mentioned will not occur;

(iii) finally, this would make one bug obsolete:

| **Incorrect calculation of extremes:** The minimal width of a text
| block (as part of the width extremes, which are mainly used for
| tables) is defined by everything between two possible breaks. A
| possible break may also be a hyphenation point; however, hyphenation
| points are calculated in a lazy way, when the lines are broken, and
| not when extremes are calculated. So, it is a matter of chance whether
| the calculation of the minimal width will take the two parts "dil-"
| and "lo" into account (when "dillo" has already been hyphenated), or
| only one part, "dillo" (when "dillo" has not yet been hyphenated),
| resulting possibly in a different value for the minimal width.

(See "dw-line-breaking.doc" for more. This would leave only "low priority" issues.)

After the release, other approaches could be discussed.

Comments and thoughts?

Regards
Sebastian

jcid＠dillo.org

3:49 p.m.

On Thu, Dec 06, 2012 at 09:38:28PM +0100, Sebastian Geerken wrote:

...

Hi!

Sorry for cutting this a bit short, but I'd like to suggest the following steps:

1. Concentrate on the next release, with hyphenation support. Which issues are open? I recall these: (a) How to deal with the pattern files, and (b) tables. Did I forget something? Are there some pages not rendering which were rendering before?

Not that I'm aware of (besides the table issue).

...

For dealing with tables, see below, step 2 (after step 3).

3. Floats: will not be part of the next release, but I'd like to finish this rather soon; this has been delayed long enough already. (There has been some progress, BTW, see <http://flpsed.org/hgweb/dillo_floats_geerken>.)

Great.

...

2. How to deal with tables?

a) I've pushed my changes regarding extremes calculation to <http://flpsed.org/hgweb/dillo_hyphen>, but I'd like to push it into the main repository as soon as possible. The current state has some bugs, but I hope it will be stable soon.

This is, BTW, generally some cleanup and so useful independent of this special purpose. (It might even be somewhat faster than the old implementation.)

Probably this is the way to go. Just push it into the main repo as soon as you find it stable enough.

...

b) I'd like to use it for my approach: not using hyphenation for calculating the extremes, but of course using hyphenation when breaking lines (also in table cells). This gives us:

(i) full advantages of hyphenation;

(ii) no risk regarding table rendering: table columns will be the same as before, and similar to other browsers (without hyphenation), so the problems Jorge mentioned will not occur;

(iii) finally, this would make one bug obsolete:

| **Incorrect calculation of extremes:** The minimal width of a text
| block (as part of the width extremes, which are mainly used for
| tables) is defined by everything between two possible breaks. A
| possible break may also be a hyphenation point; however, hyphenation
| points are calculated in a lazy way, when the lines are broken, and
| not when extremes are calculated. So, it is a matter of chance whether
| the calculation of the minimal width will take the two parts "dil-"
| and "lo" into account (when "dillo" has already been hyphenated), or
| only one part, "dillo" (when "dillo" has not yet been hyphenated),
| resulting possibly in a different value for the minimal width.

(See "dw-line-breaking.doc" for more. This would leave only "low priority" issues.)

After the release, other approaches could be discussed.

This looks like the simplest approach.

--
Cheers
Jorge.-

sgeerken＠dillo.org

6:48 p.m.

On Fri, Dec 07, Jorge Arellano Cid wrote:

...

On Thu, Dec 06, 2012 at 09:38:28PM +0100, Sebastian Geerken wrote:

...
2. How to deal with tables?

a) I've pushed my changes regarding extremes calculation to <http://flpsed.org/hgweb/dillo_hyphen>, but I'd like to push it into the main repository as soon as possible. The current state has some bugs, but I hope it will be stable soon.

This is, BTW, generally some cleanup and so useful independent of this special purpose. (It might even be somewhat faster than the old implementation.)

Probably this is the way to go. Just push it into the main repo as soon as you find it stable enough.

Just pushed to http://hg.dillo.org/dillo. I've found no problems so far, but please take another look.

Something which came to my mind: should this new rule, to exclude hyphenation breaks from minimal width calculation, also be applied to soft hyphens? Currently it is applied to both, but letting minimal width calculation consider soft hyphens as possible breaks may be feasible, since soft hyphens are set by the author himself (who, OTOH, is surprised by the fact that there is a browser also hyphenating automatically ;-) ). This would be rather simple:

diff -r 6a525e279c5d dw/textblock.cc
--- a/dw/textblock.cc Fri Dec 07 19:32:40 2012 +0100
+++ b/dw/textblock.cc Fri Dec 07 19:45:52 2012 +0100
@@ -41,7 +41,7 @@ Textblock::DivChar Textblock::divChars[NUM_DIV_CHARS] = {
    // soft hyphen (U+00AD)
-   { "\xc2\xad", true, false, true, PENALTY_HYPHEN, -1 },
+   { "\xc2\xad", true, false, false, PENALTY_HYPHEN, -1 },
    // simple hyphen-minus: same penalties like automatic or soft
    // hyphens
    { "-", false, true, true, -1, PENALTY_HYPHEN },
    // (unconditional) hyphen (U+2010): handled exactly like
    // minus-hyphen.

See test/table-h1.html for the effect.

Sebastian
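The rule under discussion can be modelled as a per-kind decision, independent of the actual `DivChar` fields. The message does not say what each boolean field means. The following is therefore a hypothetical C++ sketch with an invented enum and function, not a description of the patch's data layout.

```cpp
#include <cassert>

// Hypothetical model, not dillo's DivChar layout: which kinds of break points
// count as possible breaks when the minimal width is calculated. All kinds
// remain usable for actual line breaking.
enum class BreakKind { Space, SoftHyphen, AutomaticHyphen };

// softHyphensCount == false: both kinds of hyphenation breaks are excluded
// from the minimal width (the rule as pushed). true: the proposed variant,
// where author-set soft hyphens (U+00AD) count as possible breaks again.
bool countsForMinWidth(BreakKind kind, bool softHyphensCount) {
   switch (kind) {
   case BreakKind::Space:           return true;
   case BreakKind::SoftHyphen:      return softHyphensCount;
   case BreakKind::AutomaticHyphen: return false;
   }
   return false;  // not reached; keeps compilers from warning
}
```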

corvid＠lavabit.com

8:07 p.m.

Sebastian wrote:

...

Just pushed to http://hg.dillo.org/dillo. I've found no problems so far, but please take another look.

Is everything about hyphenation supposed to work at this point? For instance,

dillo -geometry 850x700 http://fltk.org

makes the names of the snapshots very narrow, and if I try

penalty_hyphen=inf

part of the fltk-2 snapshot link is obscured by the image.

corvid＠lavabit.com

12:38 a.m.

Sebastian wrote:

...

Just pushed to http://hg.dillo.org/dillo. I've found no problems so far, but please take another look.

Text files are giving me segfaults...

Program received signal SIGSEGV, Segmentation fault.
0x0809f0b8 in dw::Textblock::handleWordExtremes (this=0x819fdd8, wordIndex=38684)
    at textblock_linebreaking.cc:676
676         par->maxParMin = prevPar->maxParMin;

sgeerken＠dillo.org

9:26 p.m.

On Fri, Dec 07, corvid wrote:

...

Sebastian wrote:

...
Just pushed to http://hg.dillo.org/dillo. I've found no problems so far, but please take another look.

Is everything about hyphenation supposed to work at this point?

It was supposed to work.

...

For instance,

dillo -geometry 850x700 http://fltk.org

makes the names of the snapshots very narrow, and if I try

penalty_hyphen=inf

part of the fltk-2 snapshot link is obscured by the image.

On Sat, Dec 08, corvid wrote:

...

Sebastian wrote:

...
Just pushed to http://hg.dillo.org/dillo. I've found no problems so far, but please take another look.

Text files are giving me segfaults...

Program received signal SIGSEGV, Segmentation fault.
0x0809f0b8 in dw::Textblock::handleWordExtremes (this=0x819fdd8, wordIndex=38684)
    at textblock_linebreaking.cc:676
676         par->maxParMin = prevPar->maxParMin;

I was not able to reproduce any of these bugs, but I found another bug, by which <http://www.dillo.org/CSS.html> was rendered with much too wide columns. Since this fixed a memory problem, chances are high that a couple of other problems are fixed as well, so please test again.

Sebastian

corvid＠lavabit.com

10:55 p.m.

Sebastian wrote:

...

On Fri, Dec 07, corvid wrote:

...
For instance,

dillo -geometry 850x700 http://fltk.org

makes the names of the snapshots very narrow, and if I try

penalty_hyphen=inf

part of the fltk-2 snapshot link is obscured by the image.

I still see this, or at least the penalty_hyphen=inf case. I didn't try the default penalty_hyphen case.

...

On Sat, Dec 08, corvid wrote:

...
Sebastian wrote:

...
Just pushed to http://hg.dillo.org/dillo. I've found no problems so far, but please take another look.

Text files are giving me segfaults...

Program received signal SIGSEGV, Segmentation fault.
0x0809f0b8 in dw::Textblock::handleWordExtremes (this=0x819fdd8, wordIndex=38684)
    at textblock_linebreaking.cc:676
676         par->maxParMin = prevPar->maxParMin;

I was not able to reproduce any of these bugs, but I found another bug, by which <http://www.dillo.org/CSS.html> was rendered with much too wide columns.

Since this fixed a memory problem, chances are high that a couple of other problems are fixed as well, so please test again.

Not getting segfaults on the files that broke yesterday, but I note that I'm still getting messages from valgrind like:

==24632== Conditional jump or move depends on uninitialised value(s)
==24632==    at 0x8115600: fl_utf8fwd (fl_utf.c:228)
==24632==    by 0x80A275E: dw::fltk::FltkPlatform::nextGlyph(char const*, int) (fltkplatform.cc:594)
==24632==    by 0x8099B1C: dw::core::Layout::nextGlyph(char const*, int) (layout.hh:316)
==24632==    by 0x809836C: dw::Textblock::addText(char const*, unsigned int, dw::core::style::Style*) (textblock.cc:1469)
==24632==    by 0x8071123: DilloPlain::addLine(char*, unsigned int) (plain.cc:146)
==24632==    by 0x8071246: DilloPlain::write(void*, unsigned int, int) (plain.cc:179)
==24632==    by 0x8071400: Plain_callback(int, _CacheClient*) (plain.cc:229)
==24632==    by 0x8064F6A: Cache_process_queue (cache.c:1214)
==24632==    by 0x80645A9: a_Cache_process_dbuf (cache.c:899)
==24632==    by 0x80682F9: a_Capi_ccc (capi.c:742)
==24632==    by 0x805F83C: a_Chain_fcb (chain.c:114)
==24632==    by 0x808BC25: Dpi_parse_token (dpi.c:220)

sgeerken＠dillo.org

6:53 p.m.

On Sat, Dec 08, corvid wrote:

...

Sebastian wrote:

...
On Fri, Dec 07, corvid wrote:

...
For instance,

dillo -geometry 850x700 http://fltk.org

makes the names of the snapshots very narrow, and if I try

penalty_hyphen=inf

part of the fltk-2 snapshot link is obscured by the image.

I still see this, or at least the penalty_hyphen=inf case. I didn't try the default penalty_hyphen case.

I was finally able to reproduce this by using "trickle". It is fixed now, at least for me; please test it again. As far as I can see, this is the last bug related to hyphenation, at least the last one relevant for the next release. Sebastian

corvid＠lavabit.com

7:43 p.m.

Sebastian wrote:

...

On Sat, Dec 08, corvid wrote:

...
Sebastian wrote:

...
On Fri, Dec 07, corvid wrote:

...
For instance,

dillo -geometry 850x700 http://fltk.org

makes the names of the snapshots very narrow, and if I try

penalty_hyphen=inf

part of the fltk-2 snapshot link is obscured by the image.

I still see this, or at least the penalty_hyphen=inf case. I didn't try the default penalty_hyphen case.

I was finally able to reproduce this, by using "trickle".

It is fixed now, at least for me; please test it again.

It's working :)

corvid＠lavabit.com

2:22 a.m.

The text spills over into the following cell in

<table border=1>
<tr>
<td>W d<whatever>aaaaaaaaaaaa*<br>
<td>WW d<whatever>aaaaaaaaaaaa*<br>
<td>WWW d<whatever>aaaaaaaaaaaa*<br>
<td>WWWW d<whatever>aaaaaaaaaaaa*<br>
<td>WWWWW d<whatever>aaaaaaaaaaaa*<br>
<td>WWWWWW d<whatever>aaaaaaaaaaaa*<br>
</table>

if I make the browser window narrowish.

sgeerken＠dillo.org

2:04 p.m.

On Thu, Dec 20, corvid wrote:

...

The text spills over into the following cell in

<table border=1>
<tr>
<td>W d<whatever>aaaaaaaaaaaa*<br>
<td>WW d<whatever>aaaaaaaaaaaa*<br>
<td>WWW d<whatever>aaaaaaaaaaaa*<br>
<td>WWWW d<whatever>aaaaaaaaaaaa*<br>
<td>WWWWW d<whatever>aaaaaaaaaaaa*<br>
<td>WWWWWW d<whatever>aaaaaaaaaaaa*<br>
</table>

if I make the browser window narrowish.

Fixed. Sebastian

sgeerken＠dillo.org

9:51 a.m.

Just one comment:

On Thu, Dec 06, Jorge Arellano Cid wrote:

...

In that case, AFAIU, a dillorc option can be formulated as:

Breaking a single word in a line is not possible unless it is wider than XX chars (e.g. single_word_max_length).

(I assume knowing whether a word is alone in a line is possible by comparing the start and end indexes of the array, but haven't checked it...)
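A minimal C++ sketch of the proposed rule, assuming a line is stored as a range of word indexes into the paragraph's word array, as the parenthetical above suggests. The option name single_word_max_length is the hypothetical one from the proposal, not an existing dillorc setting, and Line and mayHyphenate are illustrative names, not dillo code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a line covers the word indexes [firstWord, lastWord].
struct Line {
    std::size_t firstWord;
    std::size_t lastWord;
};

// Hypothetical rule: a word that sits alone in its line may only be broken
// when it is longer than singleWordMaxLength; words sharing a line are
// unaffected. size() counts bytes, which equals characters only for ASCII.
bool mayHyphenate(const Line &line, const std::vector<std::string> &words,
                  std::size_t wordIndex, std::size_t singleWordMaxLength) {
    bool aloneInLine = line.firstWord == line.lastWord;
    if (!aloneInLine)
        return true;
    return words[wordIndex].size() > singleWordMaxLength;
}
```

Note that this check needs the line ranges to exist already, which is the objection raised in the reply that follows.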

Yet again, the problem is to calculate the correct extremes, to avoid too narrow columns. However, at this point, we do not have lines: these are only calculated once a given column width is known (and that width depends on the correct extremes). Sebastian
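To make the extremes problem concrete, here is a simplified C++ model, not dillo's actual code: each word is a list of syllable widths in pixels, the minimum width is the widest piece that cannot be broken further, and the maximum width is the whole paragraph set on one line. With hyphenation enabled, the minimum shrinks to the widest single syllable plus a hyphen, which illustrates how a table column derived from these extremes can become very narrow. Word, Extremes and paragraphExtremes are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed model: a word is a list of syllable widths; hyphenation points
// lie between syllables.
using Word = std::vector<int>;

struct Extremes {
    int minWidth;  // width of the widest piece that cannot be broken further
    int maxWidth;  // width of the whole paragraph on a single line
};

Extremes paragraphExtremes(const std::vector<Word> &words, int spaceWidth,
                           int hyphenWidth, bool hyphenate) {
    Extremes e{0, 0};
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Word &w = words[i];
        int wordWidth = 0;
        for (std::size_t s = 0; s < w.size(); ++s) {
            wordWidth += w[s];
            if (hyphenate) {
                // Breaking after any syllable but the last adds a visible hyphen.
                int piece = w[s] + (s + 1 < w.size() ? hyphenWidth : 0);
                e.minWidth = std::max(e.minWidth, piece);
            }
        }
        if (!hyphenate)
            e.minWidth = std::max(e.minWidth, wordWidth);  // whole word is unbreakable
        e.maxWidth += wordWidth + (i > 0 ? spaceWidth : 0);
    }
    return e;
}
```

For example, with words {30, 20, 25} and {40}, a space width of 5 and a hyphen width of 6, the minimum width drops from 75 to 40 when hyphenation is enabled, while the maximum stays at 120.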

jcid＠dillo.org

4:05 p.m.

On Fri, Dec 07, 2012 at 10:51:03AM +0100, Sebastian Geerken wrote:

...

Just one comment:

On Thu, Dec 06, Jorge Arellano Cid wrote:

...
In that case, AFAIU, a dillorc option can be formulated as:

Breaking a single word in a line is not possible unless it is wider than XX chars (e.g. single_word_max_length).

(I assume knowing whether a word is alone in a line is possible by comparing the start and end indexes of the array, but haven't checked it...)

Yet again, the problem is to calculate the correct extremes, to avoid too narrow columns. However, at this point, we do not have lines: these are calculated as soon as a given column width is calculated (which depends on the correct extremes).

Ahhh!, now I see what you mean.

...

After the release, other approaches could be discussed.

If that's the case, then we only have a set of words to work with; it may be possible to devise a different heuristic. -- Cheers Jorge.-

Johannes.Hofmann＠gmx.de

November 2012

7:42 p.m.

Hi Sebastian,

On Sun, Nov 18, 2012 at 02:29:32PM +0100, Sebastian Geerken wrote:

...

Hi!

At http://flpsed.org/hgweb/dillo_hyphen, you'll find some extensions for hyphenation I've not yet merged into the main repository. Still needs some documentation, but here is an overview:

There are now configuration variables for dillorc (see source): penalties for hyphens, as well as for the left and right side of an em-dash. The suffix "_2" means that this value is used for lines following a line which already ends with a hyphen. When this value is larger than the normal one, two adjacent lines ending with a hyphen are avoided.
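The effect of the "_2" suffix can be sketched in C++ as a choice between two penalties depending on the previous line. HyphenPenalties and hyphenPenalty are hypothetical names, and the option name penalty_hyphen_2 is inferred from the "_2" convention rather than copied from dillorc.

```cpp
// Hypothetical container for the two hyphen penalties described above.
struct HyphenPenalties {
    int hyphen;   // e.g. penalty_hyphen: previous line did not end with a hyphen
    int hyphen2;  // e.g. penalty_hyphen_2 (assumed name): previous line did
};

// A larger hyphen2 makes a second consecutive hyphenated line more expensive.
int hyphenPenalty(const HyphenPenalties &p, bool prevLineEndsWithHyphen) {
    return prevLineEndsWithHyphen ? p.hyphen2 : p.hyphen;
}
```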

For values, see the definition of the "badness". Typical values:

0 = Penalty used for normal spaces.
1 = A justified line with spaces having 150% or 67% of the ideal space width has this as badness.
8 = A justified line with spaces twice the ideal width has this as badness.
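The three values are consistent with a badness that is cubic in the adjustment ratio (TeX uses the same shape, scaled by 100), provided a space may stretch by half and shrink by a third of its ideal width: 150% and 67% then both correspond to a ratio of 1 (badness 1), and 200% to a ratio of 2 (badness 8). The C++ sketch below encodes that reading; the stretch and shrink factors are inferred from the quoted numbers, not taken from dillo's source.

```cpp
#include <cmath>

// Assumed model: a space may stretch by 1/2 and shrink by 1/3 of its ideal
// width, and badness is the cube of the adjustment ratio.
double badness(double actualSpace, double idealSpace) {
    double ratio;
    if (actualSpace >= idealSpace)
        ratio = (actualSpace - idealSpace) / (idealSpace / 2.0);  // stretching
    else
        ratio = (idealSpace - actualSpace) / (idealSpace / 3.0);  // shrinking
    return std::pow(ratio, 3.0);
}
```

With an ideal space width of 10 px, spaces of 15 px or about 6.7 px give a badness of 1, and spaces of 20 px give 8, matching the values quoted above.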

"inf" may be used (preventing a break in any case); also "-inf" (forcing a break), although the latter makes no sense and may lead to strange results.

There is a test page, test/hyphens-etc.html, to play around with.
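The special values "inf" and "-inf" mentioned above can be illustrated with a small C++ sketch: a penalty string is parsed into an integer with sentinel values, and a break point is then chosen by the lowest badness plus penalty. Adding badness and penalty, the sentinel encoding, and the names parsePenalty, BreakCandidate and chooseBreak are all assumptions for illustration; dillo's actual comparison may differ.

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Assumed encoding of the special values.
const int PENALTY_INF = INT_MAX;    // "inf": never break here
const int PENALTY_FORCE = INT_MIN;  // "-inf": always break here

// Parse a penalty as it might appear in dillorc ("inf", "-inf" or an integer).
// std::atoi returns 0 for malformed input.
int parsePenalty(const std::string &value) {
    if (value == "inf")
        return PENALTY_INF;
    if (value == "-inf")
        return PENALTY_FORCE;
    return std::atoi(value.c_str());
}

struct BreakCandidate {
    double badness;  // badness of the line that would end at this point
    int penalty;     // penalty for breaking at this point
};

// Returns the index of the chosen break, or -1 if every break is forbidden.
int chooseBreak(const std::vector<BreakCandidate> &candidates) {
    int best = -1;
    double bestCost = 0.0;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const BreakCandidate &c = candidates[i];
        if (c.penalty == PENALTY_FORCE)
            return static_cast<int>(i);  // a forced break wins immediately
        if (c.penalty == PENALTY_INF)
            continue;                    // this break is never taken
        double cost = c.badness + c.penalty;
        if (best == -1 || cost < bestCost) {
            best = static_cast<int>(i);
            bestCost = cost;
        }
    }
    return best;
}
```

Under this model, setting a hyphen penalty to "-inf" would force a break at the first hyphenation point reached, whatever the resulting badness, which is consistent with the warning that it may lead to strange results.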

This looks great! To me it doesn't look as if it would involve any risky stuff. What do you think about merging dillo_hyphen into main line? Or should we wait until after the release? The table column width issue you are discussing with Jorge also exists in main line, right? So we can improve on that there. Johannes

sgeerken＠dillo.org

10:05 p.m.

On Sun, Nov 25, Johannes Hofmann wrote:

...

This looks great! To me it doesn't look as if it would involve any risky stuff. What do you think about merging dillo_hyphen into main line? Or should we wait until after the release?

I'd agree here. Changes are simple and should be robust. If no one complains, I'll merge tomorrow.

...

The table column width issue you are discussing with Jorge also exists in main line, right? So we can improve on that there.

Yes, there is nothing new here. Sebastian

sgeerken＠dillo.org

12:36 p.m.

New subject: Extensions for hyphenation -- Merged!

On Mon, Nov 26, Sebastian Geerken wrote:

...

On Sun, Nov 25, Johannes Hofmann wrote:

...
This looks great! To me it doesn't look as if it would involve any risky stuff. What do you think about merging dillo_hyphen into main line? Or should we wait until after the release?

I'd agree here. Changes are simple and should be robust. If no one complains, I'll merge tomorrow.

...
The table column width issue you are discussing with Jorge also exists in main line, right? So we can improve on that there.

Yes, there is nothing new here.

Since no one has complained, these changes are now in http://hg.dillo.org/dillo. Sebastian


21 comments

4 participants

participants (4)

corvid＠lavabit.com
jcid＠dillo.org
Johannes.Hofmann＠gmx.de
sgeerken＠dillo.org