[Dillo-dev] Extensions for hyphenation

Dec. 5, 2012

      On Sat, Nov 24, Jorge Arellano Cid wrote:
...
...
Calculating width extremes, and so table rendering,
is independent of the actual penalty values (except for the value
"inf"; see below).
I beg to differ.
...
[...]
Penalty values cannot actually influence table rendering:
1. Column widths depend on column min/max widths and available width
   (window width at the top).
Agreed.
...
2. Column min/max widths depend on cell min/max widths (maximum of the
   respective values).
Agreed.
...
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not
   take the penalty values into account; only three cases are
   distinguished: inf (no break allowed), -inf (break forced), and
   other values (break possible).
Agreed.
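To make point 3 concrete, here is a minimal C++ sketch of the three-way classification and of a minimal-width calculation over the pieces of a single word. It is not the code of dw::Textblock::getExtremesImpl; the Piece struct, the PENALTY_* constants and the helper names are hypothetical, and hyphen and space widths are ignored for brevity.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the "inf" and "-inf" penalties.
constexpr int PENALTY_INF = std::numeric_limits<int>::max();
constexpr int PENALTY_NEG_INF = std::numeric_limits<int>::min();

enum class BreakKind { NotPossible, Forced, Possible };

// Only three cases are distinguished; 1, 5, ... all map to Possible.
BreakKind classify(int penalty) {
    if (penalty == PENALTY_INF) return BreakKind::NotPossible;
    if (penalty == PENALTY_NEG_INF) return BreakKind::Forced;
    return BreakKind::Possible;
}

// Hypothetical piece of text, e.g. "hy-", with the penalty for
// breaking after it. Hyphen and space widths are ignored.
struct Piece {
    int width;
    int penalty;
};

// Minimal width: the widest run of pieces that can never be split,
// i.e. the widest segment between break points that are possible or
// forced. The concrete penalty value is never consulted.
int minWidth(const std::vector<Piece>& pieces) {
    int widest = 0, current = 0;
    for (const Piece& p : pieces) {
        current += p.width;
        if (classify(p.penalty) != BreakKind::NotPossible) {
            widest = std::max(widest, current);
            current = 0;
        }
    }
    return std::max(widest, current);
}
```

With pieces "hy-", "phen-" and "ation" of widths 10, 20 and 25 (arbitrary units), minWidth() returns 25 whether the hyphenation penalties are 1 or 5; only setting them to PENALTY_INF changes the result (to 55).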
...
So changing penalties (say, to 5 instead of 1) won't make a
difference.
Here I differ. Please let me know what I'm missing.
If a penalty can increase the badness of breaking a certain word
to the point of it not being broken (or at least being broken fewer times),
e.g.
             becomes                     or
     hy-                  hyphenation             hyphen-
     phen-                                        ation
     ation
and that word is the only word in a cell, and that cell is the
widest in its column (as in the URL example [1]), then the cell max
is affected (as you explain in point 2), and so is table rendering.
Penalties are useful for decisions regarding actual line breaking, but
calculating the minimal width must regard all possible breaks; so
there is only a decision whether a break is possible at all, or *not*
possible at all. Both breaks, with penalties of 1 or 5, respectively,
are possible breaks.

So, as long as the table calculation is kept this simple (and not
replaced by something more complex), the only possibility is to regard
some breaks as not possible at all when the minimal width is
calculated.
...
...
...
[...]
  or even simpler, in characters:
penalty_one_word_line=18
    /* Don't try to break words shorter than 18 chars, when
       alone in a single line */
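One literal reading of the proposed penalty_one_word_line option, as a small C++ sketch. Only the option name and the value 18 come from the proposal above; the struct, the function and the exact semantics are assumptions, and std::string::size() counts bytes, so non-ASCII text would need a proper character count.

```cpp
#include <string>

// Hypothetical preferences holder; only the option name and the
// default value 18 come from the dillorc proposal.
struct HyphenationPrefs {
    int penaltyOneWordLine = 18;
};

// Assumed semantics: a word that would stand alone on a line is only
// hyphenated if it has at least `penaltyOneWordLine` characters.
// Note: size() counts bytes, not characters, for UTF-8 text.
bool mayHyphenate(const std::string& word, bool aloneInLine,
                  const HyphenationPrefs& prefs) {
    if (!aloneInLine)
        return true;  // the option only concerns single-word lines
    return static_cast<int>(word.size()) >= prefs.penaltyOneWordLine;
}
```

With the default threshold, "hyphenation" (11 characters) would not be hyphenated when alone on a line, while a 34-character compound such as "Donaudampfschifffahrtsgesellschaft" still would.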
The advantage I see in a penalty that handles this case is that
it can help a lot with web rendering, and also with more precise
book rendering, through a simple dillorc option.
Could be, but it sounds a bit hackish, and with an uncertain result.
Could be worth a test, however.
It sounds a bit strange, but there is also this problem: if we have to
format text into a narrow column, breaking a long word to make it
fit is clearly the way to go. Now, if breaking that word also
makes the column narrower for the whole text (as narrow as the
broken word's widest split part), then we have to choose which of
the two cases is better.
Generally, yes. But notice the order: 1. min/max width of the cell,
and so min/max width of the column, 2. *actual* width of the column,
3. line breaking, based on the actual width. Any change must be made
in #1; otherwise it is too late.
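The order described above can be illustrated with a simplified C++ model. This is not Dillo's table algorithm: the proportional distribution of extra space in columnWidths() is an assumption chosen for brevity, and all names are hypothetical. What matters is the data flow: line breaking (step 3) only ever sees the widths produced by step 2.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical min/max pair, used for cells and for columns.
struct Extremes {
    int minWidth;
    int maxWidth;
};

// Step 1: column extremes are the maxima of the cell extremes.
Extremes columnExtremes(const std::vector<Extremes>& cells) {
    Extremes col{0, 0};
    for (const Extremes& c : cells) {
        col.minWidth = std::max(col.minWidth, c.minWidth);
        col.maxWidth = std::max(col.maxWidth, c.maxWidth);
    }
    return col;
}

// Step 2 (simplified, assumed): if everything fits, each column gets
// its maximum; if not even the minima fit, each gets its minimum;
// otherwise extra space beyond the minima is shared in proportion
// to (max - min).
std::vector<int> columnWidths(const std::vector<Extremes>& cols, int available) {
    int sumMin = 0, sumMax = 0;
    for (const Extremes& c : cols) {
        sumMin += c.minWidth;
        sumMax += c.maxWidth;
    }
    std::vector<int> widths;
    for (const Extremes& c : cols) {
        if (available >= sumMax) {
            widths.push_back(c.maxWidth);
        } else if (available <= sumMin) {
            widths.push_back(c.minWidth);
        } else {
            // Here sumMin < available < sumMax, so the divisor is > 0.
            long long extra = static_cast<long long>(available - sumMin) *
                              (c.maxWidth - c.minWidth) / (sumMax - sumMin);
            widths.push_back(c.minWidth + static_cast<int>(extra));
        }
    }
    return widths;
}
// Step 3 (line breaking inside each cell) is not shown: it receives
// the widths above and cannot change them afterwards.
```

For two columns with extremes (25, 100) and (50, 50) and 100 units available, this yields widths 50 and 50; line breaking inside the first column then has to work with 50, whatever penalty values it uses.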

So, the decision has to be simplified: a given point is either regarded
as a possible break or not. The actual column width (as you describe it)
cannot be taken into account.
...
The above dillorc setting is just a very simple (in the sense
of: no need to calculate) heuristic to try to answer the
question.
...
...
These are just ideas, not meant to be *the* solution. They have
relatively simple implementations that could be field tested.
I've thought of another approach, which unfortunately turned out to be
more complicated than I first thought: calculating width extremes
without considering hyphenation. In this case, "hyphenation" is still
hyphenated, but the minimal width of this cell is based on the whole
word "hyphenation". This would have the following results:
1. table rendering is done the same way as by browsers not supporting
   hyphenation, and,
2. OTOH, hyphenation still works as desired.
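A C++ sketch of this alternative, under simplifying assumptions: the names are hypothetical, hyphen and space widths are ignored, and a greedy first-fit line breaker stands in for Dillo's actual one. The extremes calculation treats hyphenation points as not possible, while line breaking may still use them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical kinds of break point after a piece of text.
enum class BreakType { Space, Hyphenation, None };

struct Piece {
    int width;             // width of e.g. "hy-" (hyphen width ignored)
    BreakType breakAfter;  // what kind of break may follow this piece
};

// Extremes calculation: hyphenation points are treated as "not
// possible", so a hyphenatable word contributes its full width.
int minWidthIgnoringHyphenation(const std::vector<Piece>& pieces) {
    int widest = 0, current = 0;
    for (const Piece& p : pieces) {
        current += p.width;
        if (p.breakAfter == BreakType::Space) {
            widest = std::max(widest, current);
            current = 0;
        }
    }
    return std::max(widest, current);
}

// Line breaking (greedy first-fit): once the column width is known,
// both spaces and hyphenation points may be used as breaks.
int countLines(const std::vector<Piece>& pieces, int columnWidth) {
    int lines = 1, lineWidth = 0, unitWidth = 0;
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        unitWidth += pieces[i].width;
        bool last = (i + 1 == pieces.size());
        // Keep collecting until a place where a break may occur.
        if (pieces[i].breakAfter == BreakType::None && !last)
            continue;
        // The unit does not fit on the current line: start a new one.
        if (lineWidth > 0 && lineWidth + unitWidth > columnWidth) {
            ++lines;
            lineWidth = 0;
        }
        lineWidth += unitWidth;
        unitWidth = 0;
    }
    return lines;
}
```

For "hy-", "phen-", "ation" (widths 10, 20, 25), the minimal width is 55, the width of the whole word; countLines() gives 3 lines at width 20, 2 at width 30 and 1 at width 55, matching the three renderings shown earlier in this thread.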
As I said, I stumbled upon a difficult detail; I'd like to give it
another thought.
Yes, it would solve the problem, but it would lose the advantage
when the answer to the question stated above favors a
hyphenated solution.
I'd like to have the best choice rendered! ;-)
Independent of the approach actually implemented (perhaps both should
be tested), the calculation of extremes has to be re-implemented, to
decouple it from the line breaking. This is some more work, and rather
risky (now half finished, but unstable). Is this something for the
next release, or should we ignore it in the short term?

I can push my changes to another repository, if you want to take a
look at it.

Sebastian

sgeerken＠dillo.org