On Fri, Nov 23, 2012 at 01:05:39PM +0100, Sebastian Geerken wrote:
Hi Jorge,
On Wed, Nov 21, Jorge Arellano Cid wrote:
On Sun, Nov 18, 2012 at 02:29:32PM +0100, Sebastian Geerken wrote:
[...] I wonder how breaking a single word in a line can be penalized with these controls.
See below ...
For instance [1], with both main dillo and dillo_hyphen the word "hyphenation" is broken twice:
hy- phen- ation
With the new controls, it could become:
hyphen- ation
but, in this particular case it should have been:
hyphenation
That there is a difference between dillo and dillo_hyphen is probably just accidental.
Ack.
Calculating width extremes, and so table rendering, is independent of the actual penalty values (except for the value "inf"; see below).
I beg to differ.
[...] Penalty values cannot actually influence table rendering:
1. Column widths depend on column min/max widths and available width (window width at the top).
Agreed.
2. Column min/max widths depend on cell min/max widths (maximum of the respective values).
Agreed.
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account; only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible).
Agreed.
So changing penalties (say, to 5 instead of 1), won't make a difference.
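The three-way distinction in point 3 can be sketched as follows. This is only an illustration, not dillo's actual code: `Piece`, `minWidth`, and `kInf` are hypothetical names, widths are in arbitrary units, and the width of the hyphen itself is ignored.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical model: a piece of text followed by a break opportunity.
struct Piece {
    int width;       // width of this piece (arbitrary units)
    double penalty;  // penalty of the break after this piece
};

const double kInf = std::numeric_limits<double>::infinity();

// Minimum width of a cell: the widest run of pieces that cannot be broken.
// Only penalty == +inf (no break allowed) keeps a run going; every other
// value, including -inf (break forced), ends the current run.
int minWidth(const std::vector<Piece> &pieces)
{
    int widest = 0, run = 0;
    for (const Piece &p : pieces) {
        run += p.width;
        if (p.penalty != kInf) {   // break possible or forced
            widest = std::max(widest, run);
            run = 0;
        }
    }
    return std::max(widest, run);  // account for a trailing unbreakable run
}
```

In this model, pieces of width 20, 50 and 50 yield a minimum width of 50 whether the hyphenation penalties are 1 or 5; only setting them to inf raises it to 120.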
Here I differ. Please let me know what I'm missing. If a penalty can increase the badness of breaking a certain word, to the point of it not being broken (or at least being broken fewer times), e.g. "hy- phen- ation" becomes "hyphenation" or "hyphen- ation", and that word is the only word in a cell, and that cell is the widest in its column (as in the URL example [1]), then cell max is affected (as you explain in point 2), and so is table rendering.
[...] or even simpler, in characters:
penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */
The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.
Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.
It sounds a bit strange, but the problem is this: if we have to format text into a narrow column, breaking a long word to make it fit is clearly the way to go. Now, if breaking that word also makes the column narrower for the whole text (as narrow as the broken word's widest split part), then we have to choose which of the two cases is better. The above dillorc setting is just a very simple (in the sense of: no need to calculate) heuristic to try to answer the question.
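A minimal sketch of that heuristic, assuming the threshold is compared against the word's length. `mayBreakLoneWord` is a hypothetical helper, not an existing dillo function, and the length is counted in bytes, which only matches the character count for ASCII text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper modelling the proposed "penalty_one_word_line"
// option: a word that stands alone in its line is only considered for
// hyphenation when it has at least `threshold` characters.
// Note: std::string::size() counts bytes, so this is exact for ASCII only.
bool mayBreakLoneWord(const std::string &word, std::size_t threshold)
{
    return word.size() >= threshold;
}
```

With a threshold of 18, the 11-character word "hyphenation" would be left whole when it is alone in a line, while a long URL could still be broken.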
These are just ideas, not meant to be *the* solution. They have relatively simple implementations that could be field tested.
I've thought of another approach, which unfortunately turned out to be more complicated than I first thought: calculating width extremes without considering hyphenation. In this case, "hyphenation" is still hyphenated, but the minimal width of this cell is based on the whole word "hyphenation". This would have the following results:
1. Table rendering is done the same way as by browsers not supporting hyphenation, and
2. OTOH, hyphenation still works as desired.
As I said, I stumbled upon a difficult detail; I'd like to give it another thought.
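The idea of computing extremes without considering hyphenation could look roughly like this. It is a sketch under assumptions, not the dillo implementation, and it does not address the difficult detail mentioned above: `Part`, `hyphenBreak`, and `minWidthIgnoringHyphens` are hypothetical names, and widths are in arbitrary units.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of a text piece and the break opportunity after it.
struct Part {
    int width;         // width of this piece (arbitrary units)
    bool hyphenBreak;  // true: the break after it comes from hyphenation
                       // false: ordinary break (e.g. at a space)
};

// Minimum width computed as if hyphenation did not exist: hyphenation
// points do not end a run, so each word counts with its full width.
// Line breaking itself may still use the hyphenation points.
int minWidthIgnoringHyphens(const std::vector<Part> &parts)
{
    int widest = 0, run = 0;
    for (const Part &p : parts) {
        run += p.width;
        if (!p.hyphenBreak) {      // only ordinary breaks end a run
            widest = std::max(widest, run);
            run = 0;
        }
    }
    return std::max(widest, run);  // account for a trailing run
}
```

For a word split into parts of width 20, 50 and 50, followed by a separate word of width 30, this gives a minimum width of 120, the width of the unbroken first word.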
Yes, it would solve the problem, but it would give up the advantage in cases where the answer to the question stated above favors a hyphenated solution. I'd like to have the best choice rendered! ;-) -- Cheers Jorge.-