On Wed, Dec 05, 2012 at 11:59:01AM +0100, Sebastian Geerken wrote:
On Sat, Nov 24, Jorge Arellano Cid wrote:
Calculating width extremes, and so table rendering, is independent of the actual penalty values (except for the value "inf"; see below).
I beg to differ.
[...] Penalty values cannot actually influence table rendering:
1. Column widths depend on column min/max widths and available width (window width at the top).
Agreed.
2. Column min/max widths depend on cell min/max widths (maximum of the respective values).
Agreed.
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account, only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible.)
Agreed.
So changing penalties (say, to 5 instead of 1), won't make a difference.
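The three-case distinction in point 3 can be illustrated with a minimal sketch. This is not the code of dw::Textblock::getExtremesImpl: the `Piece` struct, the two sentinel constants and `computeExtremes` are hypothetical names, and spaces and hyphen widths are ignored. It only shows why the magnitude of a finite penalty never reaches the result.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical, simplified stand-in for a word or word part in a text block.
struct Piece {
    int width;    // rendered width in pixels
    int penalty;  // penalty for breaking *after* this piece
};

// Assumed sentinels for the two special cases named in the thread.
constexpr int PENALTY_INF = std::numeric_limits<int>::max();      // no break allowed
constexpr int PENALTY_NEG_INF = std::numeric_limits<int>::min();  // break forced

struct Extremes {
    int minWidth;
    int maxWidth;
};

Extremes computeExtremes(const std::vector<Piece>& pieces) {
    Extremes e{0, 0};
    int unbreakable = 0;  // width of the current run that cannot be broken
    int line = 0;         // width of the current line if only forced breaks are taken

    for (const Piece& p : pieces) {
        assert(p.width >= 0);
        unbreakable += p.width;
        line += p.width;

        // Any penalty other than "inf" means a break is possible here.
        // Its magnitude (1, 5, ...) is never looked at.
        if (p.penalty != PENALTY_INF) {
            e.minWidth = std::max(e.minWidth, unbreakable);
            unbreakable = 0;
        }
        // Only a forced break ends a line when computing the maximum width.
        if (p.penalty == PENALTY_NEG_INF) {
            e.maxWidth = std::max(e.maxWidth, line);
            line = 0;
        }
    }
    e.minWidth = std::max(e.minWidth, unbreakable);
    e.maxWidth = std::max(e.maxWidth, line);
    return e;
}
```

Under these assumptions, three pieces of widths 20, 40 and 50 give the same minimum width whether every penalty is 1 or 5. Only switching a penalty to "inf" changes the minimum.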
Here I differ. Please let me know what I'm missing.
If a penalty can increase the badness of breaking a certain word, to the point of it not being broken (or at least being broken fewer times),
e.g.   hy-      becomes   hyphenation   or   hyphen-
       phen-                                 ation
       ation
and that word is the only word in a cell, and that cell is the widest in its column (as in the URL example [1]), then cell max is affected (as you explain in point 2), and so is table rendering.
Penalties are useful for decisions regarding actual line breaking, but calculating the minimal width must regard all possible breaks; so there is only a decision whether a break is possible at all, or *not* possible at all. Both breaks, with penalties of 1 or 5, respectively, are possible breaks.
Ack!
So, as long as table calculation is kept this simple way (and not replaced by something more complex), the only possibility is to regard some breaks as not possible at all, when the minimal width is calculated.
In that case, AFAIU, a dillorc option can be formulated as: Breaking a single word in a line is not possible unless it is wider than XX chars (e.g. single_word_max_length). (I assume knowing whether a word is alone in a line is possible by comparing the start and end indexes of the array, but haven't checked it...)
[...] or even simpler, in characters:
penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */
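A minimal sketch of how such a rule could enter the minimal-width calculation, measured in characters. `penalty_one_word_line` is only the option proposed here, not an existing dillorc setting, and `singleWordMinChars` is a hypothetical helper. The sketch ignores the extra hyphen character and assumes the word is already known to be alone in its line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// partLengths: lengths in characters of the hyphenation parts of ONE word,
// e.g. {2, 4, 5} for hy-phen-ation.
// penaltyOneWordLine: the proposed threshold; words shorter than this are
// treated as unbreakable when alone in a line.
// Returns the minimal width, in characters, that the word contributes.
std::size_t singleWordMinChars(const std::vector<std::size_t>& partLengths,
                               std::size_t penaltyOneWordLine) {
    assert(!partLengths.empty());

    const std::size_t total =
        std::accumulate(partLengths.begin(), partLengths.end(), std::size_t{0});

    if (total < penaltyOneWordLine) {
        // Short word: regard its inner breaks as not possible at all.
        return total;
    }
    // Long word: all breaks are possible, so the widest part decides.
    return *std::max_element(partLengths.begin(), partLengths.end());
}
```

With the threshold at 18, "hyphenation" (11 characters) would keep its full width as the cell minimum; with a threshold of 10 it would shrink to its widest part.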
The advantage I see to a penalty that handles this case is that it can help a lot with web rendering and also with more precise book rendering with a simple dillorc option.
Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.
It sounds a bit strange, but the problem is also this: if we have to format text into a narrow column, breaking a long word to make it fit is clearly the way to go. Now, if breaking that word also makes the column narrower for the whole text (as narrow as the widest part of the broken word), then we have to choose which of the two cases is better.
Generally, yes. But notice the order: 1. min/max width of the cell, and so min/max width of the column, 2. *actual* width of the column, 3. line breaking, based on the actual width. Any change must be made in #1, otherwise it is too late.
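The ordering can be sketched for a single column as follows. This is an illustration only: `columnExtremes` and `actualColumnWidth` are hypothetical names, and the clamping in the second step is a simplifying assumption, not Dillo's actual width distribution among several columns.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Extremes {
    int minWidth;
    int maxWidth;
};

// Step 1: the column extremes are the maxima of the cell extremes.
Extremes columnExtremes(const std::vector<Extremes>& cells) {
    assert(!cells.empty());
    Extremes col{0, 0};
    for (const Extremes& c : cells) {
        assert(c.minWidth <= c.maxWidth);
        col.minWidth = std::max(col.minWidth, c.minWidth);
        col.maxWidth = std::max(col.maxWidth, c.maxWidth);
    }
    return col;
}

// Step 2: the actual width follows from the extremes and the available
// width (assumed here: a plain clamp for a one-column table).
int actualColumnWidth(const Extremes& col, int availableWidth) {
    return std::max(col.minWidth, std::min(availableWidth, col.maxWidth));
}

// Step 3, line breaking, would take the result of step 2 as its input.
// Nothing it decides can flow back into steps 1 or 2.
```

In this sketch the line breaker never appears in the first two functions, which is why a rule that should affect column widths has to act on the cell extremes.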
So, the decision has to be simplified: regarding a given point as possible break or not. The actual column width (as you describe it) cannot be regarded.
I understand the point. But can it be decided beforehand (i.e. without knowing the actual width of the column), as stated above? (E.g. declaring via a dillorc option that columns narrower than 18 characters are "bad" when a single word in a line needs hyphenation.)
[...] I'd like to have the best choice rendered! ;-)
Independent of the approach actually implemented (perhaps both should be tested), the calculation of extremes has to be re-implemented, to decouple it from the line breaking. This is some more work, and rather risky (now half finished, but unstable). Is this something for the next release, or should we ignore it in the short term?
I don't know the complexity of this task; it certainly sounds daunting to me. I'd *like* to have full hyphenation in our next release, but I'd also prioritize floating objects over it. Certainly you're in a better position to evaluate the effort and its side effects, and also to decide.
I can push my changes in another repository, if you want to take a look at it.
If we were working in the same physical place, perhaps I could be of more help. My feeling is that explaining the whole problem to me by email would make me more of a burden than a help. If you feel otherwise, we may try. -- Cheers Jorge.-