Hi Jorge,

On Wed, Nov 21, Jorge Arellano Cid wrote:
On Sun, Nov 18, 2012 at 02:29:32PM +0100, Sebastian Geerken wrote:
At http://flpsed.org/hgweb/dillo_hyphen, you'll find some extensions for hyphenation that I've not yet merged into the main repository. They still need some documentation, but here is an overview:
There are now configuration variables for dillorc (see source): penalties for breaking at hyphens, as well as at the left and right side of an em-dash. The suffix "_2" means that this value is used for lines following a line which already ends with a hyphen. Setting this value larger avoids two adjacent lines ending with a hyphen.
For values, see the definition of the "badness". Typical values:
0 = Penalty used for normal spaces.
1 = Badness of a justified line whose spaces are 150% or 67% of the ideal space width.
8 = Badness of a justified line whose spaces are twice the ideal width.
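The reference values above are consistent with a cubic badness function, provided one assumes that a space may stretch by half and shrink by a third of its ideal width: 150% and 67% then use up exactly the available stretch or shrink (badness 1), and 200% uses twice the stretch (badness 2³ = 8). A minimal sketch under exactly those assumptions; the function name and both factors are hypothetical and not taken from dillo's source:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch: cubic badness, assuming a space may stretch by 1/2
// and shrink by 1/3 of its ideal width. Overfull lines are not handled.
double badness(double actualSpaceWidth, double idealSpaceWidth) {
    assert(idealSpaceWidth > 0);
    const double stretchability = idealSpaceWidth / 2.0;  // assumption
    const double shrinkability = idealSpaceWidth / 3.0;   // assumption
    const double diff = actualSpaceWidth - idealSpaceWidth;
    // Fraction of the available stretch (or shrink) that is used up.
    const double ratio = diff >= 0 ? diff / stretchability
                                   : -diff / shrinkability;
    return std::pow(ratio, 3);
}
```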
"inf" may be used (preventing a break in any case); also "-inf" (forcing a break), although the latter makes no sense and may lead to strange results.
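How such a value might be read and how the "_2" variant might be chosen can be sketched as follows. This is only an illustration: the function names are hypothetical, and the real dillorc variable names and parsing code are in the source.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical parser for a penalty value from dillorc:
// "inf" forbids a break, "-inf" forces one, anything else is a number.
double parsePenalty(const std::string &s) {
    assert(!s.empty());
    if (s == "inf")
        return std::numeric_limits<double>::infinity();
    if (s == "-inf")
        return -std::numeric_limits<double>::infinity();
    return std::stod(s);  // throws std::invalid_argument on garbage
}

// Use the "_2" value when the previous line already ended with a hyphen,
// so that a larger "_2" value discourages consecutive hyphenated lines.
double hyphenPenalty(double penalty, double penalty2, bool prevLineHyphenated) {
    return prevLineHyphenated ? penalty2 : penalty;
}
```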
There is a test page, test/hyphens-etc.html, to play around with.
I wonder how breaking a single word in a line can be penalized with these controls.
See below ...
For instance [1], with both main dillo and dillo_hyphen the word "hyphenation" is broken twice:
hy- phen- ation
With the new controls, it could become:
hyphen- ation
but, in this particular case it should have been:
hyphenation
That there is a difference between dillo and dillo_hyphen is probably just accidental. Calculating width extremes, and hence table rendering, is independent of the actual penalty values (except for the value "inf"; see below).
On the same page, there are several cases of the same problem (one line above, a 1 row x 8 col table) where words are broken up to 4 times!
In the web case, it's common to use the longest word in a line as the minimal width. OTOH, there's also the problem of overly long "word" strings.
I remember your other post some months ago.
In [1] the browser clearly tries to optimize for a minimal page width. That is not the case here, but it could perfectly well have been an external constraint on the algorithm (via screen size, TABLE element directives, floats, etc.). So it is non-trivial.
I've worked enough on table rendering to know that making a decision based on the current textblock's min/max width would introduce too much complexity. e.g. in [1], just imagine the problem of deciding which words to break and where for a dynamic optimum of the table width. :-P
A much simpler approach would be to introduce a penalty for breaking single words in a line, above a certain threshold that could be relative to the browser window's width.
For instance:
penalty_one_word_line=5 /* Penalty = (word_length > 1/4 window width) ? 0 : 5 */
Penalty values cannot actually influence table rendering:
1. Column widths depend on column min/max widths and the available width (the window width at the top level).
2. Column min/max widths depend on cell min/max widths (the maximum of the respective values).
3. Cell widths (calculated in dw::Textblock::getExtremesImpl) do not take the penalty values into account; only three cases are distinguished: inf (no break allowed), -inf (break forced), and other values (break possible).
So changing penalties (say, to 5 instead of 1) won't make a difference.
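Why only "inf" matters for the minimal width can be illustrated with a simplified model: every break opportunity with a finite penalty (or -inf) splits the text, so the minimal width is the widest run that can never be split, whatever the finite penalty is. The `Piece` structure and the functions below are hypothetical and do not mirror dillo's data structures; differences such as the width of a hyphen that only appears when a break is taken are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical model: a paragraph is a list of pieces (e.g. "hy", "phen",
// "ation"), each followed by a break opportunity with a penalty.
struct Piece {
    int width;
    double penalty;  // penalty for breaking after this piece
};

enum class BreakClass { Never, Always, Possible };

// For width extremes, only three cases are distinguished.
BreakClass classify(double penalty) {
    if (std::isinf(penalty))
        return penalty > 0 ? BreakClass::Never : BreakClass::Always;
    return BreakClass::Possible;
}

// Minimal width: the widest run of pieces that can never be separated.
int minWidth(const std::vector<Piece> &pieces) {
    int best = 0, run = 0;
    for (const Piece &p : pieces) {
        assert(p.width >= 0);
        run += p.width;
        if (classify(p.penalty) != BreakClass::Never) {
            best = std::max(best, run);
            run = 0;
        }
    }
    return std::max(best, run);
}
```

With this model, hyphenation points at penalty 1 and at penalty 5 yield the same minimal width; only setting them to inf changes it.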
or even simpler, in characters:
penalty_one_word_line=18 /* Don't try to break words shorter than 18 chars, when alone in a single line */
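Both proposed variants of penalty_one_word_line could be expressed roughly as below. The option name comes from the proposal above; the helper functions are hypothetical illustrations, not existing dillo code.

```cpp
#include <cassert>

// Width-based variant: no extra penalty for a lone word wider than a
// quarter of the window; otherwise breaking it costs `penalty` (e.g. 5).
double oneWordLinePenaltyByWidth(int wordWidth, int windowWidth,
                                 double penalty) {
    assert(windowWidth > 0);
    // 4 * wordWidth > windowWidth avoids truncation from integer division.
    return 4 * wordWidth > windowWidth ? 0.0 : penalty;
}

// Character-based variant: a word standing alone in a line may only be
// broken if it has at least minChars characters (e.g. 18).
bool mayBreakLoneWord(int wordChars, int minChars) {
    return wordChars >= minChars;
}
```

Under the character-based rule with a threshold of 18, the 11-character word "hyphenation" would never be broken when alone in a line.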
The advantage I see in a penalty that handles this case is that it can help a lot with web rendering, and also with more precise book rendering, via a simple dillorc option.
Could be, but it sounds a bit hackish, and with an uncertain result. Could be worth a test, however.
These are just ideas, not meant to be *the* solution. They have relatively simple implementations that could be field tested.
I've had another idea, which unfortunately turned out to be more complicated than I first expected: calculating width extremes without considering hyphenation. In this case, "hyphenation" is still hyphenated, but the minimal width of this cell is based on the whole word "hyphenation". This would have the following results:
1. table rendering is done the same way as by browsers not supporting hyphenation, and,
2. OTOH, hyphenation still works as desired.
As I said, I stumbled upon a difficult detail; I'd like to give it another thought.

Sebastian
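The idea of computing width extremes without considering hyphenation can be sketched in a simplified model: hyphenation points are skipped when computing the minimal width, so the widest whole word determines it, while line breaking could still use them. The `Fragment` structure and the function are hypothetical, and the sketch does not address the difficult detail mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: word fragments, each followed either by a
// hyphenation point (inside a word) or by an ordinary inter-word break.
struct Fragment {
    int width;
    bool hyphenationPoint;  // true: inside a word; false: end of a word
};

// Minimal width as seen by a browser without hyphenation support:
// hyphenation points are ignored, so whole words are never split here.
int minWidthIgnoringHyphenation(const std::vector<Fragment> &frags) {
    int best = 0, word = 0;
    for (const Fragment &f : frags) {
        assert(f.width >= 0);
        word += f.width;
        if (!f.hyphenationPoint) {
            best = std::max(best, word);
            word = 0;
        }
    }
    return std::max(best, word);
}
```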