List:       ruby-talk
Subject:    Re: Splitting a text file into sentences
From:       Jeffrey Schwab <jeff () schwabcenter ! com>
Date:       2005-11-30 23:42:30
Message-ID: PLqjf.325$TU6.11994 () twister ! southeast ! rr ! com

Dave Howell wrote:
> I think "right" or "wrong" are a tad strong for most of the cases cited. 
> But as a professional book designer and typographer, there's 
> unquestionably "better" and "worse."
> 
> For improved legibility, inter-sentence space should generally be a bit 
> greater than inter-word space.
> 
> Typewriters only had one distance they could travel. Either 1/10th of an 
> inch ("Pica") or 1/12th ("Elite"). So the only way to add extra space 
> after a sentence was to double it. That's way too much extra space, but 
> it was generally better than the alternative. The real problem was that 
> the words were too far apart, not that the sentences were too close, but 
> again, the fixed spacing was already an abominable situation.
> 
> Proportional type, dating all the way back to Gutenberg, would generally 
> use 1/3rd or 1/4th of the height of the type as the inter-word spacing. 
> This would usually work out to about the width of a lower case "t" or "l".
> 
> When setting modern (by which you may also read "all type before 
> typewriters" as well) proportional type in fully justified form (left 
> and right margins both even), the spaces must be stretched out on a 
> line-by-line basis to fit. Really good typesetting programs (and really 
> good typesetters sticking little bits of lead between their words (and 
> I've done that, too)) will add more of the space between sentences than 
> between words, so as the line stretches, the inter-word space to 
> inter-sentence space ratio actually changes. (Take a look at a narrow 
> newspaper column sometime.)
> 
> More sophisticated approaches to space will ignore a user's attempt to 
> sprinkle extraneous space in. Less sophisticated ones might allow it, 
> and even treat them as individual spaces, stretching both of them during 
> expansion. {shudder}
> 
> The fact that both the MLA Guidelines and the Bedford Handbook encourage 
> poor typography is regrettable. ("If you cannot type appropriate 
> punctuation, e.g. an em-dash or en-dash, please use appropriate 
> substitutions. For both dashes, substitute a pair of hyphens, which, 
> like true dashes, are typed without adjacent spaces." There's still 
> software out there that will happily wrap a line between the two 
> hyphens. Ick!) Nevertheless, if you're submitting a paper to an 
> institution that expects or requires that, then to not follow them is 
> wrong, even if the legibility of the submission is better.
> 
> What it all boils down to is "Putting two spaces after a period at the 
> end of a sentence is an artifact left over from the days when the 
> typewriter was the prevalent text-making tool. Unless you have a 
> specific reason or requirement to do otherwise, it's preferable to put 
> only one space between sentences."
> 
> *****
> 
> For breaking text into sentences, sometimes I find it easier to work 
> backwards. Also, only very colloquial writing will have a one-word 
> sentence, so you can solve all "Mr./Dr./Ph.D." cases by the fact that if 
> a word starts with a cap and ends with a period, it's not a sentence 
> on its own. 
> For a more sophisticated approach that's still not too complex to 
> program, check the final word of a sentence against a dictionary. If 
> it's found there without a final dot, then you're almost certainly 
> looking at the end of a sentence. If it isn't, then is it found anywhere 
> else in the document without a dot? If not, then you're probably looking 
> at an abbreviation. (My mail program uses a monospaced font. If I 
> thought most readers would read it with a proportional font, I'd have 
> typed "Ph. D." above, since it should have a thin space before the D.)

This is what I love about Usenet. :)
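
For the record, here is a rough Ruby sketch of the two heuristics Dave
describes (the one-word rule and the dictionary check). It is only an
illustration under stated assumptions: DICTIONARY is a tiny made-up word
list standing in for a real one, the names bare and split_sentences are
invented here, and it scans forwards rather than backwards to keep the
code short.

```ruby
require 'set'

# Hypothetical stand-in for a real dictionary (an assumption of this
# sketch); a real program would load a proper word list instead.
DICTIONARY = Set.new(%w[the cat sat on mat it was late he left home went])

# Strip surrounding punctuation and downcase, so "Home." matches "home".
def bare(token)
  token.gsub(/\A[^[:alnum:]]+|[^[:alnum:]]+\z/, '').downcase
end

def split_sentences(text, dictionary = DICTIONARY)
  tokens = text.split
  # Words that occur somewhere in the document without a trailing dot.
  undotted = Set.new(tokens.reject { |t| t =~ /\.["')\]]*\z/ }.map { |t| bare(t) })

  sentences = []
  current = []
  tokens.each do |token|
    current << token
    # Only tokens ending in . ! or ? can close a sentence.
    next unless token =~ /[.!?]["')\]]*\z/

    if token =~ /\.["')\]]*\z/
      word = bare(token)
      # One-word rule: a lone capitalized word ending in a period
      # ("Mr.", "Dr.") is not treated as a sentence on its own.
      next if current.size == 1 && token =~ /\A[[:upper:]]/
      # Dictionary rule: if the word is known nowhere without its dot,
      # assume it is an abbreviation and keep going.
      next unless dictionary.include?(word) || undotted.include?(word)
    end

    sentences << current.join(' ')
    current = []
  end
  sentences << current.join(' ') unless current.empty?
  sentences
end

p split_sentences("Dr. Smith went home. He was late.")
# => ["Dr. Smith went home.", "He was late."]
```

It will still get things wrong when a sentence really does end in an
abbreviation ("We met at 5 p.m."), which is presumably why Dave says
"almost certainly" and "probably."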
