Sorry about that. The facility is fully and extensively documented
with rdoc, and I’ve got an extensive test suite in the file itself
(just run the file), but I didn’t provide any real examples. I’ll
happily fix that now (both here and the webpage that I maintain for
Text::Format at my website). As a side note, I’m attempting to get a
SourceForge project going, Ruby-Perls, for other conversions like
this. I’m not sure if it will happen or not. I’ll also be releasing
Text::Format 0.52.1, as I’ve found a couple of minor issues in my
implementation.
My test suite and the attached sample program both use the
Gettysburg Address (as the constant GETTYSBURG), which can be found
at http://www.law.ou.edu/hist/getty.html, among other places. (There
are other editions, but the people who produce them improperly claim
copyright for the transcription or other restrictions which simply
aren’t permitted.)
Text::Format basically provides the ability to format paragraphs
using format instructions specified by a Text::Format object. When
you simply use Text::Format.new, you get the following default
values for the format object.
Standard options
columns : 72
tabstop : 8
first_indent : 4
body_indent : 0
format_style : Text::Format::LEFT_ALIGN
left_margin : 0
right_margin : 0
text : []
Advanced options
tag_paragraph: false
tag_text : []
tag_cur : ''
extra_space : false
abbreviations: []
nobreak : false
nobreak_regex: {}
If you specify a text for the format object, that’s the default text
to be operated on.
fo = Text::Format.new
puts fo.format(GETTYSBURG)
is functionally equivalent to
fo = Text::Format.new(GETTYSBURG)
puts fo.format
and I’m going to use the latter for the examples purely for
simplicity’s sake, since I’m always working from the same text.
fo = Text::Format.new(GETTYSBURG)
puts fo.format
fo.format_style = Text::Format::JUSTIFY
puts fo.format
fo.columns = 40
fo.format_style = Text::Format::RIGHT_ALIGN
puts fo.format
Original:
Four score and seven years ago our fathers brought forth on this
continent a new nation, conceived in liberty and dedicated to the
Default:
Four score and seven years ago our fathers brought forth on this
continent a new nation, conceived in liberty and dedicated to the
Justified:
Four score and seven years ago our fathers brought forth on this
continent a new nation, conceived in liberty and dedicated to the
40-column, Right-aligned:
Four score and seven years ago our
fathers brought forth on this continent
a new nation, conceived in liberty and
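For anyone who wants to see roughly what the three styles do without the library at hand, here is a minimal sketch in plain Ruby. It is not Text::Format's implementation: wrap_words, justify_words and format_sketch are names made up for illustration, indentation is ignored, and handing leftover spaces to the leftmost gaps when justifying is an assumption.

```ruby
# Illustrative sketch only; these helpers are hypothetical, not Text::Format's code.

# Greedily pack words into lines no longer than `columns`.
def wrap_words(text, columns)
  lines = [[]]
  text.split.each do |word|
    candidate = (lines.last + [word]).join(" ")
    if candidate.length > columns && !lines.last.empty?
      lines << [word]
    else
      lines.last << word
    end
  end
  lines
end

# Pad the gaps between words so the line is exactly `columns` wide.
# Leftover spaces go to the leftmost gaps (an assumption).
def justify_words(words, columns)
  return words.join(" ") if words.size < 2
  gaps = words.size - 1
  base, extra = (columns - words.join.length).divmod(gaps)
  words.each_with_index.map { |word, i|
    i < gaps ? word + " " * (base + (i < extra ? 1 : 0)) : word
  }.join
end

# Wrap the text, then lay out each line in the requested style.
def format_sketch(text, columns, style)
  lines = wrap_words(text, columns)
  lines.each_with_index.map do |words, i|
    plain = words.join(" ")
    case style
    when :right   then plain.rjust(columns)
    when :justify then i == lines.size - 1 ? plain : justify_words(words, columns)
    else plain
    end
  end
end

SAMPLE = "Four score and seven years ago our fathers brought forth " \
         "on this continent a new nation"

puts format_sketch(SAMPLE, 40, :left)
puts format_sketch(SAMPLE, 40, :justify)
puts format_sketch(SAMPLE, 40, :right)
```

The last line of a justified paragraph is left ragged here, which is the usual convention; I have not checked that against every case in the library.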
You can also set an indent for both the first line and the body (to
get the original back, we would use columns 72, first_indent 4,
body_indent 4). It’s also possible to set a margin (space which will
not be used on either side). A hanging indent is produced by setting
body_indent larger than first_indent. (NOTE! This is DIFFERENT from
what the Perl package improperly calls a hanging indent and I call a
tagged paragraph. I’ll cover that momentarily.)
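To make the first-line versus body indent distinction concrete, here is a small stand-alone sketch. indent_wrap is a hypothetical helper written for this message, not a Text::Format method, and it ignores margins entirely.

```ruby
# Hypothetical helper: wrap text with separate first-line and body indents.
def indent_wrap(text, columns:, first_indent:, body_indent:)
  lines = []
  current = " " * first_indent
  fresh = true                      # true until the first word is placed
  text.split.each do |word|
    sep = fresh ? "" : " "
    if !fresh && (current + sep + word).length > columns
      lines << current              # line is full; start a body line
      current = " " * body_indent + word
    else
      current += sep + word
    end
    fresh = false
  end
  lines << current unless fresh     # nothing to emit for empty text
  lines
end

INDENT_SAMPLE = "Four score and seven years ago our fathers brought " \
                "forth on this continent a new nation"

# Hanging indent: body_indent larger than first_indent.
puts indent_wrap(INDENT_SAMPLE, columns: 40, first_indent: 0, body_indent: 4)
```

Swapping the two values (first_indent 4, body_indent 0) gives the ordinary indented-first-line paragraph that the defaults produce.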
The only difference between Text::Format#format and
Text::Format#paragraphs is that #format will only format the first
item in an array, whereas #paragraphs is more or less expecting an
array (it will array-ify a non-array argument, e.g., [foo].flatten),
where each item in an array is a separate paragraph. The only other
subtlety for #paragraphs that I can think of (and it is documented)
is that if the first line and body indentation values are the same,
a blank line will be inserted between the paragraphs.
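The array handling and the blank-line rule can be sketched as follows. This is only a stand-in for the behaviour described above: paragraphs_sketch is a hypothetical name, and it skips the actual line wrapping so that the two points of interest stay visible.

```ruby
# Hypothetical stand-in for the paragraph handling described above.
def paragraphs_sketch(input, first_indent:, body_indent:)
  paras = [input].flatten           # array-ify a non-array argument
  # Equal indents give no visual cue for a new paragraph, so a blank
  # line separates them; otherwise the first-line indent is the cue.
  separator = first_indent == body_indent ? "\n\n" : "\n"
  paras.map { |para| " " * first_indent + para.split.join(" ") }.join(separator)
end

puts paragraphs_sketch(["First paragraph.", "Second paragraph."],
                       first_indent: 4, body_indent: 4)
puts paragraphs_sketch("A single string works too.",
                       first_indent: 4, body_indent: 0)
```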
Text::Format#center also expects an array – and will center each
“paragraph” as if it were one long line. If you pass one paragraph,
you actually want to split it along newlines so that you’re passing
only one line per item of the array. Like so:
puts fo.center(GETTYSBURG.split("\n"))
I could modify #center so that, if it is not passed an Array, it
attempts to do that split itself, but I’m not sure if that’s a good
idea or not.
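Here is a rough picture of why the split matters, using nothing but Ruby's own String#center. center_lines is a hypothetical helper, not the library's #center, and trimming the trailing padding is my own choice.

```ruby
# Hypothetical helper: center each item as one line within `columns`.
def center_lines(lines, columns)
  [lines].flatten.map { |line| line.strip.center(columns).rstrip }
end

CENTER_SAMPLE = "Four score and seven\nyears ago our fathers\nbrought forth"

# One item per line: each line is centered on its own.
puts center_lines(CENTER_SAMPLE.split("\n"), 40)

# The whole text as a single item: the embedded newlines travel along
# and the result is not three centered lines.
puts center_lines(CENTER_SAMPLE, 40)
```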
Tagged paragraphs are paragraphs that have a line of text inserted
before each paragraph. The line of text is drawn from the #tag_text
array, and is applied successively for each paragraph, with no value
being used if there are more paragraphs than tags. I’m not going to
demonstrate this, because I frankly don’t get the point – it is
tested in the test suite, though.
One can create abbreviations for a format object. In my sample text,
I’ve abbreviated “President” as “Pres.”. I would reflect this as:
fo.abbreviations << "Pres"
Standard, default abbreviations are:
'Mr', 'Mrs', 'Ms', 'Jr', 'Sr'
In most cases, the abbreviations list doesn’t matter. However, some
people prefer to format text so that sentence full stops (periods,
‘.’) are followed by two spaces. This is NOT permitted for
abbreviations, so it is necessary to add any abbreviations in the
text or expected in the text to prevent “Pres. Abraham Lincoln” from
looking like “Pres.  Abraham Lincoln”. The two-space functionality
is turned on in a format object by setting #extra_space to true.
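The interaction between the two settings can be sketched like this. It is an illustration under my own assumptions, not the library's code: add_sentence_space is a hypothetical helper, and it treats any word ending in a full stop as a sentence end unless its stem is in the abbreviation list.

```ruby
# Defaults as listed above.
DEFAULT_ABBREVIATIONS = %w[Mr Mrs Ms Jr Sr]

# Hypothetical helper: two spaces after a sentence-ending full stop,
# one space after a known abbreviation.
def add_sentence_space(text, abbreviations = DEFAULT_ABBREVIATIONS)
  words = text.split
  words.each_with_index.map { |word, i|
    last = (i == words.size - 1)
    stem = word.chomp(".")
    if !last && word.end_with?(".") && !abbreviations.include?(stem)
      word + " "                    # the join below adds the second space
    else
      word
    end
  }.join(" ")
end

text = "Pres. Abraham Lincoln spoke. He was brief."
puts add_sentence_space(text)                                     # "Pres." treated as a sentence end
puts add_sentence_space(text, DEFAULT_ABBREVIATIONS + ["Pres"])   # "Pres." treated as an abbreviation
```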
Finally, we come to one of the neater – but harder – subjects,
which is non-breaking words. This can be turned on by setting
#nobreak to true – but it only has an effect if you set up the
#nobreak_regex hash. Gábor was quite clever when he wrote this
portion for the Perl version, and I like it a lot. Obviously, it’s
regular expression-based. If I have a hash of the following:
{ '^Mrs?\.$' => '\S+$', '^\S+$' => '^(?:S|J)r\.$'}
Then “Mr. Jones”, “Mrs. Jones”, and “Jones Jr.” would not be broken
at line endings. If this simple matching algorithm indicates that
there should not be a break at the current end of line, then a
backtrack is done until there are two words on which line breaking
is permitted. If two such words are not found, then the end of the
line will be broken regardless. If there is a single word on the
current line, then no backtrack is done and the word is stuck on the
end.
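The matching step on its own can be sketched as below. I am reading the hash as "key matches the word before the break, value matches the word after it", which fits the example but is stated here as an assumption. break_allowed? is a hypothetical helper, and the backtracking is left out.

```ruby
# The example hash from above: pattern for the word before the break
# => pattern for the word after it (my reading of the example).
NOBREAK_RULES = {
  '^Mrs?\.$' => '\S+$',
  '^\S+$'    => '^(?:S|J)r\.$'
}

# Hypothetical helper: may a line end between `before` and `after`?
def break_allowed?(before, after, rules = NOBREAK_RULES)
  rules.none? do |left, right|
    (Regexp.new(left) =~ before) && (Regexp.new(right) =~ after)
  end
end

[["Mr.", "Jones"], ["Mrs.", "Jones"], ["Jones", "Jr."], ["new", "nation"]].each do |a, b|
  puts "#{a} | #{b}: #{break_allowed?(a, b) ? 'break allowed' : 'keep together'}"
end
```

A formatter would call a check like this at each candidate line end and step back a word whenever it answers "keep together".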
It’s quite a powerful library. It’s more useful in a fixed-pitch
text environment, but I imagine that it would be possible to adapt
the algorithm to an environment where all characters aren’t the same
width.
-austin
– Austin Ziegler, austin@halostatue.ca on 2002.10.20 at 12.29.20
On Sun, 20 Oct 2002 18:39:02 +0900, Massimiliano Mirra wrote:
On Sat, Oct 19, 2002 at 04:31:16AM +0900, Austin Ziegler wrote:
If you are familiar with Text::Format for Perl, you should find
this very familiar to use.
Could you please spend a couple of words for us non-perlers? The
name sounds interesting.