UTF-8 and printf

Hello,

I'm trying to use printf to give my output a tabulated format, for
example:

printf "%20s %10s %10s", title, author, date

where title, author and date are string variables. The text in these
variables is UTF-8 encoded, and this makes printf misalign the output.
The reason is that a multi-byte character (for example, a two-byte
character) is counted as several characters (two), so the number of
spaces needed for padding is calculated wrongly.
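In a current Ruby (2.0 or later, which this sketch assumes because it uses String#b), the byte-counting behaviour described above can be reproduced by retagging a UTF-8 string as binary, so that %s pads by bytes instead of characters. The sample string is made up for illustration:

```ruby
# Assumes Ruby 2.0+ (String#b). "Caf\u00e9" is "Café": 4 characters, 5 bytes in UTF-8.
utf8  = "Caf\u00e9"
bytes = utf8.b # same bytes, tagged ASCII-8BIT, roughly how Ruby 1.8 saw them

puts utf8.length  # 4 characters
puts bytes.length # 5 "characters" (really bytes)

# Width 10 is counted in bytes for the binary string, so it gets one space too few.
byte_padded = format("%10s", bytes)
char_padded = format("%10s", utf8)
puts byte_padded.bytesize # 10 bytes, but only 9 visible columns
puts char_padded.length   # 10 characters
```

The one-column shortfall per multi-byte character is exactly the misalignment in the question.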

I searched for discussions about ruby and UTF-8, and in general it does
not appear to be an easy issue. I read about the String#chars proxy
introduced by rails, but I'm not using rails, and besides, I don't think
it would help here.

Do you know of any solution to my problem? Using printf is not a
requirement; all I want is to align the output in columns without
using "\t".

Thanks in advance,
--Jose

Unfortunately, I think you'll have to use something ugly like this...

def pad(n, s)
  # Count UTF-8 code points (not bytes); never ask for negative padding
  (" " * [n - s.unpack("U*").length, 0].max) + s
end

def padded(*elems)
  out = []
  for elem in elems
    out << pad(elem[0], elem[1])
  end
  out.join(" ")
end

puts padded([20, title], [10, author], [10, date])
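A quick way to check the helpers is to pad a few rows and count code points per line. The sketch below repeats them (with padded written using map) so it runs on its own; the rows are made-up sample data, and .b is used, assuming Ruby 2.0+, to make the strings byte-oriented the way Ruby 1.8 treats them:

```ruby
# Self-contained copy of the helpers, with a guard for over-long values.
def pad(n, s)
  width = s.unpack("U*").length # code points, not bytes
  (" " * [n - width, 0].max) + s
end

def padded(*elems)
  elems.map { |width, text| pad(width, text) }.join(" ")
end

# Hypothetical rows; .b makes each string byte-oriented, as in Ruby 1.8.
rows = [
  ["Caf\u00e9 Society".b, "J. Dupr\u00e9".b, "2007".b],
  ["Plain Title".b, "A. Smith".b, "2006".b]
]

lines = rows.map { |title, author, date| padded([20, title], [10, author], [10, date]) }
lines.each { |line| puts line }
```

Every line comes out at 20 + 1 + 10 + 1 + 10 = 42 code points, even though the first one is 44 bytes long.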

Regards,
Jordan

On Dec 5, 4:19 am, Jose <jld...@gmail.com> wrote:
> I'm trying to use printf to give my output a tabulated format [...]

Hey, thank you very much. The trick of unpack to find the string
length is a nice one. And s.unpack("U*").length is only 4 times slower
than s.length, according to my benchmarks.
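For anyone wanting to repeat the comparison, a minimal sketch using Ruby's standard Benchmark module might look like this. The sample string and iteration count are arbitrary, and in 1.9+ s.length already counts characters, so the ratio measured there need not match a 1.8 run:

```ruby
require "benchmark"

# Made-up sample string; timings vary by machine and Ruby version.
s = "Caf\u00e9 Society, J. Dupr\u00e9, 2007"
n = 200_000

# On a UTF-8 string in Ruby 1.9+, both measures give the same character count.
raise "mismatch" unless s.unpack("U*").length == s.length

Benchmark.bm(8) do |x|
  x.report("length") { n.times { s.length } }
  x.report("unpack") { n.times { s.unpack("U*").length } }
end
```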

Does anybody know whether this printf "bug" will be solved in ruby 1.9?

Regards,
--Jose

On Dec 5, 14:58, MonkeeSage <MonkeeS...@gmail.com> wrote:
> Unfortunately, I think you'll have to use something ugly like this...
> [...]

Not really a bug, just that 1.8 doesn't have native unicode support.
But, yes, in ruby 1.9 you have a native utf-8 type, so with default
utf-8 encoding, printf Just Works (you can also force utf-8 encoding
with String#force_encoding if you're using a different native
encoding, and printf does the right thing). :)
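The force_encoding route can be sketched as follows, assuming Ruby 1.9 or later: bytes that arrive tagged as binary are padded by byte count, and retagging them as UTF-8 makes %s count characters. The binary-tagged input is an assumed scenario for illustration:

```ruby
# Assumes Ruby 1.9+. These bytes spell "Café" in UTF-8, but the string is
# tagged as binary (ASCII-8BIT), as data read with File.binread would be.
raw = "Caf\xC3\xA9".b

# %10s counts bytes for the binary string: 5 spaces + 5 bytes (9 columns).
misaligned = format("%10s|", raw)

# Retagging the same bytes as UTF-8 makes %10s count characters: 6 spaces + 4.
aligned = format("%10s|", raw.dup.force_encoding("UTF-8"))

puts misaligned
puts aligned
```

force_encoding only changes the encoding tag and does not convert any bytes, so it is appropriate when the data already is UTF-8 but was labelled otherwise.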

Regards,
Jordan

On Dec 6, 6:54 pm, Jose <jld...@gmail.com> wrote:
> Does anybody know whether this printf "bug" will be solved in ruby 1.9?
> [...]

> > Does anybody know whether this printf "bug" will be solved in ruby 1.9?

> Not really a bug, just that 1.8 doesn't have native unicode support.

I understand. That's why I put "bug" in double quotes.

> But, yes, in ruby 1.9 you have a native utf-8 type, so with default
> utf-8 encoding, printf Just Works

Great!

I'm still intrigued by the poor UTF-8 support in current and past
versions, especially considering that ruby was developed in
Japan. Anyway, this is good news.

Thanks for answering,
--Jose

> I'm still intrigued by the poor UTF-8 support in current and past
> versions, especially considering that ruby was developed in
> Japan. Anyway, this is good news.

IIRC, ruby wasn't created with unicode support because unicode is less
efficient at representing East Asian character sets than other
encodings like Shift_JIS/EUC-JP (something to the effect of unicode
requiring 16 bits to store characters that can be represented in 8
bits in those other encodings).
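Whatever the historical reasons were, the size difference itself is easy to check in Ruby 1.9+. The sketch below encodes two made-up sample strings and prints their byte counts: common kanji take 3 bytes each in UTF-8 versus 2 in Shift_JIS or EUC-JP, and half-width katakana take a single byte in Shift_JIS but 2 in UTF-16 and 3 in UTF-8:

```ruby
# Compare how many bytes a few Japanese characters need in each encoding.
samples = {
  "kanji"               => "\u65E5\u672C\u8A9E", # 日本語
  "half-width katakana" => "\uFF76\uFF85"        # ｶﾅ
}

%w[UTF-8 UTF-16LE Shift_JIS EUC-JP].each do |enc|
  sizes = samples.map { |label, str| "#{label}: #{str.encode(enc).bytesize}" }
  puts "#{enc.ljust(9)} #{sizes.join(', ')}"
end
```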

> Thanks for answering,
> --Jose

No problem. :)

Regards,
Jordan


On Dec 6, 7:11 pm, Jose <jld...@gmail.com> wrote:

On 7 dic, 02:06, MonkeeSage <MonkeeS...@gmail.com> wrote: