Parsing literals

ako · 14 December 2005 02:27

Hello,

I need to write a function that parses a string literal from another
language. A string literal in this language is defined as:

STRING = "CHAR*"
CHAR = any character except for " and \
  | \"
  | \\
  | \/
  | \u four hexadecimal digits

The \u sequence specifies a character in UTF-16 encoding.

For example: "abc", "", "a\"bc", "a\\b", "a\u12bfc"
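
Since \u names a UTF-16 code unit, a character outside the Basic Multilingual Plane would presumably arrive as two consecutive \u escapes (a surrogate pair), and each half on its own is not a valid code point. Below is a minimal sketch of combining such pairs. The helper name decode_utf16_units is hypothetical and not part of any code in this thread; it assumes the input is an array of integer code units, and replacing an unpaired surrogate with U+FFFD is an assumed policy, not something the grammar above specifies.

```ruby
# Hypothetical helper: turn an array of UTF-16 code units into a UTF-8 string.
def decode_utf16_units(units)
  code_points = []
  i = 0
  while i < units.length
    unit = units[i]
    following = units[i + 1]
    if (0xD800..0xDBFF).include?(unit) && following && (0xDC00..0xDFFF).include?(following)
      # High surrogate followed by low surrogate: combine into one code point.
      code_points << 0x10000 + ((unit - 0xD800) << 10) + (following - 0xDC00)
      i += 2
    elsif (0xD800..0xDFFF).include?(unit)
      # Unpaired surrogate: substitute U+FFFD (assumed policy).
      code_points << 0xFFFD
      i += 1
    else
      # Ordinary BMP code unit maps directly to a code point.
      code_points << unit
      i += 1
    end
  end
  code_points.pack('U*')
end
```

For instance, decode_utf16_units([0xD83D, 0xDE00]) yields the single character U+1F600, whereas packing the two units separately with pack('U*') would yield two invalid characters.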

Below is the code that I wrote. Is it idiomatic Ruby? Can someone
suggest improvements or a better style?

Thanks,
konstantin

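def parselit(s)
  # One escape: \" \/ \\ or \u followed by four hex digits.
  esc = %r{\\["/\\]|\\u[0-9a-fA-F]{4}}
  # \A and \z anchor the whole string; ^ and $ would anchor each line.
  return nil unless s =~ /\A"((?:[^"\\]|#{esc})*)"\z/
  $1.gsub(esc) { |x| x[1, 1] == 'u' ? [x[2..-1].hex].pack('U') : x[1..-1] }
end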

puts parselit('"\u004e\"a"')

W_James · 14 December 2005 11:12

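def parselit(s)

  # Escape sequences; the hex digits are listed in both cases so that
  # an uppercase \U is not accepted by accident.
  re = %r{
           \\"
         | \\/
         | \\\\
         | \\u [0-9a-fA-F]{4}
  }x

  # \A and \z anchor the whole string; ^ and $ would anchor each line.
  return nil if s !~ /\A".*"\z/m

  body = s[1..-2]
  out = ""
  pos = 0   # offset just past the last token matched

  body.scan( /\G (?: ( [^"\\]+ ) | ( #{re} ) )/x ){ |plain, esc|
    pos = $~.end(0)
    out <<
      if plain
        plain
      elsif esc[0,2] == '\u'
        [esc[2..-1].hex].pack('U')
      else
        esc[1..-1]
      end
  }

  # Fail if the whole string didn't match. Tracking pos (rather than
  # reading $~ after scan) also handles the empty literal "".
  pos == body.length ? out : nil

end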

puts parselit('"\u004e\"a"')
puts parselit('"\u004e\""a"')
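
For comparison, the same token-by-token walk can be written with StringScanner from Ruby's standard strscan library, which tracks the scan position itself. This is only a sketch: the name parselit_scan is hypothetical, and it assumes the same grammar as above, with each \u escape taken as a single code point.

```ruby
require 'strscan'

# Hypothetical alternative: returns the decoded string,
# or nil if s is not a valid literal.
def parselit_scan(s)
  sc = StringScanner.new(s)
  return nil unless sc.scan(/"/)        # opening quote
  out = +""
  until sc.scan(/"/)                    # stop at the closing quote
    if sc.scan(/[^"\\]+/)
      out << sc.matched                 # run of ordinary characters
    elsif sc.scan(/\\(["\/\\])/)
      out << sc[1]                      # \" \/ \\ : drop the backslash
    elsif sc.scan(/\\u([0-9a-fA-F]{4})/)
      out << [sc[1].hex].pack('U')      # \uXXXX : code point to UTF-8
    else
      return nil                        # bad escape or unterminated literal
    end
  end
  sc.eos? ? out : nil                   # reject text after the closing quote
end
```

With the two test inputs above, parselit_scan('"\u004e\"a"') returns N"a and parselit_scan('"\u004e\""a"') returns nil, because the unescaped quote closes the literal early and text is left over.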
