"Ryosuke Kadoi" kadoism@hotmail.com wrote:
Also, if you are interested in bioinformatics from a machine learning
standpoint with Ruby, please let me know any information.

I tried to write Sarsa in Ruby. The following code is an implementation
of Example 6.5 (the windy gridworld) from Section 6.4 of [1].
#! /usr/bin/env ruby
WIND = [ 0, 0, 0, 1, 1, 1, 2, 2, 1, 0 ]
N = 1000
EPSILON = 0.1
ALPHA = 0.1
GAMMA = 1.0
# action-value function table
class Q
  def [](state, a)
    @table[[state, a]]
  end

  def []=(state, a, v)
    @table[[state, a]] = v
  end

  # all (action, value) pairs for a given state
  def state_collect(state)
    ary = @table.collect{|k, v| k[0] == state ? [k[1], v] : nil }
    ary.compact!
    ary
  end

  def initialize(v=0.0)
    @table = {}
    (0..6).each do |y|
      (0..9).each do |x|
        [ :up, :down, :right, :left ].each do |a|
          @table[[[x, y], a]] = v
        end
      end
    end
  end
end
# epsilon-greedy policy function
def epsilon_greedy(q, state, epsilon=EPSILON)
  ary = q.state_collect(state)
  if epsilon >= rand then
    ## explore: pick a random action
    ary[rand(ary.length)][0]
  else
    ## greedy: pick the action with the highest value
    ary.max{|a, b| a[1] <=> b[1]}[0]
  end
end
# environment
def do_action(state, action)
  state = state.dup
  case action
  when :up
    state[1] -= 1
  when :down
    state[1] += 1
  when :left
    state[0] -= 1
  when :right
    state[0] += 1
  end
  state[1] -= WIND[state[0]] if 0 <= state[0] and state[0] <= 9
  state[0] = [[0, state[0]].max, 9].min
  state[1] = [[0, state[1]].max, 6].min
  return [state, -1]
end
q = Q.new
N.times do |n|
  state = [0, 3]
  step = 0
  action = epsilon_greedy(q, state)
  until state == [7, 3] do
    next_state, reward = do_action(state, action)
    next_action = epsilon_greedy(q, next_state)
    q[state, action] += ALPHA*(reward + GAMMA*q[next_state, next_action] - q[state, action])
    state = next_state
    action = next_action
    step += 1
  end
  puts "#{n} #{step}" # output the episode number and its time-step count
  STDOUT.flush
end
# greedy run (to evaluate the learned action-value function)
step = 0
state = [0, 3]
action = epsilon_greedy(q, state, 0.0)
until state == [7, 3] do
  next_state, reward = do_action(state, action)
  next_action = epsilon_greedy(q, next_state, 0.0)
  state = next_state
  action = next_action
  step += 1
end
puts "# greedy policy #{step}"
STDOUT.flush
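For reference, the line inside the loop above is the standard Sarsa backup,
Q(s,a) <- Q(s,a) + alpha*(r + gamma*Q(s',a') - Q(s,a)). Here is a minimal
self-contained sketch of a single backup using a plain Hash; the state-action
pairs and values are illustrative, not taken from a real run:

```ruby
# One Sarsa backup on a plain Hash (illustrative values only).
alpha = 0.1   # step size, same role as ALPHA above
gamma = 1.0   # discount factor, same role as GAMMA above

q = Hash.new(0.0)              # all action values start at 0.0
s,  a  = [0, 3], :right        # current state-action pair
s2, a2 = [1, 3], :right        # next state-action pair
reward = -1                    # every step costs -1 in this task

# Q(s,a) <- Q(s,a) + alpha * (reward + gamma * Q(s',a') - Q(s,a))
q[[s, a]] += alpha * (reward + gamma * q[[s2, a2]] - q[[s, a]])

puts q[[s, a]]   # -0.1
```

Because all values start at 0.0, the first backup simply moves Q(s,a) a
fraction alpha toward the reward of -1.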
If this goes well, I would like to move the discussion to ruby-list,
because I'm not good at English. It may be easier for me if we can use
Japanese.
[1] Richard S. Sutton and Andrew G. Barto, Reinforcement
Learning: An Introduction, MIT Press, Cambridge, MA, 1998,
http://www-anw.cs.umass.edu/~rich/book/the-book.html.
--
1024D/2A3FDBE6 2001-08-26 Kenta MURATA (muraken) muraken2@nifty.com
Key fingerprint = 622A 61D3 280F 4991 4833 5724 8E2D C5E1 2A3F DBE6