I've been trying to programmatically issue MSN searches and process
the results. I'm having a hell of a time doing it and was wondering
whether anyone had some coding or debugging advice.
First, I tried wsdl2ruby to generate some classes to work with, but it
pukes:
C:\temp>wsdl2ruby.rb --wsdl http://soap.search.msn.com/webservices.asmx?wsdl --type client --force
ignored element: {http://www.w3.org/2001/XMLSchema}list
ignored attr: {}default
ignored attr: {http://schemas.xmlsoap.org/ws/2004/08/addressing}Action
I, [2006-10-10T16:36:52.259000 #2608] INFO -- app: Creating class definition.
W, [2006-10-10T16:36:52.259000 #2608] WARN -- app: File 'default.rb' exists but overrides it.
F, [2006-10-10T16:36:52.275000 #2608] FATAL -- app: Detected an exception. Stopping ... incomplete simpleType (ArgumentError)
C:/program files/ruby/lib/ruby/1.8/wsdl/xmlSchema/simpleType.rb:33:in `base'
C:/program files/ruby/lib/ruby/1.8/wsdl/soap/classDefCreator.rb:217:in
[snip]
(BTW, wsdl2ruby works with http://api.google.com/GoogleSearch.wsdl.)
Second, I tried this code:
require 'soap/wsdlDriver'
wsdl_url = 'http://soap.search.msn.com/webservices.asmx?wsdl'
soap = SOAP::WSDLDriverFactory.new( wsdl_url ).create_rpc_driver
msn_params = { 'AppID' => '1064081Cxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
               'Query' => 'ruby programming language',
               'CultureInfo' => 'en-US',
               'SafeSearch' => 'Strict',
               'Flags' => 'None',
               'Requests' => {
                 'SourceRequest' => {
                   'Source' => 'Web',
                   'Offset' => 0,
                   'Count' => 10,
                   'ResultFields' => 'All'
                 }
               }
             }
soap.search(:Request => msn_params)
And got this:
irb(main):020:0* soap.search(:Request => msn_params)
ArgumentError: incomplete simpleType
        from c:/Program Files/ruby/lib/ruby/1.8/wsdl/xmlSchema/simpleType.rb:25:in `check_lexical_format'
        from c:/Program Files/ruby/lib/ruby/1.8/soap/mapping/wsdlliteralregistry.rb:113:in `simpleobj2soap'
[snip]
Note: the Python equivalent of this code works just fine, so I think it
has something to do with the way Ruby is processing SOAP.
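Since both failures happen while the library is handling the WSDL's simpleType definitions (note the ignored XMLSchema "list" element in the wsdl2ruby output), one possible workaround is to skip the WSDL driver and build the request envelope by hand. The following is only a rough sketch, not a tested client: build_search_envelope and add_children are hypothetical helper names, the service namespace is a placeholder that would have to be copied from the WSDL's targetNamespace, and the element layout simply mirrors the msn_params hash above.

```ruby
require 'rexml/document'

SOAP_ENV_NS = 'http://schemas.xmlsoap.org/soap/envelope/'

# Hypothetical helper: recursively turn a nested Hash into child elements.
# Caveat: Ruby 1.8 hashes do not keep insertion order, so if the service
# insists on a particular element order, pass nested arrays of pairs instead.
def add_children(parent, params)
  params.each do |name, value|
    child = parent.add_element(name.to_s)
    if value.is_a?(Hash)
      add_children(child, value)
    else
      child.text = value.to_s
    end
  end
end

# Hypothetical helper: wrap the parameters in a SOAP 1.1 envelope.
# service_ns is an assumption; take the real value from the WSDL.
def build_search_envelope(service_ns, params)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  envelope = doc.add_element('soap:Envelope', 'xmlns:soap' => SOAP_ENV_NS)
  body = envelope.add_element('soap:Body')
  search = body.add_element('Search', 'xmlns' => service_ns)
  add_children(search.add_element('Request'), params)
  doc.to_s
end

# Placeholder namespace, for illustration only.
puts build_search_envelope('urn:placeholder-from-wsdl',
                           'Query' => 'ruby programming language')
```

The resulting string could then be POSTed to the service endpoint with Net::HTTP, with the Content-Type and SOAPAction headers set to whatever the WSDL specifies, and the response parsed with REXML.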
Third, I tried to do it without SOAP:
require 'rubygems'
require 'open-uri'
require 'rubyful_soup'
url = "http://search.live.com/results.aspx?q=ruby+programming+language&mkt=en-us&FORM=LVSP&go.x=0&go.y=0&go=Search"
page = open(url)
page_content = page.read
soup = BeautifulSoup.new(page_content)
and I get this:
irb(main):007:0> soup = BeautifulSoup.new(page_content)
ArgumentError: invalid value for Integer: "0183"
        from c:/Program Files/ruby/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:335:in `Integer'
        from c:/Program Files/ruby/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:335:in `handle_charref'
        from c:/Program Files/ruby/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:159:in `goahead'
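The traceback suggests that handle_charref hands the digits of a numeric character reference straight to Integer(). Integer() treats a leading "0" as an octal prefix, and "0183" contains an 8, so it raises. My guess (not confirmed against the page source) is that the page contains a reference such as &#0183;. A minimal sketch of a workaround, assuming that is the cause, is to strip the leading zeros before parsing; normalize_charrefs is a hypothetical helper name:

```ruby
# Hypothetical helper: drop leading zeros from decimal character
# references, so "&#0183;" becomes "&#183;". References without
# leading zeros are left untouched.
def normalize_charrefs(html)
  html.gsub(/&#0+(\d+);/, '&#\1;')
end

# Integer() reads the leading "0" as an octal prefix and rejects the 8.
begin
  Integer("0183")
rescue ArgumentError => e
  puts "Integer() rejects it: #{e.message}"
end

# String#to_i, by contrast, defaults to base 10 and ignores the leading zero.
puts "0183".to_i

puts normalize_charrefs("Ruby &#0183; Python")
```

With this in place, the scraping code would call normalize_charrefs(page.read) and pass the result to BeautifulSoup.new instead of the raw page content.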
My next step is to try HTree/REXML, but I'd much rather use SOAP or
BeautifulSoup to do this. Anyone got ideas?