Attached is a 30-minute hack of mine that does something like that.
The scanning part is really a kludge, but I've been using it with
acceptable results in a proxy that adds hints to web pages on the fly.
IMHO it can be done in one pass.
On Thu, Jun 12, 2003 at 11:38:13AM +0900, Hal E. Fulton wrote:
Given http://www.rubygarden.org/ruby?ClassMethodsTutorial, my hack
returns
["This", "is", "simply", "an", "extract", "from", "a", "post", "to", "ruby-talk", "by", "DavidBlack", "on", "the", "topic", "of", "class", "methods."]
["It", "is", "stored", "here", "in", "the", "hope", "that", "it", "will", "be", "useful!"]
["It", "actually", "goes", "beyond", "the", "surface", "of", "class", "methods", "to", "describe", "the", "nature", "of", "classes", "as", "objects", "so", "is", "interesting", "reading", "for", "anyone", "progressing", "to", "intermediate-level", "Ruby."]
["See", "also", "ClassMethods", "for", "an", "overview", "of", "the", "options", "available", "in", "Ruby", "for", "creating", "class", "methods", "and", "SingletonTutorial", "for", "a", "detailed", "explanation", "of", "singleton", "methods."]
["Every", "object", "responds", "to", "certain", "messages", "i.e.", "can", "call", "methods", "with", "certain", "names", "."]
["Usually", "those", "methods", "are", "the", "instance", "methods", "defined", "by", "the", "object's", "class", "However", "it's", "also", "possible", "to", "add", "methods", "to", "individual", "objects", "Now", "c", "will", "respond", "to", "speak", "–", "but", "other", "instances", "of", "class", "C", "will", "not", "This", "means", "that", "speak", "is", "a", "singleton", "method", "of", "c."]
["now", "look", "at", "this", "Notice", "the", "similarity", "between", "the", "syntax", "involved", "in", "creating", "a", "new", "singleton", "method", "for", "c", "and", "creating", "a", "class", "method", "of", "class", "D", "In", "fact", "these", "are", "essentially", "the", "same", "thing."]
["In", "both", "cases", "what's", "happening", "is", "that", "a", "singleton", "method", "is", "being", "added", "to", "a", "particular", "object."]
["It", "just", "happens", "to", "be", "that", "in", "the", "second", "case", "the", "object", "getting", "the", "new", "method", "is", "a", "Class", "object", "as", "opposed", "to", "a", "String", "an", "Array", "an", "instance", "of", "MyClass", "?"]
["So", "now", "D", "responds", "to", "greet", "just", "as", "c", "responds", "to", "speak", "."]
["In", "other", "words", "the", "term", "class", "method", "is", "just", "a", "special", "term", "for", "something", "which", "you", "can", "do", "with", "any", "mutable", "object", "namely", "add", "a", "singleton", "method", "to", "it."]
["It", "has", "a", "special", "name", "because", "in", "actual", "program", "design", "class", "methods", "have", "a", "special", "role", "to", "play."]
["But", "what", "they", "are", "at", "heart", "is", "singleton", "methods", "defined", "on", "objects", "where", "those", "objects", "happen", "to", "be", "instances", "of", "a", "class", "called", "Class."]
["The", "use", "of", "uppercase", "names", "constants", "for", "classes", "can", "obscure", "the", "fact", "that", "classes", "are", "just", "objects."]
["Also", "the", "usual", "style", "is", "to", "put", "class", "method", "definitions", "inside", "the", "class", "definition", "which", "makes", "it", "look", "like", "they", "have", "some", "special", "status."]
["But", "look", "at", "this", "etc."]
["You", "can", "see", "that", "some", "of", "the", "special", "treatment", "of", "classes", "–", "constants", "as", "names", "the", "separate", "notion", "of", "class", "method", "for", "their", "singleton", "methods", "–", "is", "just", "that", "special", "treatment."]
["Underneath", "a", "class", "is", "indeed", "an", "object."]
["CategoryDocumentation", "CategoryTutorial", "HomePage", "RecentChanges", "Preferences", "RubyGarden", "Edit", "text", "of", "this", "page", "View", "other", "revisions", "Last", "edited", "May", "am", "diff", "Search"]
Note that "i.e." was recognized as an abbreviation rather than as the end of a sentence.
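
For readers without the attachment, here is a minimal sketch of how that kind of recognition might work. It is not the attached hack: the method name `split_sentences`, the `ABBREVIATIONS` list, and the choice to strip trailing commas are all assumptions made for illustration.

```ruby
# Hypothetical sketch, not the attached hack: split plain text into
# sentences, each an array of words, without breaking on abbreviations.
ABBREVIATIONS = %w[i.e. e.g. cf. vs. Mr. Mrs. Dr.].freeze  # assumed list

def split_sentences(text)
  sentences = []
  current = []
  text.split.each do |word|                 # one pass over whitespace-separated tokens
    token = word.sub(/[,;:]\z/, "")         # drop a trailing comma, semicolon or colon
    next if token.empty?
    current << token
    # A token closes the sentence if it ends in . ! or ? and is not a
    # known abbreviation.
    if token =~ /[.!?]\z/ && !ABBREVIATIONS.include?(token)
      sentences << current
      current = []
    end
  end
  sentences << current unless current.empty?  # leftover words with no terminator
  sentences
end

split_sentences("It responds to messages, i.e., can call methods. Nice!").each { |s| p s }
```

A fixed abbreviation list is the simplest option; it will still split wrongly on any abbreviation it does not know about.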
However, several problems are yet to be solved:
- how to get rid of lone words that are not part of any sentence? (see the last line)
- what happens to things like
    As seen here:
    CODE
    bla bla bla.
- etc.
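
One possible heuristic for the first problem, sketched under the assumption that the result is an array of word arrays like the output above: keep only the arrays whose last word ends in sentence punctuation. The method name `drop_lone_words` is hypothetical, and this is untested against real pages.

```ruby
# Hypothetical filter: keep only word arrays whose last token ends in
# sentence-ending punctuation. Runs of link text usually have none.
def drop_lone_words(sentences)
  sentences.select { |words| !words.empty? && words.last =~ /[.!?]\z/ }
end

page = [
  %w[Underneath a class is indeed an object.],
  %w[HomePage RecentChanges Preferences Search]
]
drop_lone_words(page).each { |s| p s }
```

This would also throw away a genuine sentence that lost its final period, so it trades one kind of error for another.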
However, solving that would turn the 30-minute hack into a 1-hour kludge,
so it had better stay this way.
--
Running Debian GNU/Linux Sid (unstable)
batsman dot geo at yahoo dot com
Because I don’t need to worry about finances I can ignore Microsoft
and take over the (computing) world from the grassroots.
-- Linus Torvalds