I’ve been thinking for a day or so about
signal vs. noise, in source code and
perhaps in other contexts (ahem!!).
This is just a crazy theory of mine, so
flame away if you like.
I personally find that the more "littered"
a program is with punctuation and symbols,
the less I like to look at it. (Yes, it’s
possible to have too little of that, but
that’s rare in programming languages.)
For example, the parentheses in a C "if"
statement annoy me. The terminating semicolon
in many languages is slightly annoying. The
frequent colons in Python bother me. And let’s
not even get into Perl.
As a very crude way of measuring this, I decided
to count alphanumerics vs. symbols in code
(counting whitespace as neither – an arbitrary
decision on my part).
I cranked out a quick bit of ugly code (see
below). Obviously it’s crude – e.g., it doesn’t
take note of strings or comments (and it’s not
clear what it should do if it did).
I’d be curious to see people’s results on a large
corpus of code (Ruby, Perl, etc.).
So far I haven’t tried it much, as I just wrote it
half an hour ago.
I have noticed an odd effect already, though. The
signal/noise ratio (alphanumerics per symbol, which is
what the script below reports) is fairly low (1-2) for
smaller programs and higher (4-6) for larger ones. I've
tried it on sources ranging from 10 lines to 2000 lines.
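If anyone wants to reproduce that over a whole tree of
files, here is a rough sketch that applies the same
character classes as the script below to every .rb file
under a directory and prints one ratio per file. The
directory argument, the glob pattern, and the use of
each_char instead of a byte loop are my own choices,
nothing sacred:

# Same "noise" and whitespace sets as the script below.
PUNC  = ("'" + ',./`-=[];~!@#$%^&*()_+{}|:"<>?').split("")
WHITE = [" ", "\t", "\n"]

# Walk a directory (default: current dir) and print one
# alpha/symbol ratio per .rb file.
dir = ARGV[0] || "."
Dir.glob(File.join(dir, "**", "*.rb")).each do |file|
  noise = 0
  alpha = 0
  File.read(file).each_char do |c|
    if PUNC.include?(c)
      noise += 1            # punctuation/symbol
    elsif !WHITE.include?(c)
      alpha += 1            # neither whitespace nor symbol
    end
  end
  next if noise.zero?       # avoid dividing by zero
  printf "%-40s %4.1f\n", file, alpha / noise.to_f
end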
Cheers,
Hal
--
Hal Fulton
hal9000@hypermetrics.com
noise = 0
alpha = 0

# Characters treated as "noise" (punctuation and symbols) and as whitespace.
punc = "'" + ',./`-=[];~!@#$%^&*()_+{}|:"<>?'
punc = punc.split ""
white = " \t\n".split ""

# Anything that is neither punctuation nor whitespace counts as "signal".
$stdin.each_byte do |x|
  case x.chr
  when *punc
    noise += 1
  when *white
    # ignore
  else
    alpha += 1
  end
end

ratio = alpha/noise.to_f
puts "Signal/noise: #{alpha}/#{noise} = #{'%4.1f' % ratio}"
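
In case anyone wants to try it as-is: the script just
reads standard input, so (assuming you save it under
some name, say sn.rb, which is purely my own choice)
running it looks something like

  ruby sn.rb < some_program.rb

and it prints the two counts plus the ratio on one line.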