A vision for Parrot

Benjamin Goldberg wrote:

Considering that Jcl and Jython exist, it seems like a reasonable goal

(JCL is something else. I’d rather not remember it thankyouverymuch.)

would be to make an interpreter which turns Java’s .class files into
Parrot .pasm files. Once that tool exists, one could simply translate
Jcl and Jython into parrot… there would be no need to re-implement
them.

And one day, in the distant future, there will be a Perl6 decompiler,
which will turn Parrot bytecode into Perl6. Then we’ll be able to
convert the translated Jython and Jcl into Perl6 :-)

$10 says that only ever happens with a performance hit. The problem is that not
all bytecodes are created equal. (And Jacl is an implementation of Tcl in Java
more in the way that the usual form of Tcl is an implementation of the language
in C. The fact that Java uses bytecodes is pretty much just a distraction
here. We also have another way of integrating Tcl with Java that keeps Tcl
implemented in C, but which integrates almost identically with the Java
language.)

This sort of thing tends to make me suspicious that this is little
more than a pipe-dream, well, at least as far as Tcl’s concerned. (I
don’t know the other languages nearly well enough to comment in a
useful way.)
[…]
Assuming you thoroughly understand Tcl’s bytecodes, why not take a look
at Parrot, and see whether the set of bytecodes that parrot supports is
sufficient to do everything that Tcl’s bytecodes do?

I know a bit about Tcl bytecodes, and a key factor about them is that they are
very tightly targeted towards implementing Tcl.

Hmm. A quick scan through the documentation doesn’t really raise my hopes. Or
even leave me with a deep enough understanding of what’s going on; is there any
deeper description than http://www.parrotcode.org/docs/parrot_assembly.pod.html
about? (OK, Tcl’s bytecodes need documentation too, but I’ve already gone to
the effort to understand those as part of my maintenance and development
duties. I’ve just not got enough hours in the day.) Unfortunately, the bits
that I’m most interested in seem to be the bits with least info (isn’t that
always the way with complex software systems?)

First impressions: what is meant by “string” anyway? Character sequence? Byte
sequence? UTF-8 sequence? ISO 8859-1 sequence? [FX: Reads docs] Oh, they
carry about what their encoding is with them? That must make working with them
fun. How does it handle things like the blecherous monstrosities[*] used for
system encodings in the Far East? On a quite separate point, is there a
strncmp() equivalent? That would make implementing Tcl much easier…

More generally, Tcl would need to use PMCs throughout. The problem is that
Tcl’s value semantics (copy-on-write) do not line up well with that which Parrot
seems to use (object-based) and which, IIRC from the discussions at the time
when Parrot was being created, are closely based on those used in Perl even if
not precisely coincident. Hence we’d be unable to use ground values except as
part of the implementation of higher-level concepts. That’ll impact badly on
performance.

It’s at this point that I feel a round of “Stuff it. I’ll stick to implementing
in C.” coming on. I’ve been quietly watching Parrot for a while now, and I
still don’t think that implementing Tcl in it is really a winning proposition.
I’d love someone to prove me wrong, but proof is building a Tcl interpreter in
or on top of Parrot and running the Tcl test suite on it (and getting a decent
proportion of the tests passing.)

BTW, how does Parrot handle calls to foreign code? The docs I’ve seen are on
the hazy side, and integration with existing C, C++ and FORTRAN monoliths is
(alas) all too important, particularly in commercial development.

Donal.
[* The cluster of things known as “Shift-JIS” justifies this term. IMHO. ]

···


Donal K. Fellows http://www.cs.man.ac.uk/~fellowsd/ donal.fellows@man.ac.uk
– The small advantage of not having California being part of my country would
be overweighed by having California as a heavily-armed rabid weasel on our
borders. – David Parsons <o r c @ p e l l . p o r t l a n d . o r . u s>

Donal K. Fellows wrote:

<snip interesting discussion, which mostly goes over my head :>

BTW, how does Parrot handle calls to foreign code? The docs I’ve seen are on
the hazy side, and integration with existing C, C++ and FORTRAN monoliths is
(alas) all too important, particularly in commercial development.

On this particular point: has anyone thought of writing a unified C-API for Tcl,
Perl, Python, Ruby, (Java) etc? What I mean by this is that each of these
languages can be extended in C, and quite often are. Also, whenever I see
interesting new C/C++ libraries implemented, I also tend to see separate
language bindings for each of these languages. This seems like a massive
duplication of effort. Would it be possible to abstract over the C APIs of
these languages to a common core of functionality which all share (e.g. getting
an interpreter handle, registering a command, setting a variable etc)? I have
only dealt with Tcl’s C interface, so this may be too difficult, but it seems
like it would be a huge step forward.
I can see several problems in implementing this:

  1. Method of calling C-coded procedures. In Tcl, arguments are passed to C
     procedures as Tcl_Objs. Obviously this would be different for Perl or
     Python. Is it possible to come up with an API which can convert to the
     appropriate object type, or to some intermediate type?
  2. Accessing functions which are not present in all languages. I’m sure there
     are APIs in Tcl which only make sense with respect to Tcl, and probably the
     same in the other languages. So, how would one create a general API which
     allows you to call language-specific APIs? In the same way one handles
     platform-specific APIs?
  3. Versioning. Which language versions are compatible with which abstract API?
     I can see this one becoming insane over time. However, there are people with
     much more experience with these issues than me, so there might be a way.
  4. … probably more; this is all coming off the top of my head.
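A minimal sketch of the "common core" idea, assuming the host interpreter hands each extension a table of function pointers and the extension is written once against that table. Every name here (HostApi, NeutralCmd, ext_init, the toy host) is hypothetical and not part of the real Tcl, Perl, Python or Ruby C APIs; passing arguments as plain C strings simply sidesteps problem 1 rather than solving it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical language-neutral command: arguments and result are plain
 * C strings, the lowest common denominator (see problem 1). */
typedef int (*NeutralCmd)(int argc, const char **argv, char *result, size_t cap);

/* Hypothetical table that each host language binding would fill in. */
typedef struct {
    void *interp;   /* opaque interpreter handle */
    int (*register_command)(void *interp, const char *name, NeutralCmd fn);
    int (*set_var)(void *interp, const char *name, const char *value);
} HostApi;

/* --- Extension code, written once against HostApi --- */
static int cmd_greet(int argc, const char **argv, char *result, size_t cap)
{
    const char *who = argc > 1 ? argv[1] : "world";
    snprintf(result, cap, "hello, %s", who);
    return 0;
}

int ext_init(const HostApi *api)
{
    if (api->register_command(api->interp, "greet", cmd_greet) != 0)
        return -1;
    return api->set_var(api->interp, "greet_version", "1.0");
}

/* --- A toy host standing in for a real interpreter binding --- */
typedef struct {
    const char *cmd_name;
    NeutralCmd cmd;
    char var_name[32];
    char var_value[32];
} ToyInterp;

static int toy_register(void *interp, const char *name, NeutralCmd fn)
{
    ToyInterp *ti = interp;
    ti->cmd_name = name;
    ti->cmd = fn;
    return 0;
}

static int toy_set_var(void *interp, const char *name, const char *value)
{
    ToyInterp *ti = interp;
    snprintf(ti->var_name, sizeof ti->var_name, "%s", name);
    snprintf(ti->var_value, sizeof ti->var_value, "%s", value);
    return 0;
}
```

A Tcl binding would presumably implement register_command on top of Tcl_CreateObjCommand and convert each Tcl_Obj to a string on the way in; whether that conversion cost is acceptable is exactly the question raised in problem 1.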

Hmm… the more I think about this, the more problems it seems to present. I’d
love to be able to write an extension, and have it instantly work with x
different languages. Also, I’d love to be able to use Python and Perl
extensions from Tcl, without loading separate interpreters and all that. Am I
dreaming of an impossible Utopia?

Neil “soo naive” Madden

···


package r Tkhtml;package r http;pack [scrollbar .v -o v -co {.h yv}] -s right
-f y;pack [html .h -ys {.v set}] -f both -e 1;bind .h.x <1> {eval g [.h href %x
%y]};proc g u {set t [http::geturl $u];.h cl;.h p [http::data $t];http::cleanup
$t;.h co -base $u};g http://mini.net/tcl/976.html;proc bgerror args {};# NEM :-)

Donal K. Fellows wrote:

Benjamin Goldberg wrote:

Considering that Jcl and Jython exist, it seems like a reasonable
goal

(JCL is something else. I’d rather not remember it thankyouverymuch.)

Erm… that’s the old IBM Job Control Language? You mean this one?

Jargon - ESR

Bleh, forget I mentioned it. :-) ’Twas a horrible typo. :-)

would be to make an interpreter which turns Java’s .class files into
Parrot .pasm files. Once that tool exists, one could simply
translate Jcl and Jython into parrot… there would be no need to
re-implement them.

And one day, in the distant future, there will be a Perl6
decompiler, which will turn Parrot bytecode into Perl6. Then we’ll
be able to convert the translated Jython and Jcl into Perl6 :-)

$10 says that only ever happens with a performance hit. The problem
is that not all bytecodes are created equal. (And Jacl is an
implementation of Tcl in Java more in the way that the usual form of
Tcl is an implementation of the language in C.

So Jacl still converts Tcl into, well, Tcl bytecodes, even though it’s
doing so in Java? Blech.

Hmm, is there a way of making tcl dump the tcl-bytecodes to a file?

If so, one could probably make an attempt to translate those bytecodes
into parrot. (And ignore Jacl).

The fact that Java uses bytecodes is pretty much just a distraction
here. We also have another way of integrating Tcl with Java that
keeps Tcl implemented in C, but which integrates almost identically
with the Java language.)

This sort of thing tends to make me suspicious that this is little
more than a pipe-dream, well, at least as far as Tcl’s concerned.
(I don’t know the other languages nearly well enough to comment
in a useful way.)
[…]
Assuming you thoroughly understand Tcl’s bytecodes, why not take a
look at Parrot, and see whether the set of bytecodes that parrot
supports is sufficient to do everything that Tcl’s bytecodes do?

I know a bit about Tcl bytecodes, and a key factor about them is that
they are very tightly targeted towards implementing Tcl.

Hmm. A quick scan through the documentation doesn’t really raise my
hopes. Or even leave me with a deep enough understanding of what’s
going on; is there any deeper description than
http://www.parrotcode.org/docs/parrot_assembly.pod.html
about? (OK, Tcl’s bytecodes need documentation too, but I’ve already
gone to the effort to understand those as part of my maintenance and
development duties. I’ve just not got enough hours in the day.)
Unfortunately, the bits that I’m most interested in seem to be the
bits with least info (isn’t that always the way with complex software
systems?)

First impressions: what is meant by “string” anyway? Character
sequence? Byte sequence? UTF-8 sequence? ISO 8859-1 sequence? [FX:
Reads docs] Oh, they carry about what their encoding is with them?
That must make working with them fun. How does it handle things like
the blecherous monstrosities[*] used for system encodings in the Far
East?

Having read http://www.parrotcode.org/docs/strings.pod.html only just
now myself, it’s possible I could be wrong on this, but…

Each string’s encoding can be one of native, utf8, utf16, utf32, or
foreign. So those “blecherous monstrosities” will either be converted
to one of the utf formats, or else have their own string vtable.
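Benjamin’s reading of strings.pod (each string carries its encoding, and an unusual encoding can bring its own string vtable) might look like this in C. This is a hypothetical illustration, not Parrot’s actual string structures: the type and function names are made up, only two encodings are implemented, and the UTF-8 counter assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-encoding operations; a foreign encoding would supply
 * its own instance of this table instead of being converted. */
typedef struct {
    const char *name;
    size_t (*char_count)(const unsigned char *buf, size_t nbytes);
} EncodingVtbl;

/* A string carries its bytes plus a pointer to its encoding's operations. */
typedef struct {
    const unsigned char *buf;
    size_t nbytes;
    const EncodingVtbl *enc;
} TaggedString;

/* Fixed-width single-byte encoding: one byte per character. */
static size_t native_char_count(const unsigned char *buf, size_t nbytes)
{
    (void)buf;
    return nbytes;
}

/* UTF-8: count every byte that is not a continuation byte (10xxxxxx).
 * Assumes the input is well-formed. */
static size_t utf8_char_count(const unsigned char *buf, size_t nbytes)
{
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++)
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    return n;
}

const EncodingVtbl native_enc = { "native", native_char_count };
const EncodingVtbl utf8_enc   = { "utf8",   utf8_char_count };

/* Callers never switch on the encoding; they dispatch through the vtable. */
size_t string_length(const TaggedString *s)
{
    return s->enc->char_count(s->buf, s->nbytes);
}
```

The same three bytes give different lengths depending on which table they are tagged with, which is the point of carrying the encoding around with the string.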

For now, they will probably be converted… strings.pod.html says this at
the bottom:

  "Foreign Encodings

  Fill this in later; if anyone wants to implement new encodings at this
  stage they must be mad."

On a quite separate point, is there a strncmp() equivalent? That
would make implementing Tcl much easier…

You mean, for testing the first n characters of two strings for
equality? There isn’t one that I know of, but one could always be added;
furthermore, it will supposedly be possible to make lightweight strings
which are substrings of other strings, without any copying involved.
You could make your strncmp a wrapper that takes a substring of the
first n characters of each of your two strings and compares those
substrings.
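The workaround Benjamin suggests (take zero-copy substrings of the first n characters, then compare those) can be sketched with a pointer-plus-length view. The names StrView, view_of, view_prefix, view_cmp and view_ncmp are hypothetical, and characters are treated as bytes, so this ignores the encoding question raised above.

```c
#include <assert.h>
#include <string.h>

/* A lightweight substring: borrows the parent's bytes, copies nothing. */
typedef struct {
    const char *ptr;
    size_t len;
} StrView;

StrView view_of(const char *s)
{
    StrView v = { s, strlen(s) };
    return v;
}

/* Substring of at most n bytes starting at offset 0 (no allocation). */
StrView view_prefix(StrView s, size_t n)
{
    StrView v = { s.ptr, n < s.len ? n : s.len };
    return v;
}

/* Full comparison of two views, with the same sign convention as strcmp. */
int view_cmp(StrView a, StrView b)
{
    size_t n = a.len < b.len ? a.len : b.len;
    int r = memcmp(a.ptr, b.ptr, n);
    if (r != 0)
        return r;
    return (a.len > b.len) - (a.len < b.len);
}

/* strncmp() equivalent built from the two primitives. */
int view_ncmp(StrView a, StrView b, size_t n)
{
    return view_cmp(view_prefix(a, n), view_prefix(b, n));
}
```

For example, view_ncmp(view_of("foobar"), view_of("foobaz"), 5) is zero, because only the shared prefix "fooba" is compared.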

More generally, Tcl would need to use PMCs throughout.

Why? (Not an objection, but I don’t know much about Tcl’s bytecode)

The problem is that Tcl’s value semantics (copy-on-write) do not line
up well with that which Parrot seems to use (object-based)

Parrot will do copy-on-write.
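Copy-on-write, the value semantics under discussion here, means that copies share storage until one of them is modified. A minimal sketch with hypothetical names (Value, value_copy, value_set_char); it is not the internals of Tcl_Obj or of Parrot, and allocation failure is only checked with assert.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int refcount;
    size_t len;
    char data[];          /* flexible array member holding the bytes */
} Buf;

typedef struct {
    Buf *buf;
} Value;

Value value_new(const char *s)
{
    size_t len = strlen(s);
    Buf *b = malloc(sizeof *b + len + 1);
    assert(b != NULL);
    b->refcount = 1;
    b->len = len;
    memcpy(b->data, s, len + 1);
    Value v = { b };
    return v;
}

/* "Copying" a value only bumps the reference count. */
Value value_copy(Value v)
{
    v.buf->refcount++;
    return v;
}

void value_free(Value v)
{
    if (--v.buf->refcount == 0)
        free(v.buf);
}

/* Before writing, un-share the buffer if anyone else still refers to it. */
void value_set_char(Value *v, size_t i, char c)
{
    assert(i < v->buf->len);
    if (v->buf->refcount > 1) {
        Buf *fresh = malloc(sizeof *fresh + v->buf->len + 1);
        assert(fresh != NULL);
        fresh->refcount = 1;
        fresh->len = v->buf->len;
        memcpy(fresh->data, v->buf->data, v->buf->len + 1);
        v->buf->refcount--;
        v->buf = fresh;
    }
    v->buf->data[i] = c;
}
```

Only the write pays for a copy; any number of read-only copies cost one counter increment each.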

Furthermore, Parrot may implement some strings as ropes, so that the
amount that needs to be copied will be even smaller.
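A rope stores a string as a tree of fragments, so concatenation allocates one small node instead of copying both operands. A minimal sketch with hypothetical names; it is not Parrot’s design, and it omits rebalancing and freeing, which a real implementation would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A rope is either a leaf pointing at existing text or a concat node. */
typedef struct Rope {
    size_t len;
    const char *text;           /* leaf only: borrowed, not copied */
    const struct Rope *left;    /* concat only */
    const struct Rope *right;   /* concat only */
} Rope;

const Rope *rope_leaf(const char *s)
{
    Rope *r = malloc(sizeof *r);
    assert(r != NULL);
    r->len = strlen(s);
    r->text = s;
    r->left = r->right = NULL;
    return r;
}

/* Concatenation is O(1): no character data is copied. */
const Rope *rope_concat(const Rope *a, const Rope *b)
{
    Rope *r = malloc(sizeof *r);
    assert(r != NULL);
    r->len = a->len + b->len;
    r->text = NULL;
    r->left = a;
    r->right = b;
    return r;
}

/* Indexing walks down the tree, choosing a side by the left length. */
char rope_index(const Rope *r, size_t i)
{
    assert(i < r->len);
    while (r->text == NULL) {
        if (i < r->left->len) {
            r = r->left;
        } else {
            i -= r->left->len;
            r = r->right;
        }
    }
    return r->text[i];
}

/* Copy the characters out only when a flat string is actually needed.
 * `out` must have room for r->len + 1 bytes. */
void rope_flatten(const Rope *r, char *out)
{
    if (r->text != NULL) {
        memcpy(out, r->text, r->len);
    } else {
        rope_flatten(r->left, out);
        rope_flatten(r->right, out + r->left->len);
    }
    out[r->len] = '\0';
}
```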

and which, IIRC from the discussions at the time when Parrot was being
created, are closely based on those used in Perl even if not precisely
coincident.

Perl is likely never going to implement strings as ropes. It does now
have copy-on-write, though this is a recent development.

Perl5.6+ has two internal encodings for strings – bytes and utf8.
Parrot not only allows native, utf8, utf16, and utf32, but it also
allows any kind of user-defined encoding one might want. I doubt that
perl5 will ever do this.

Hence we’d be unable to use ground values except as part of the
implementation of higher-level concepts. That’ll impact badly on
performance.

It’s at this point that I feel a round of “Stuff it. I’ll stick to
implementing in C.” coming on. I’ve been quietly watching Parrot for
a while now, and I still don’t think that implementing Tcl in it is
really a winning proposition.
I’d love someone to prove me wrong, but proof is building a Tcl
interpreter in or on top of Parrot and running the Tcl test suite on
it (and getting a decent proportion of the tests passing.)

Parrot does everything in two steps – compile, then run. Most likely,
it will have a compiler which converts Tcl bytecode to Parrot bytecode.

Whether or not Parrot will ever translate from Tcl source to Parrot
bytecode is another question entirely.

Thinking a bit more, particularly about how Tcl often needs to interpret
strings at runtime, I realize that no non-trivial Tcl program can work
without having a string-to-bytecode compiler. Needless to say, this
poses a problem.
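The tension here (Tcl must be able to compile strings at run time, yet ordinary code should not pay for compilation on every call) can be softened by caching the compiled form on the value itself; Tcl 8’s dual-representation Tcl_Obj does roughly this for scripts. A minimal sketch using a made-up two-opcode language ('+' increments, '*' doubles); Script, compile and script_eval are hypothetical names, and this is neither Tcl’s nor Parrot’s code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { OP_INC, OP_DBL };   /* opcodes of a toy, hypothetical mini-language */

typedef struct {
    const char *source;    /* string representation (always present) */
    int *code;             /* cached compiled form, NULL until first eval */
    size_t ncode;
} Script;

static int compile_count = 0;   /* instrumentation for the example */

/* Toy compiler: '+' means increment, '*' means double; other chars skipped. */
static void compile(Script *s)
{
    size_t len = strlen(s->source);
    s->code = malloc((len ? len : 1) * sizeof *s->code);
    assert(s->code != NULL);
    s->ncode = 0;
    for (size_t i = 0; i < len; i++) {
        if (s->source[i] == '+')
            s->code[s->ncode++] = OP_INC;
        else if (s->source[i] == '*')
            s->code[s->ncode++] = OP_DBL;
    }
    compile_count++;
}

/* Compile on first use, then reuse the cached code on every later call. */
int script_eval(Script *s, int acc)
{
    if (s->code == NULL)
        compile(s);
    for (size_t i = 0; i < s->ncode; i++)
        acc = s->code[i] == OP_INC ? acc + 1 : acc * 2;
    return acc;
}
```

Evaluating the same Script object twice compiles it once, so the runtime compiler is needed but only sits on the path of genuinely new strings.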

BTW, how does Parrot handle calls to foreign code? The docs I’ve seen
are on the hazy side, and integration with existing C, C++ and FORTRAN
monoliths is (alas) all too important, particularly in commercial
development.

Although I don’t know how it will handle foreign code, I do know that
it will handle foreign code, and that it will have a better interface than
Perl5’s cruddy XS extension language.

···


my $n = 2; print +(split //, 'e,4c3H r ktulrnsJ2tPaeh'
."\n1oa! er")[map $n = ($n * 24 + 30) % 31, (42) x 26]

Neil Madden nem00u@cs.nott.ac.uk writes:

Donal K. Fellows wrote:

<snip interesting discussion, which mostly goes over my head :>

BTW, how does Parrot handle calls to foreign code? The docs I’ve seen are on
the hazy side, and integration with existing C, C++ and FORTRAN monoliths is
(alas) all too important, particularly in commercial development.

On this particular point: has anyone thought of writing a unified C-API for Tcl,
Perl, Python, Ruby, (Java) etc? What I mean by this is that each of these
languages can be extended in C, and quite often are. Also, whenever I see
interesting new C/C++ libraries implemented, I also tend to see separate
language bindings for each of these languages. This seems like a massive
duplication of effort. Would it be possible to abstract over the C APIs of
these languages to a common core of functionality which all share (e.g. getting
an interpreter handle, registering a command, setting a variable etc)? I have
only dealt with Tcl’s C interface, so this may be too difficult, but it seems
like it would be a huge step forward.

Look at
http://www.swig.org/

Regards,
Slaven

···


Slaven Rezic - slaven.rezic@berlin.de
Tired of using file selectors? Real programmers use the TAB key for
completion and not for jumping around. Try
Search for "module:Tk::PathEntry" - metacpan.org

Sorry for the repost to the Perl group; I didn’t realize the post was
crossposted (I entered via Ruby).

“Neil Madden” nem00u@cs.nott.ac.uk wrote in message
news:LRaA9.1325$J55.292310@newsfep2-win.server.ntli.net

Hmm… the more I think about this, the more problems it seems to present.
I’d love to be able to write an extension, and have it instantly work with x
different languages. Also, I’d love to be able to use Python and Perl
extensions from Tcl, without loading separate interpreters and all that.
Am I dreaming of an impossible Utopia?

I wrote some DLL helper logic that makes it easy to query a vtable by name
in another DLL.
The vtables can be created in whatever way, but the vtable is not associated
with allocated memory, as is the case with COM. This makes it quite flexible
without really being limited because one could decide that the first
function should return a ‘this’ pointer and the second function should be
the destructor, or whatever.

In any DLL that can link statically with a piece of C, you can use this
framework to pass functions around. (Even if you can only link dynamically,
you could have a helper DLL to expose vtables.) I use some #define macros to
create the vtables statically (see snippet below). It’s then very easy to
take arbitrary C functions and wrap them up in one or more vtables. However,
vtables need not be statically allocated (and this is significantly
different from public DLL functions).

Because the vtables are looked up by name initially, you could handle
versioning like in COM: “mycomponent.vtablename.1”, where
“mycomponent.vtablename” would refer to the most recent version, but it is
preferable to always specify an exact version.

Contrary to COM, this is completely cross-platform, as there is no OS magic
involved.

When a vtable is created, it is helpful to write a C prototype struct that
matches, because this makes things easier on the client side, but it is not
required (see snippet below).

On the client side I wrote some small wrappers to ease loading DLLs
dynamically and, in C++, to wrap the optional prototype. I only did this for
Windows, but the principles are essentially the same on Linux, just like
Ruby loads .so files.
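Mikkel did not post his code, so the following is only a hypothetical sketch of the versioning rule he describes: an exact name such as “mycomponent.vtablename.1” selects one version, while the bare “mycomponent.vtablename” resolves to the most recent one. The registry layout (VtblEntry) and lookup_vtable are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *name;     /* e.g. "mycomponent.vtablename" */
    int version;          /* 1, 2, ... */
    const void *vtbl;     /* the interface itself */
} VtblEntry;

/* Exact lookups use "name.N"; a bare "name" returns the highest version. */
const void *lookup_vtable(const VtblEntry *table, size_t n, const char *query)
{
    const VtblEntry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const VtblEntry *e = &table[i];
        size_t len = strlen(e->name);
        if (strncmp(query, e->name, len) != 0)
            continue;
        if (query[len] == '\0') {
            /* Unversioned query: remember the newest match. */
            if (best == NULL || e->version > best->version)
                best = e;
        } else if (query[len] == '.') {
            /* Versioned query: compare the suffix with this entry's version. */
            char buf[16];
            snprintf(buf, sizeof buf, "%d", e->version);
            if (strcmp(query + len + 1, buf) == 0)
                return e->vtbl;
        }
    }
    return best ? best->vtbl : NULL;
}
```

Keeping the version as a separate field, rather than baking it into the registered name, is one way to let the bare name find the newest entry without parsing every registered string.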

The framework is generic. Here is what I did to access functionality in an
OCaml parser application I wrapped up as a self-contained DLL, without using
any public DLL functions except for the init, query-vtable and uninit
functions. I picked all the most relevant OCaml API logic for allocating
memory on the OCaml runtime stack and created a set of vtables, approximately
one for each .h file in the API. For the OCaml functions I wrote myself
(parse_file), I had to call a function dynamically to locate the OCaml
function address. In the DLL init logic I performed this operation and added
the result to a vtable already prepared for the purpose. The client of the
DLL loads the DLL, calls init and then queries the relevant vtables, but in
principle has no clue that it really is OCaml that executes the logic (in
principle, because in practice I wanted to know in order to allocate memory
efficiently).

The same thing could be done in Ruby, wrapping rb_… functions into vtables
and having separate vtables for calling Ruby code; these vtables would be
created dynamically, or at least filled dynamically.
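Both the OCaml DLL and the Ruby suggestion rely on a vtable whose layout is fixed at compile time but whose slots are filled in during DLL initialization. A sketch of that pattern under stated assumptions: resolve_symbol is a hypothetical stand-in for whatever lookup the embedded runtime or the OS provides (dlsym() or GetProcAddress(), for instance), implemented here as a fixed table so the example runs on its own; ParserVtbl, dll_init and query_parser_vtbl are also made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Interface layout, fixed at compile time. */
typedef struct {
    int (*parse_file)(const char *path);
    int (*version)(void);
} ParserVtbl;

/* Statically allocated, but empty until init runs. */
static ParserVtbl parser_vtbl;

/* Stand-ins for functions that only become reachable at run time. */
static int real_parse_file(const char *path) { return (int)strlen(path); }
static int real_version(void) { return 3; }

/* Generic function pointer type; ISO C allows casting between these. */
typedef void (*AnyFn)(void);

/* Hypothetical resolver: a real DLL would ask its embedded runtime or use
 * dlsym()/GetProcAddress() instead of this fixed table. */
static AnyFn resolve_symbol(const char *name)
{
    if (strcmp(name, "parse_file") == 0) return (AnyFn)real_parse_file;
    if (strcmp(name, "version") == 0)    return (AnyFn)real_version;
    return NULL;
}

/* Called once from the DLL's init entry point: fill every slot or fail. */
int dll_init(void)
{
    AnyFn f1 = resolve_symbol("parse_file");
    AnyFn f2 = resolve_symbol("version");
    if (f1 == NULL || f2 == NULL)
        return -1;
    parser_vtbl.parse_file = (int (*)(const char *))f1;
    parser_vtbl.version = (int (*)(void))f2;
    return 0;
}

/* Published query function: clients see only the vtable, never the runtime. */
const ParserVtbl *query_parser_vtbl(void)
{
    return parser_vtbl.parse_file != NULL ? &parser_vtbl : NULL;
}
```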

Any language that supports C integration and dynamic link libraries can use
this framework.

Incidentally Parrot works a lot with vtables, so there could be some overlap
here (I didn’t know that at the time I wrote the framework though).

I have not put the code online, but if anyone is interested, let me know.

Below is a readme snippet, a bit technical and not the only way to use the
framework (it would be possible to map Ruby functions into dynamically
created vtables for instance).

Mikkel

Interfaces are organized in vtable maps which are statically allocated arrays of VtblMapEntries.
VTBL_MAP_BEGIN(<vtbl_map_name>)
VTBL_MAP_ENTRY(name, vtbl)
...  more entries here ...
VTBL_MAP_END

The name <vtbl_map_name> is later used by the linker, such that the map can
be hooked into a master map of all vtable interfaces compiled together.
The master map is located in a central compilation unit such that new
vtable maps can easily be added to the master map without modifying the
source of any of the existing map providers.

Before entering the map into the master map, it must be declared, unless
the map appears earlier in the same compilation unit as the master map:

VTBL_DECLARE_MAP(<vtbl_map_name>)
... more maps declared here ...

Following the map declarations, there is a master map which is scanned by
the default lookup function:

VTBL_MASTER_MAP_BEGIN(<vtbl_master_map_name>)
VTBL_MASTER_MAP_ENTRY(<vtbl_map_name>)
... more master map entries here ...
VTBL_MASTER_MAP_END

Typically, a dynamically loaded library (DLL) will have a purpose-specific
master map using a selection of the available vtbl interface maps.

Example:
We have the following functions in a .c file. Moreover, we have the malloc
and free functions from the standard library. We want all five functions
wrapped in two interfaces: Examples.Vector and Examples.Memory.

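First we create a header file for the interfaces:
<file "examples.h">
#include <stddef.h>   /* size_t */

/* Each interface is a typedef'd struct of function pointers (a vtable). */
typedef struct
{
    int (*get)(void *p, int x);
    void (*set)(void *p, int x, int val);
    void *(*create)(int x);
} ExamplesVectorVtbl;

typedef struct
{
    void *(*allocate)(size_t size);
    void (*deallocate)(void *p);
} ExamplesMemoryVtbl;
</file>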
<file "examples.c">
#include <stdlib.h>   /* malloc, calloc, free */
#include "vtbl.h"
#include "examples.h"

/* static: the functions are only reachable through the vtables */
static void set(void *p, int x, int val) { ((int *)p)[x] = val; }
static int get(void *p, int x) { return ((int *)p)[x]; }
static void *create(int x) { return calloc((size_t)x, sizeof(int)); }

ExamplesVectorVtbl vector_vtbl = { get, set, create };
/* shows that existing library functions can be packaged as well */
ExamplesMemoryVtbl mem_vtbl = { malloc, free };

VTBL_MAP_BEGIN(examples_vtbl_map)
VTBL_MAP_ENTRY("Examples.Vector", vector_vtbl)
VTBL_MAP_ENTRY("Examples.Memory", mem_vtbl)
VTBL_MAP_END
</file>
<file "master.c">

/* to get vtbl.h and the lookup function in vtbl.c */
#include "vtbl.c"

VTBL_DECLARE_MAP(examples_vtbl_map)

VTBL_MASTER_MAP_BEGIN(vtbl_master_map)
VTBL_MASTER_MAP_ENTRY(examples_vtbl_map)
VTBL_MASTER_MAP_END

void *GetNamedInterface(char *name)
{
    return vtbl_master_map_lookup(vtbl_master_map, name);
}

</file>
<file "client.c">
#include "examples.h"
void *GetNamedInterface(char *name);
void test(void)
{
    /* since interfaces are static, pMem and pVec need not be deallocated */
    ExamplesMemoryVtbl *pMem =
        (ExamplesMemoryVtbl *)GetNamedInterface("Examples.Memory");
    ExamplesVectorVtbl *pVec =
        (ExamplesVectorVtbl *)GetNamedInterface("Examples.Vector");
    if (pMem == NULL || pVec == NULL)
        return;   /* interface not found */
    void *v1 = pMem->allocate(sizeof(int[4]));
    void *v2 = pVec->create(4);
    pVec->set(v1, 2, 42);
    pVec->set(v2, 0, pVec->get(v1, 2));
    pMem->deallocate(v1);
    pMem->deallocate(v2);
}
</file>

Typically the client would have loaded a dynamically linked library and
found the address of the published GetNamedInterface function. Once that
function is available, it is easy to access all the remaining functions
via the named interfaces.
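The readme does not include vtbl.h or vtbl.c, so the macro definitions and vtbl_master_map_lookup are not shown. One hypothetical way to define them, just enough to make the pattern concrete and compilable on its own, is sketched here; the VtblMapEntry layout, the `&` taken inside VTBL_MAP_ENTRY and the NULL-terminated arrays are all assumptions, not Mikkel’s actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; const void *vtbl; } VtblMapEntry;

/* A map is a NULL-terminated array of name/vtable pairs. */
#define VTBL_MAP_BEGIN(map)        const VtblMapEntry map[] = {
#define VTBL_MAP_ENTRY(name, vtbl) { name, &(vtbl) },
#define VTBL_MAP_END               { NULL, NULL } };

/* A master map is a NULL-terminated array of pointers to maps. */
#define VTBL_DECLARE_MAP(map)         extern const VtblMapEntry map[];
#define VTBL_MASTER_MAP_BEGIN(master) const VtblMapEntry *const master[] = {
#define VTBL_MASTER_MAP_ENTRY(map)    map,
#define VTBL_MASTER_MAP_END           NULL };

/* Linear scan of every map registered in the master map. */
const void *vtbl_master_map_lookup(const VtblMapEntry *const *master,
                                   const char *name)
{
    for (; *master != NULL; master++)
        for (const VtblMapEntry *e = *master; e->name != NULL; e++)
            if (strcmp(e->name, name) == 0)
                return e->vtbl;
    return NULL;
}

/* Usage: one interface map and a master map that includes it. */
typedef struct { int (*answer)(void); } DemoVtbl;
static int demo_answer(void) { return 42; }
static const DemoVtbl demo_vtbl = { demo_answer };

VTBL_MAP_BEGIN(demo_map)
VTBL_MAP_ENTRY("Demo.Answer", demo_vtbl)
VTBL_MAP_END

VTBL_MASTER_MAP_BEGIN(master_map)
VTBL_MASTER_MAP_ENTRY(demo_map)
VTBL_MASTER_MAP_END
```

Under these assumptions, GetNamedInterface in master.c is just a thin exported wrapper around the lookup, which is what lets clients find every interface through a single published symbol.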

MikkelFJ wrote:

Sorry for the repost to the Perl group; I didn’t realize the post was
crossposted (I entered via Ruby).

“Neil Madden” nem00u@cs.nott.ac.uk wrote in message
news:LRaA9.1325$J55.292310@newsfep2-win.server.ntli.net

Hmm… the more I think about this, the more problems it seems to present.
I’d love to be able to write an extension, and have it instantly work with x
different languages. Also, I’d love to be able to use Python and Perl
extensions from Tcl, without loading separate interpreters and all that.
Am I dreaming of an impossible Utopia?

I wrote some DLL helper logic that makes it easy to query a vtable by name
in another DLL.
The vtables can be created in whatever way, but the vtable is not associated
with allocated memory, as is the case with COM. This makes it quite flexible
without really being limited because one could decide that the first
function should return a ‘this’ pointer and the second function should be
the destructor, or whatever.

Basically the same as the Tcl STUBS mechanism for loading extensions in
different versions of the Tcl interpreter without recompiling or
relinking the extension.

Michael

In article 3DD16F04.517E6720@earthlink.net,
Benjamin Goldberg goldbb2@earthlink.net wrote:
.
.
.

Hmm, is there a way of making tcl dump the tcl-bytecodes to a file?
.
.
.
There are at least a couple, but neither is particularly
handy, at this point; most common is to install TclPro
Compiler and let it do the work. The other is to rebuild
a standard source distribution with a debugging flag
enabled, then set a run-time flag to report on byte-code
flow. Neither is as inviting as the corresponding ability
in, for example, Python.

I hope I’m wrong about this, that there’s an easier way,
and that someone will correct me.

···

Cameron Laird Cameron@Lairds.com
Business: http://www.Phaseit.net
Personal: Home page for Cameron Laird

In article 3DD16F04.517E6720@earthlink.net,
Benjamin Goldberg goldbb2@earthlink.net wrote:
.
.
.

Thinking a bit more, particularly about how Tcl often needs to interpret
strings at runtime, I realize that no non-trivial Tcl program can work
without having a string-to-bytecode compiler. Needless to say, this
.
.
.
Tangential remark: yes, in the sense that Tcl is just
a big 'ole macro processor, where everything is a string.
Sure, at that level, Tcl is constantly interpreting strings,
in a way that seems creepy from a Perl perspective. Perl’s
executable references correspond in Tcl to “scripts” (most
often) which appear as callbacks to [after], [bind],
[fileevent], and so on.

On the other hand, at the level of application development,
working programmers should not be doing much of the [eval]
kind of string interpretation that was once thought to be
necessary style in Tcl, as well as Lisp and very few other
languages. Source code should look straightforward and plenty
procedural, and, in general, will not “interpret strings”
after a first round of bytecode compilation.

···

Cameron Laird Cameron@Lairds.com
Business: http://www.Phaseit.net
Personal: Home page for Cameron Laird