Unithorpe


Unithorpe is a small interpreted programming language and its virtual machine.

The driving idea is to use a single Unicode character to name each variable, function, namespace, builtin operator, etc. in the language. All data is either Unicode characters or arrays whose slots hold Unicode characters or other arrays.

All values in Unithorpe are thingies. A thingy is either a Unicode Character (in the 16-bit code range 0..65535) or an array.

Unicode Characters can also be used as short unsigned integers, in the range 0..65535. Whether a value is a character or an integer depends on how it's used.

NUL Character
The NUL character, or the integer 0, is used for initial values of uninitialized thingies.

Arrays are composed of 65536 slots, each containing a thingy -- either an integer or a reference to another (or the same) array. The slots are indexed by Characters. Arrays initially contain NUL characters (0 integers) in all 65536 slots. All possible characters can be used for indices to any array.

Three primitive operations are defined on arrays: Create an array, Get the value at an index, and Set the value at an index. Arrays are reference-counted: they go away when there are no more references to them, so no operation to explicitly delete an array is required.
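The three primitives can be pictured with a small Python model. This is only a sketch: the class name Array and the method names get and set are illustrative and not part of the spec, and Python's own memory management stands in for the reference counting described above.

```python
SLOTS = 65536  # every array has one slot per 16-bit character


def _check(index):
    # Indices are characters, i.e. integers in 0..65535.
    if not (isinstance(index, int) and 0 <= index < SLOTS):
        raise IndexError("index must be a character in 0..65535")


class Array:
    """Dense model of a Unithorpe array (hypothetical helper, not from the spec)."""

    def __init__(self):
        # Create: all 65536 slots start out as NUL (integer 0).
        self.slots = [0] * SLOTS

    def get(self, index):
        # Get: return the thingy stored at the given index.
        _check(index)
        return self.slots[index]

    def set(self, index, value):
        # Set: store a character or an array reference at the given index.
        _check(index)
        self.slots[index] = value
```

A slot may hold a reference to the array that contains it, which the description of slots explicitly allows.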

Conventionally, strings of Unicode characters in the range 1..65535 are represented by arrays. The values of the array slots are all characters (not array references), beginning with index 0. Strings may be of length 0 to 65534. At least one NUL character pads the rest of the array.
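As an illustration of the string convention, the sketch below converts between Python strings and a slot mapping. It assumes an array is modelled as a dict from index to value, with missing keys read as NUL; the function names are made up for this example.

```python
def string_to_slots(text):
    """Return {index: character code} for a NUL-padded string array."""
    if len(text) > 65534:
        raise ValueError("strings may be at most 65534 characters long")
    codes = [ord(c) for c in text]
    if any(not 1 <= code <= 0xFFFF for code in codes):
        raise ValueError("string characters must be in the range 1..65535")
    # Characters start at index 0; every slot not listed stays NUL.
    return dict(enumerate(codes))


def slots_to_string(slots):
    """Read characters from index 0 up to (not including) the first NUL."""
    chars = []
    index = 0
    while slots.get(index, 0) != 0:
        chars.append(chr(slots[index]))
        index += 1
    return "".join(chars)
```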

Global Variables
One array known as the Global Array always exists. It is the starting point for all programs and data.

The Global Array is initialized with the builtin bytecode operations already set up in their slots.

The global array is used for many different purposes: it holds bytecodes, scripts, namespaces, arrays, objects, strings, formal parameters, locals, temporary variables, and any other kind of data. It is the user's responsibility to decide which slots will be used for what purpose.

Programs are made of Unithorpe Scripts, which are Strings that begin with the character ';'.

Local Variables
Immediately following the initial ';' are the names of local variables for the script, if any, up to a space character. The local variables are also used as formal parameters. The first local variable is an in-out parameter; the remaining are in parameters. When the script is called, the variables listed after the command name are bound, in order, to the local variables of the script. Extra local variables with no calling variable bound to them are initialized to 0.

Local variables actually live in the global array. Before a script is called, the existing values in slots which will be local variables are pushed onto a stack, and restored when the script returns. These pushed values will be unavailable while they are pushed.
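One possible reading of this calling convention is sketched below. The global array is modelled as a dict keyed by one-character names, and the script body is a Python callable. The function name call_script and the exact order of restore and copy-back are assumptions, since the text above does not pin them down.

```python
def call_script(global_array, local_names, arg_names, body):
    """Bind arguments to locals, run body, then restore the saved slots."""
    # Read the caller's argument values before any slot is overwritten.
    arg_values = [global_array.get(name, 0) for name in arg_names]
    # Push the current contents of every slot that is about to become a local.
    saved = [(name, global_array.get(name, 0)) for name in local_names]
    # Bind arguments in order; locals without an argument start at 0.
    for position, name in enumerate(local_names):
        if position < len(arg_values):
            global_array[name] = arg_values[position]
        else:
            global_array[name] = 0
    body(global_array)
    # The first local is the in-out parameter, so remember its final value.
    result = global_array.get(local_names[0], 0) if local_names else 0
    # Pop the saved values in reverse order, as a stack would.
    for name, old_value in reversed(saved):
        global_array[name] = old_value
    # Assumed: the result is copied back to the caller's first argument.
    if local_names and arg_names:
        global_array[arg_names[0]] = result
```

For example, a script with locals Z, a and t called as "Lxa" would leave the caller's own Z untouched and deliver its result in x.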

Command Fetch
After the ';', the local variables, and the space character come the commands of the script. Simple commands begin with a non-space character, which is used to index into the global array. This is called the "command fetch." What is found there determines what happens next. There are four cases:
  • If it is a character, it is treated as a "bytecode." Bytecodes name builtin primitive operations, defined later.
  • If it is a reference to an array with ';' in slot 0, it is a Unithorpe script to be executed.
  • If it is a reference to an array with integer 0 in slot 0, it is a namespace. The next character in the script is an index into this array, where command fetch is repeated. A command may traverse several namespaces this way.
  • If it is a reference to an array with another array in slot 0, then the array is an "object" and the array referenced by slot 0 is the "class". The next character in the script is an index into the class array, where command fetch is repeated.
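The four cases can be sketched as a small dispatch loop. Arrays are modelled here as dicts from character code to value, with missing keys read as NUL; the names classify and command_fetch are illustrative. The case where slot 0 holds a character other than ';' or NUL is not covered by the text above, so the sketch simply reports it as "undefined".

```python
def classify(value):
    """Name the command-fetch case that applies to a fetched value."""
    if isinstance(value, int):
        return "bytecode"
    slot0 = value.get(0, 0)
    if isinstance(slot0, dict):
        return "object"
    if slot0 == ord(";"):
        return "script"
    if slot0 == 0:
        return "namespace"
    return "undefined"  # assumption: the spec does not define this case


def command_fetch(script, pos, global_array):
    """Resolve the command at script[pos]; return (kind, target, next_pos)."""
    table = global_array
    while True:
        value = table.get(ord(script[pos]), 0)
        pos += 1
        kind = classify(value)
        if kind == "namespace":
            table = value       # next character indexes the namespace
        elif kind == "object":
            table = value[0]    # next character indexes the class array
        else:
            return kind, value, pos
```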

Command Arguments
The characters after the command fetch characters are arguments to the command. Arguments stop at the first space or NUL character, but a grouping character causes inclusion of all characters up to the closing character of the group.

Grouping Characters
Grouping characters come in pairs, and include { }   [ ]   ( )   < >   ` '
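A scanner for command arguments might look like the following sketch. It assumes groups do not nest, so that a group ends at the next occurrence of its closing character; the function name scan_arguments is made up for this example.

```python
# Opening grouping character -> matching closing character.
GROUPS = {"{": "}", "[": "]", "(": ")", "<": ">", "`": "'"}


def scan_arguments(script, pos):
    """Return (argument_text, next_pos) for the arguments starting at pos."""
    start = pos
    while pos < len(script) and script[pos] not in (" ", "\0"):
        closer = GROUPS.get(script[pos])
        if closer is None:
            pos += 1                      # ordinary single-character argument
        else:
            end = script.find(closer, pos + 1)
            if end < 0:
                raise SyntaxError("unclosed group")
            pos = end + 1                 # skip the whole group, spaces included
    return script[start:pos], pos
```

Spaces inside a group do not end the argument list, which is what lets a block such as { +ZZ1 } contain whole commands.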

Until Unicode tools are available, in the script input to the interpreter, a backslash followed by any two ASCII characters creates a 16-bit character code, whose high 8 bits are the first ASCII character and whose low 8 bits are the second ASCII character. This three-ASCII-char sequence, called pseudothorpe, creates a single character code in the range 256..65535. For instance, these pseudothorpe operators are used as if each were a single Unicode character: \eq \ne \lt \le \gt \ge
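The pseudothorpe rule is simple enough to sketch directly. The function below turns interpreter input into a list of 16-bit character codes; its name and its error handling are assumptions for illustration.

```python
def decode_pseudothorpe(source):
    """Convert script input text into a list of 16-bit character codes."""
    codes = []
    i = 0
    while i < len(source):
        if source[i] == "\\":
            if i + 2 >= len(source):
                raise ValueError("backslash must be followed by two characters")
            high, low = ord(source[i + 1]), ord(source[i + 2])
            if high > 127 or low > 127:
                raise ValueError("pseudothorpe uses ASCII characters only")
            # First character is the high byte, second is the low byte.
            codes.append((high << 8) | low)
            i += 3
        else:
            codes.append(ord(source[i]))
            i += 1
    return codes
```

Under this rule \eq becomes the single code 0x6571, because 'e' is 0x65 and 'q' is 0x71.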

Builtin Bytecodes [unithorpe version 0.1]

These bytecodes are initially assigned to their own slot in the global array. The initial character names the bytecode, and other characters are arguments.

Arguments which are single characters name global variables for inputs or outputs. Conventionally, an output argument comes before other arguments. Arguments which are "blocks" begin with a grouping character, and end with the next occurrence of the matching grouping character. Arguments end with a space or NUL that is not in a block.

Command  Usage                  Mnemonic        Description
!        !Z                     anew            set variable Z to a new array containing all 0s
,        ,Zai                   aget            set Z to the value in slot i of array a
.        .aix                   aput            set slot i of array a to the value in variable x
;        ;                      return          return from the script
+        +Zab                   plus            set Z to the sum of a and b
-        -Zab                   minus           set Z to the difference of a and b
*        *Zab                   times           set Z to the product of a and b
?        ?c{block1}             if              if c is not 0, then do block1
?        ?c{block1}{block2}     ifelse          if c is not 0, then do block1, else do block2
@        @a{block1}b{block2}    while           repeatedly test variables (like a or b) or do blocks, jumping out when a test variable is 0
\eq      \eqZab                 equals          set Z to 1 if a equals b, to 0 otherwise
\ne      \neZab                 not_equals      set Z to 0 if a equals b, to 1 otherwise
\lt      \ltZab                 less_than       set Z to 1 if a < b, to 0 otherwise
\le      \leZab                 less_equals     set Z to 1 if a <= b, to 0 otherwise
\gt      \gtZab                 greater_than    set Z to 1 if a > b, to 0 otherwise
\ge      \geZab                 greater_equals  set Z to 1 if a >= b, to 0 otherwise
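The while bytecode is the least conventional entry, since tests and blocks may alternate in any order. The sketch below is one interpretation: the arguments are a sequence of ("test", name) and ("block", callable) items that are visited in order, over and over, until a tested variable is 0. The function name run_while and this item encoding are assumptions.

```python
def run_while(items, global_array):
    """Cycle through tests and blocks until some tested variable is 0."""
    while True:
        for kind, item in items:
            if kind == "test":
                # A test names a global variable; 0 (NUL) means jump out.
                if global_array.get(item, 0) == 0:
                    return
            else:
                # A block is modelled as a callable that runs its commands.
                item(global_array)
```

A loop written with no test variable at all would never terminate under this reading.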

Initial Global Values
Each of the above command characters ! , . ; + - * ? @ \eq \ne \lt \le \gt \ge is set in the global array to its own bytecode value, which is the same as the command character. Also the slots indexed by characters '0' '1' '2' '3' '4' '5' '6' '7' '8' & '9' are set to the integers 0 1 2 3 4 5 6 7 8 & 9, respectively.

Ordering and equality on integers (characters) is the natural unsigned order. All integers are less than all array references. Array references have an arbitrary total ordering. Only references to the same array are equal.
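These ordering rules can be sketched with a sort key. Characters are Python ints and arrays are any other objects; using id() as the arbitrary total order on arrays is an assumption of this sketch, as are the function names.

```python
def thingy_key(thingy):
    """Sort key: every integer sorts before every array reference."""
    if isinstance(thingy, int):
        return (0, thingy)        # natural unsigned order on integers
    return (1, id(thingy))        # arbitrary but total order on arrays


def less_than(a, b):
    return 1 if thingy_key(a) < thingy_key(b) else 0


def equals(a, b):
    # Two array references are equal only if they refer to the same array.
    return 1 if thingy_key(a) == thingy_key(b) else 0
```

Note that two distinct arrays with identical contents still compare as unequal.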

Notice that the "if" and "while" operators treat 0 as false, and everything else as true.


Elaborate... (Object is bound to '$')

This script computes the length of a string:
;Zat +Z00 @{ ,taZ }t{ +ZZ1 } ; strlen
It begins with ';' to mark it as a Unithorpe script. The local variable Z will be used to output the result. The local variable a will be the string input. The local variable t will be a temporary.

The command +Z00 means set Z to the sum of integers 0 and 0, that is, to 0.

The command @{ ,taZ }t{ +ZZ1 } is a while loop, which will get slot Z of array a and put it in t; will break if t is NUL; will increment Z; and repeat.

Finally the ';' is the return command. Whatever is in Z will be returned.

Everything following the final ';' is a comment, since it can never be reached.
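For readers more at home in a conventional language, the strlen script corresponds roughly to the Python below. The array is modelled as a dict with missing slots read as NUL, and the variable names mirror the script's locals.

```python
def strlen(a):
    """Python rendering of the script ';Zat +Z00 @{ ,taZ }t{ +ZZ1 } ;'."""
    Z = 0 + 0                 # +Z00
    while True:               # @ ...
        t = a.get(Z, 0)       # { ,taZ }  fetch slot Z of array a into t
        if t == 0:            # t         leave the loop when t is NUL
            break
        Z = Z + 1             # { +ZZ1 }  increment Z
    return Z                  # ;         Z carries the result back
```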

If the above script were bound to slot 'L' in the global array, the following script would create an array, set the first three slots to '8' '8' '8', and store its string length (3) in variable x.

; !a *n68 +nn8 .a0n .a1n .a2n Lxa
First it makes a new array and puts it in a. Then it puts 6*8=48 into n. Then it adds another 8 to n, to make 56, the value of ASCII '8'. Then it writes that to slots 0, 1, and 2 of the array a. Then it calls script L (the strlen above) on array a, with result output in variable x.
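The same steps can be traced in Python. Here the global array is a dict keyed by one-character names, the digit slots are preloaded as the spec describes, and the call to L is written out inline as a counting loop.

```python
# Slots '0'..'9' of the global array initially hold the integers 0..9.
g = {str(digit): digit for digit in range(10)}

g["a"] = {}                   # !a     new array, all slots NUL
g["n"] = g["6"] * g["8"]      # *n68   6 * 8 = 48
g["n"] = g["n"] + g["8"]      # +nn8   48 + 8 = 56, the code of ASCII '8'
g["a"][g["0"]] = g["n"]       # .a0n
g["a"][g["1"]] = g["n"]       # .a1n
g["a"][g["2"]] = g["n"]       # .a2n

# Lxa: count characters in a up to the first NUL, leaving the count in x.
length = 0
while g["a"].get(length, 0) != 0:
    length += 1
g["x"] = length
```

The digit characters work as operands here only because their slots hold the matching integers, not because the language has numeric literals.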

Loading a Program

Input & Output Primitives

Implementation Suggestions

Thingies should be (in C) unsigned long. If the value is less than 65536, it's a character value. Otherwise it's a pointer to an array.
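The single-word encoding can be imitated in Python to show the idea, even though a real implementation would use C pointers. In this sketch a list named heap stands in for memory, and an array "pointer" is 65536 plus an index into that list; all names are hypothetical.

```python
CHAR_LIMIT = 65536   # values below this are characters, the rest are arrays
heap = []            # stand-in for memory; holds one slot mapping per array


def new_array():
    """Allocate an array and return its thingy (an integer >= CHAR_LIMIT)."""
    heap.append({})
    return CHAR_LIMIT + len(heap) - 1


def is_character(thingy):
    return thingy < CHAR_LIMIT


def deref(thingy):
    """Return the slot mapping behind an array thingy."""
    if is_character(thingy):
        raise TypeError("a character is not an array reference")
    return heap[thingy - CHAR_LIMIT]
```

The appeal of this layout is that a single comparison against 65536 distinguishes the two kinds of thingy.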

Arrays could be always 65536 thingies long, or they could be a structure that grows when needed, with all slots beyond its actual size behaving like 0. Other sparse representations could also be used.
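A growable representation along these lines might look like the sketch below, where slots past the current size read as 0 and storage is extended only when a non-zero value is written. The class name and the decision not to grow on writes of 0 are assumptions.

```python
class GrowableArray:
    """Array whose storage grows on demand; unseen slots behave like 0."""

    LIMIT = 65536

    def __init__(self):
        self.items = []

    def get(self, index):
        if not 0 <= index < self.LIMIT:
            raise IndexError("index must be in 0..65535")
        return self.items[index] if index < len(self.items) else 0

    def set(self, index, value):
        if not 0 <= index < self.LIMIT:
            raise IndexError("index must be in 0..65535")
        if index >= len(self.items):
            if isinstance(value, int) and value == 0:
                return  # writing NUL past the end changes nothing
            # Pad with NULs up to and including the new index.
            self.items.extend([0] * (index + 1 - len(self.items)))
        self.items[index] = value
```

This suits string-like arrays, which fill from index 0; arrays indexed by scattered characters would favour a sparse table instead.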

Arrays should be reference-counted or garbage-collected.

Some structure is needed to implement the stack of saved values when a script is called.

Standard Library
We could use libraries for strings, collections, bignums, ...

A higher-level language "Bithorpe" is planned. It should be interpreted by unithorpe code, or translated down into unithorpe. Where Unithorpe is register-based, Bithorpe should be expression-based, with binary operators.


TODO in future versions
  • =Za assigns a to Z
  • !Z{string value} creates new array initialized to "string value".
  • consider letting {abcde}, used in place of any input variable, create a new array with string "abcde". (Is the ! operator then obsolete? Use =Z{} instead?)
  • 'Z{c} assigns literal character 'c' to Z
  • work on objects and classes
  • arithmetic & boolean operators
  • use real Unicode instead of pseudothorpe for \eq \ne ...
  • consider a Special Form (MACRO), perhaps a special argument that gets an array of the rest of the parameter values. Also a dynamic local variable creator.
  • unithreads
  • Unicode GUI
  • JIT compiler, precompiler

SemiThorpe
SemiThorpe is a semi-normal semi-unithorpish language which compiles into Unithorpe code for the Unithorpe Virtual Machine. (thanksgiving day, 2004)



This C source is very incomplete and untested and just might hurt your computer or your head:


(last modified 2005-05-10)
This page is referenced by the following pages:
WebLog #1 Topic: 2004-11-09 09.49.32 strick: Unithorpe Spec Available