BASCOMP V2 Documentation
==========================

Ian Cull Bsc. 24/3/90.
----------------------

Introduction.
-------------
BASCOMP is a Basic compiler, written in Basic! This may seem pointless,
until you realise that BASCOMP is capable of compiling itself.

The code that BASCOMP produces is assembly source text, rather than
straight machine code. The reason for this is that it offers a true
insight into how compilation is achieved, without the technical
difficulties involved in producing actual code (such as forward
referencing, etc.). It also allows for the resultant code to be edited
before assembly, which can result in even greater speed increases over
the original Basic program. The resultant assembly source text can be
converted into actual machine code using BASASM, a fast Basic assembler
also written in Basic.

The Basic programs that can be compiled by BASCOMP are a limited subset
of what can be written for the Spectrum computer. This, again, is
deliberate so that the compilation process can be understood more
easily. Most significantly, BASCOMP is an integer-only compiler.

However, since BASCOMP is itself written in Basic, it is quite feasible
for additional facilities to be written. BASCOMP can then be recompiled,
to give a new compiler able to handle whatever new statements are
required. Future Faster Than Basic articles in FORMAT will give some
examples of expanding BASCOMP.

BASCOMP has been written so that, while uncompilable Basic is reported,
no error checking code is incorporated in the resulting assembly
program. This means that the compiled programs run faster and are
simpler to understand. However, it also means that programs should first
be thoroughly tested using normal Spectrum Basic, before trusting the
machine code version.

Files supplied.
---------------
bascomp2    Basic source code for Basic Compiler
basasm2     Basic source code for Basic Assembler
asmcompC    Code version of bascomp2 (executes at 50000)
            and basasm2 (executes at 43000)
primes1     Test program (see FORMAT Oct.'89)
primes2     Test program (see FORMAT Oct.'89)
plot        Test program for PLOT & POINT graphics routines
basasm2Cl   Code version of basasm2 (executes at 25000)
basasm2Ch   Code version of basasm2 (executes at 58000)

BASCOMP capabilities.
---------------------
BASCOMP will compile integer-only Basic programs, producing assembly
source text output to stream 4.

Only numbers -32767 to +32767 should be used, although numbers 32768 to
65535 can be used, and will be converted to their negative equivalent
(32768==-32768, 65535==-1).

Numeric variables 'a' to 'z' are available - but no multi-letter names.
Single dimensioned numeric arrays 'a()' to 'z()' are available.
Strings 'a$' etc. and string arrays 'a$()' etc. are not available.
Multiple dimensioned numeric arrays are not available.

    Valid           Invalid
    LET a=1         LET aa=1
    LET a(1)=2      LET a(1,2)=3
                    LET a$="1"

Since numeric variables 'a' to 'z' only give twenty six possible
variables, it may be necessary to use an array as a set of distinct
variables (e.g. a(1), a(2), ...). To this end, array variable references
using a number in the brackets, rather than a variable, are coded as
efficiently as a normal variable reference.

    I.E.  LET a=1 and LET a(1)=2 would be efficiently coded.
          LET a(b)=1 would be less efficiently coded.

The following Basic statements are supported :-

    BORDER, BRIGHT, CLS, DATA, DIM, FLASH, FOR / NEXT / STEP,
    GOSUB / RETURN, GOTO, IF / THEN, INK, INVERSE, LET, LPRINT, PAUSE,
    PAPER, PLOT, PRINT, POKE, RANDOMIZE USR, READ, REM, RESTORE, STOP.

NOTES:

BRIGHT, FLASH, INK, INVERSE and PAPER can only be used as statements in
their own right - they CANNOT be combined within PRINT or PLOT
statements.

An automatic CLS is done when the program is executed.

DATA can only be numeric.

FOR is able to handle positive and negative STEP values, although only a
number (and not a variable) can be used. The left & right parts of the
TO can only be numbers, variables or acceptable arrays (not
expressions). Up to ten nested FOR/NEXT levels are supported, but this
is not checked. There must be only one NEXT for each FOR.

GOSUB/GOTO can only handle numbers; statements like GOTO a*10 are not
supported. Note that GOTO/GOSUB/RETURN are directly translated into
assembly code JP/CALL/RET instructions.

LPRINT/PRINT are more limited than the Basic form. TAB and the comma and
quote functions (, and ') are implemented but AT is not. Stream
redirection is available (PRINT defaults to stream 2, LPRINT to
stream 3). The CHR$ function can be used HERE ONLY to convert a variable
to a character (PRINT CHR$ a). A quoted string (PRINT "...") can also be
used. Variables or numbers can be used (PRINT a) but expressions cannot
(PRINT a*3).
NOTE: Preceding a variable or number with a '+' sign will allow
negatives to be printed as their positive equivalent.

POKE can be rather limited, since only locations up to 32767 can be
used. It is possible to deal with this problem by using negatives
(e.g. -1==65535, etc.) or by adding two smaller numbers
(32767+1==32768).

STOP is coded to print an OK prompt. If the program 'runs off the end',
the same message is given. Note that if you wish to produce a machine
code subroutine, simply put a RETURN at the end of the program.

The following functions are supported for use when calculating integer
results :-

    +, -, *, /, (...), <, <=, =, >=, >, <>,
    ABS, AND, CODE, NOT, OR, PEEK, POINT

NOTES:

/ is integer only (3/2==1). INT can be used to make sure of this when
testing - INT is ignored by BASCOMP.

Parenthesis nesting (...(...)...) can be to eighteen levels, but is not
error checked.

There is no overflow checking on calculations (20000+20000=-25536).
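
The wrap-around behind figures such as 20000+20000=-25536 and -1==65535
can be checked with a short sketch. It assumes that BASCOMP values behave
as ordinary 16-bit two's-complement words, which the equivalences quoted
above suggest. Python is used purely for illustration, and the function
names are hypothetical (they are not part of BASCOMP).

```python
def to_signed16(n: int) -> int:
    """Wrap any integer into the signed 16-bit range -32768..32767."""
    n &= 0xFFFF                      # keep only the low 16 bits
    return n - 0x10000 if n >= 0x8000 else n


def to_unsigned16(n: int) -> int:
    """Return the 0..65535 'positive equivalent' of a 16-bit value."""
    return n & 0xFFFF


print(to_signed16(20000 + 20000))    # -25536, the overflow example
print(to_signed16(65535))            # -1
print(to_unsigned16(-1))             # 65535, e.g. a POKE address
print(to_unsigned16(-25536))         # 40000, as PRINT + would show it
```

The second function is what the '+' prefix on PRINT amounts to under
this assumption: the same 16 bits, read as an unsigned number.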
The lack of overflow checking can be useful if the PRINT + function is
used.

AND, NOT and OR always give 0 or 1 results (Spectrum Basic does not
ALWAYS stick to this rule).

CODE can only be used in the form CODE "single-chr" which is converted
during compilation to the number representation of single-chr
(CODE "A"==65).
NOTE: CODE """"==34 is an exception to this.
NOTE: CODE INKEY$ can be used to check for any key presses. If no keys
are pressed, 0 is returned, otherwise the key code is returned.
CODE INKEY$ #x can also be used, to fetch the next character from an
opened channel.

PEEK needs dealing with as for POKE.

Error reporting by BASCOMP.
---------------------------
If BASCOMP finds some Basic which it cannot compile, an error is
reported. As BASCOMP runs, the line number being compiled is displayed.
If an error occurs, then an error code is displayed also. The offending
line is then displayed, with a flashing inverse question mark showing
where the problem was detected.

The error codes and their causes are listed at the end of this document.

BASASM capabilities.
--------------------
BASASM will convert assembly source text produced by BASCOMP, read in
from stream 4, to machine code.

All characters must be in upper-case. Not all machine code instructions
are supported - only those used by BASCOMP have been checked.

Labels VA...VZ can be used to identify variables, and must be defined
BEFORE being used. However positive or negative offsets can be applied
(VA+6).

Labels Xnnnn and Lnnnn can be used to identify locations, and need not
be defined before being used. When an Lnnnn location is defined, it will
be displayed during assembly to indicate assembly continuing.

Comments (preceded by a semicolon) will be ignored. Fields are separated
by a single space, tab or comma character. A line ends with a carriage
return. All other control codes are ignored. An END must be used to
terminate assembly.

BASASM can assemble code into any memory area not already in use.
There are 3 versions, running at 25000, 43000 and 58000, so one should
be suitable for producing code at any required address.

BASASM can define variables in the second and third thirds of screen
memory, provided that no more than 4096 bytes are needed, releasing more
memory for code - BASCOMP & BASASM themselves are assembled in this way.

BASASM is a SINGLE-PASS assembler, which simplifies stream handling on
the Spectrum. This is the reason why VA ... VZ must be defined BEFORE
being used, and why offsets cannot be applied to Xnnnn and Lnnnn labels.

Error reporting by BASASM.
--------------------------
BASASM should find no errors in source text produced by BASCOMP.
Therefore errors simply generate an error number followed by the
assembly source text line in which the error was found. It is also
possible for Xnnnn or Lnnnn labels to be undefined (found at the end of
assembly).

Using BASCOMP.
--------------
BASCOMP and BASASM are supplied as Basic programs and also ready-to-use
as code in the file "asmcompC". It is suggested that this file be used,
since re-producing the code from the Basic programs is VERY slow
(although perfectly possible).

To load the machine code, first CLEAR 42999, then LOAD "asmcompC" CODE
to get the machine code. If the space is needed, you can then
CLEAR 49999 to free space taken by the BASASM code, leaving only the
BASCOMP code which starts at 50000.
NOTE that when BASCOMP is run, the screen will be used for storage.

To use BASCOMP, stream 4 should first be opened to a microdrive/disk
file to which the assembly source text will be sent.

To run the BASCOMP machine code compiler, use RANDOMIZE USR 50000. After
two (or three if DATA is present) passes through the Basic program, a
report will show how many bytes will be needed for variables (the V=
number) and how much data was compiled (the D= number).

Once compilation has finished (or if an error occurs) stream 4 should be
closed.

Compiling BASCOMPs output.
--------------------------
When you successfully get BASCOMP to run, you will end up with a largish
file containing assembly source text. To produce a machine code program,
this must be assembled.

To use BASASM, stream 4 should first be (re)opened to the same file to
which the assembly source text was sent.

To run the BASASM machine code assembler, use RANDOMIZE USR 43000
(or 25000/58000 if a different version is in use). BASASM will prompt
for the start address for the final code, and also for whether variables
should be stored in the screen (which avoids the variables taking up
program space, but can only be used for up to 4096 bytes of variables).

At the end details will be given of the size of the code and other
(possibly) useful information. If any Xnnnn or Lnnnn labels are
undefined, they will now be reported - pressing any key will continue
BASASM running. Xnnnn labels should never be undefined, but Lnnnn labels
could be undefined if the compiled Basic program has GOTO/GOSUBs to
non-existent lines (which is OK in Spectrum Basic but is bad practice)
or to lines which compile to no code (such as REM or DATA lines).

Once assembly has finished (or if an error occurs) stream 4 should be
closed.

Opening a file for output.
--------------------------
Microdrives:  OPEN #4;"m";1;"output"          (or similar)
Wafadrives:   OPEN #*4;"a:output"             (or similar)
Plus D disk:  OPEN #4;d1"output" OUT          (or similar)
Plus 3 disk:  RANDOMIZE USR 23734
Swiftdisk:    OPEN #%#4;0;"output","W","T"

Opening a file for input.
-------------------------
Microdrives:  OPEN #4;"m";1;"output"          (or similar)
Wafadrives:   OPEN #*4;"a:output"             (or similar)
Plus D disk:  OPEN #4;d1"output" IN           (or similar)
Plus 3 disk:  RANDOMIZE USR 23740
Swiftdisk:    OPEN #%#4;0;"output","R","T"

Closing a file.
---------------
Microdrives:  CLOSE #4
Wafadrives:   CLOSE #*4
Plus D disk:  CLOSE #*4
Plus 3 disk:  RANDOMIZE USR 23737    (done by OK automatically)
Swiftdisk:    CLOSE #%#4

Notes.
------
Microdrive and Wafadrive users must first delete any previous file of
the same name, before opening a file for output.

Plus 3 users can use the additional supplied machine code (installed by
RANDOMIZE USR 65024) to open a file "OUTPUT.LST" to stream 4. It is
included at the end of the "asmcompC" file. After this,
RANDOMIZE USR 23734 opens the file for output, RANDOMIZE USR 23740 opens
the file for input, and RANDOMIZE USR 23737 closes the file. Note that
file closure is automatic whenever the OK message (or any error) occurs,
so RANDOMIZE USR 23737 is rarely needed.

Plus D users must contend with a bug in the Plus D ROM which causes
files opened for output to be wrongly written if delays occur during
output (and the disk stops spinning). Additional supplied machine code
called by BASCOMP checks for and solves this problem - use
POKE 64970,42 after loading the "asmcompC" file to include the code.

Since files are OPENed/CLOSEd from Basic and no special facilities are
needed, it should be possible to use BASCOMP and BASASM with any
Spectrum drive system (even the RAMdisk on the ill-fated +2A).

A Sample Session.
-----------------
Let us try compiling the following program using BASCOMP and BASASM.
Microdrive syntax will be used, though the same applies whatever filing
system is used :-

    10 FOR x=1 TO 10
    20 LET y=x*x
    30 PRINT x,y
    40 NEXT x

WARNING: BASCOMP does not like spaces in your Basic programs - do not
put any spaces in other than the ones which automatically appear.

First we CLEAR 42999, then LOAD *"m";1;"asmcompC" CODE to get the
compiler into memory. After typing in the above program, we type :-

    OPEN #4;"m";1;"test": RANDOMIZE USR 50000: CLOSE #4

The numbers 10, 20, 30, 40 flash up as the program is compiled, first
preceded by a 1 then by a 3, then the message *END* (V=4, D=0) appears
indicating that compilation is finished. The V= shows how many bytes
were used for variables, the D= shows how many were used by DATA.
The few strange graphics on the screen are variables used by BASCOMP.

The file test is produced :-

           JP X1               (skip past variables & DATA)
    VX     DEFS 2              (here is the variables area)
    VY     DEFS 2              (DATA would go here)
    X1     CALL 3503;CLS       (clear the screen)
           CALL X2             (initialise the variables)
    L10    LD HL,1             (x=1 initially)
           LD (VX),HL
    X4     LD HL,10            (x>10 yet?)
           LD DE,(VX)
           CALL X10010;TEST    (library routine)
           JP C,X5             (past NEXT x when done)
    L20    LD HL,(VX)          (get current value of x)
           PUSH HL
           LD HL,(VX)          (get it again to do x*x)
           POP DE
           CALL X10008;MULT    (library routine)
           LD (VY),HL          (y=x*x)
    L30    LD A,2              (select stream 2)
           CALL 5633
           LD HL,(VX)          (print x)
           CALL X10006;PRTINT  (library routine)
           LD A,6              (do a comma-tab)
           RST 16
           LD HL,(VY)          (print y)
           CALL X10006;PRTINT
           LD A,13             (newline at end of PRINT)
           RST 16
    L40    LD HL,(VX)          (x=x+1 since no STEP given)
           LD DE,1
           ADD HL,DE
           LD (VX),HL
           JP X4               (back to FOR/NEXT test)
    X5                         (to here when it's over)
    L0     RST 8               (end of program == STOP)
           DEFB 255            (give an OK message)
    X2     LD BC,4             (initialisation code. There
           LD HL,VX             are 2 variables to set up
    X3     LD (HL),0            which mean 4 bytes to be
           INC HL               set to zero)
           DEC BC
           LD A,C
           OR B
           JR NZ,X3
           RET
    X10004 DEFB 213,17,0,0     (Library routines)
           .
           .
           .
           DEFB 201
           END                 (End of the assembly)

If you have ever looked at Z80 assembly code before, it should be quite
easy to understand the above program - even if I had not added the
comments. Furthermore, it is easy to relate the code back to the Basic
program (the L labels correspond exactly with the Basic line numbers).

Note that the code could be made more efficient, even with such a short
program. The x*x routine contains a redundant LD HL,(VX) for example.
This is because BASCOMP is doing its compilation in quite a simple
fashion - so that it can be understood.

Library routines.
-----------------
The X1nnnn routines are library code, written to the end of the assembly
source text by BASCOMP.
Only those routines needed are written, but all are written as DEFB
numbers rather than as assembly source, which makes them difficult to
understand (but easier for BASCOMP and BASASM to handle).

Data.
-----
Any DATA statements will be converted to DEFW numbers during the second
pass. The lines will be labelled L1nnnn (10000 more than the normal line
number to distinguish them). If there are no READ statements, the second
pass will not occur!

A Sample Session continued
----------------
Now to compile the above assembly source text, to give us our machine
code program. We type :-

    OPEN #4;"m";1;"test": RANDOMIZE USR 43000: CLOSE #4

BASASM will now prompt for a start address (try 50000) and whether to
store the variables in the screen (try N). The labels L10, L20, L30,
L40, L0 will flash up, then the following will be displayed (again,
ignore the strange graphics) :-

    Code from 50000 to 50213 (Len=213)
    Total L-labels: 5
    Total X-labels: 13
    Vars: XY
    1 bytes optimised

Most of the above is obvious. The maximum number of L-labels and
X-labels that BASASM can handle is 500 (but is not checked). Only used
variables are listed.

The 'bytes optimised' message indicates if any code was shortened. As
commented earlier, BASCOMP produces rather inefficient code and BASASM
is programmed to recognise and re-code a few of the cases. The 1 byte
saved here was by rewriting the code for line 20 :-

    L20    LD HL,(VX)
           EX DE,HL            (Quicker and shorter)
           LD HL,(VX)
           CALL X10008;MULT
           LD (VY),HL

Running the code.
-----------------
RUN will give the ten numbers and their squares quite quickly, in Basic.
However, RANDOMIZE USR 50000 should give the numbers 'instantly'. The
difference here is small for this example, since most of the time is
spent displaying the results.

If you PRINT PEEK 50002*256+PEEK 50001, this gives the true start of the
code (after the variables and DATA). This is 50007 for this example.
Calling this address plus 3 (50010 for the example) will skip the call
which clears the screen.
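
The PEEK arithmetic above reads the two-byte operand of the JP at the
start of the code, low byte first. The sketch below mimics it in Python,
purely as an illustration. The byte values are an assumption rebuilt
from the sample listing (a JP X1 followed by two 2-byte variables), not
a dump of real BASASM output, and the helper name is hypothetical.

```python
ORIGIN = 50000  # start address given to BASASM in the sample session

# Assumed layout: JP 50007 (opcode 0xC3, low byte, high byte), then the
# four bytes reserved for the variables x and y.
code = bytes([0xC3, 50007 % 256, 50007 // 256, 0, 0, 0, 0])


def peek(address: int) -> int:
    """Hypothetical stand-in for PEEK, reading from the bytes above."""
    return code[address - ORIGIN]


true_start = peek(50002) * 256 + peek(50001)
print(true_start)        # 50007: the CALL that clears the screen
print(true_start + 3)    # 50010: skips the screen clear
print(true_start + 6)    # 50013: also skips variable initialisation
```

Each CALL occupies three bytes on the Z80, which is why the alternative
entry points sit three and six bytes after the true start.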
Calling this address plus 6 (50013 for the example) will also skip the
initialising of the variables. This could be useful if the code is to be
used as a subroutine (another line 50 RETURN can be added to produce a
subroutine).

Re-Compiling BASCOMP and BASASM.
--------------------------------
If you want to add to BASCOMP or BASASM, you will need to re-compile
them to produce a new "asmcompC" file.

If you use any new Basic commands, you will not be able to use the old
"asmcompC" file to compile the Basic. The same applies if you use any
new assembly source commands. In either case, you will have to RUN the
Basic source, which will work but will be VERY slow.

When re-assembling, remember to specify the same start addresses (50000
and 43000) and answer Y to the 'store VARs on screen' question.

Note that the BASCOMP Basic cannot fit into memory along with the full
"asmcompC" file, unless CLEAR 49999 is done.

Note too that BASASM will overwrite itself if 43000 is specified, unless
you use a different version (the one at 25000, preferably).

Finally, remember that the "asmcompC" file also contains code from
64970...65535, needed for Plus D or Plus 3 users, which should not be
overwritten.

Speeds.
-------
Obviously the point of compiling a program is to make it faster. As an
example of what can be achieved, the following is offered. Note that
your timings, while likely to be similar, are unlikely to be identical.
This is due to hardware differences, as well as the fact that BASCOMP is
likely to have been altered ('improved') since these timings were made.

I used a 128K Spectrum in 48K mode, with a 3.5 inch Plus D system.

First I used a compiled version of BASCOMP to compile BASASM, producing
an assembly source text version of BASASM on disk.

RUNning BASASM took 125 minutes to assemble itself (this is called
'bootstrapping').
Executing the resulting machine code took just 63 seconds to assemble
itself - this is over one hundred times quicker, without taking any
account of the disk access overheads, which are a significant part of
the 63 second timing.

Other test timings included :-

"primes1" (without displaying each result) down from 32.4 seconds to
          1.6 - an increase of about twenty times.
"primes2" (without displaying each result) down from 41.3 seconds to
          0.4 - an increase of over one hundred times.
"plot"    (a graphic test) down from over ten minutes to 15 seconds -
          an increase of forty three times.

BASCOMP error messages.
-----------------------
If BASCOMP discovers some Basic code which it cannot compile, it issues
an error message and displays the Basic line, showing (with an inverse
flashing question mark) where the error was found. It should be obvious
what is wrong. However, the error messages which can be given, and their
meanings, are listed below.

ERROR *       This error should not occur, unless you have incorrectly
              altered BASCOMP (see line 1550).

ERROR .       This error indicates that the end of a statement was not
              where it was expected. This can be caused if uncompilable
              Basic is used
              Example: GOTO 10*a
                       PRINT AT 10,10;

ERROR @       This error is given when array variables are used wrongly
              Example: LET a(1+2)=3

ERROR CODE    This error indicates invalid use of the CODE function
              Example: LET a=CODE a$
                       LET a=CODE "abc"

ERROR DEF FN  This error should not occur but is related to ERROR FN
              (see line 1250).

ERROR DIM     This error indicates an invalid DIM statement
              Example: DIM a$(10)
                       DIM a(10+20)

ERROR FN      This is given if an uncompilable function is used
              Example: LET a=PI
                       LET a=SQR 10

ERROR FOR     This error indicates an invalid FOR statement

ERROR INT     This is given if an integer is expected and not found
              Example: GOTO a*10
                       LET a=1.2

ERROR LET     This is given if a LET statement is unrecognised
              Example: LET abc=1

ERROR LINE    This is given if an uncompilable statement is used
              Example: 10 DRAW 10,10

ERROR NEXT    This error indicates an invalid NEXT statement

ERROR POKE    This error indicates that the comma separator was not
              found where it was expected in a POKE statement

ERROR THEN    This error indicates that THEN did not follow IF where it
              was expected. This can be caused if uncompilable Basic is
              used

ERROR TO      This error indicates that the TO part of the FOR statement
              was not found where it was expected
              Example: FOR x=a+1 TO 10

ERROR VERIFY  This error indicates that a BASCOMP check has failed. It
              could occur if an array is dimensioned twice, or if the
              DIM statement does not occur before use of the array.

BASASM error messages.
----------------------
As has already been mentioned, BASASM should not find errors in the
assembly source text file. If, however, an error is found, it is
identified by an error number. This number is the line number in the
BASASM Basic program in which the error was found. The assembly source
text line being processed at the time is also displayed.

Therefore, to identify an error found by BASASM you will have to
reference the BASASM Basic program, and figure out what BASASM was
attempting to assemble at the time. This is not made easy because BASASM
is written wholly using DATA statements to identify assembly source text
(because strings are not supported by BASCOMP). There are limited REM
statements in BASASM, which should help!

WARNING: BASASM is a single pass assembler, and deals with forward
references by building 'chains' of unresolved references within the code
being produced. This is efficient, but means that it is not possible to
assemble over the ROM area - just to find out what size the resulting
code would be.

File Details.
-------------
bascomp2    Basic program which is capable of 'bootstrapping' itself to
            produce an assembly source text version of the same program.
            The output is to stream 4 (unless the value of variable a is
            altered). All required 'library' code is included (as DATA
            statements - see code at lines 8200 onwards).

basasm2     Basic program which can convert assembly source text
            produced by bascomp2 into a machine code program. The input
            is from stream 4. Certain optimisations are attempted during
            assembly.

asmcompC    Machine code versions of bascomp2 and basasm2, LOADs at
            43000 (length 22536). At 43000 is basasm2 code; at 50000 is
            bascomp2 code.
            Plus D version has code at 64970 to stop disk writes to a
            disk which has stopped spinning.
            Plus 3 version has code at 65024 which installs routines to
            implement stream 4 handling of a file OUTPUT.LST.

primes1     Sample Basic programs which can be compiled to see what
primes2     improvement BASCOMP can give - try removing the PRINT
            statements within the loops to see speed increases of up to
            150 times!

plot        Sample Basic program which tests the increase in speed of
            the graphics PLOT & POINT functions.

basasm2Cl   Alternative machine code version of basasm2, runs at 25000.
            For use when producing code which overlays 35000...65535.
            NOTE: Runs 20% slower, since it is in contended RAM.

basasm2Ch   Alternative machine code version of basasm2, runs at 58000.
            For use when producing code which overlays 25000...57999.
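
The 'chains' of unresolved references mentioned in the WARNING under
BASASM error messages can be illustrated with a small sketch. It shows
one common way a single-pass assembler builds such chains and is not
taken from the BASASM source: the class, its method names and the use of
0 as an end-of-chain marker are all assumptions made for the example.
Each unresolved operand temporarily holds the address of the previous
unresolved operand for the same label; defining the label walks that
chain and patches every slot.

```python
class ChainAssembler:
    """Toy single-pass emitter that resolves forward jumps via chains."""

    END_OF_CHAIN = 0  # assumption: address 0 is never an operand slot

    def __init__(self, origin: int):
        self.memory = bytearray(65536)  # stands in for Spectrum RAM
        self.pc = origin
        self.defined = {}   # label -> address
        self.pending = {}   # label -> newest unresolved operand slot

    def read16(self, addr: int) -> int:
        return self.memory[addr] | (self.memory[addr + 1] << 8)

    def write16(self, addr: int, value: int) -> None:
        self.memory[addr] = value & 0xFF
        self.memory[addr + 1] = (value >> 8) & 0xFF

    def emit_jump(self, label: str) -> None:
        """Emit JP label: opcode 0xC3 followed by a 16-bit address."""
        self.memory[self.pc] = 0xC3
        operand = self.pc + 1
        if label in self.defined:
            self.write16(operand, self.defined[label])
        else:
            # Link this slot to the previous unresolved one, if any.
            previous = self.pending.get(label, self.END_OF_CHAIN)
            self.write16(operand, previous)
            self.pending[label] = operand
        self.pc += 3

    def define(self, label: str) -> None:
        """Define label at the current address and patch its chain."""
        self.defined[label] = self.pc
        slot = self.pending.pop(label, self.END_OF_CHAIN)
        while slot != self.END_OF_CHAIN:
            next_slot = self.read16(slot)   # link stored in the code
            self.write16(slot, self.pc)     # replace link with address
            slot = next_slot

    def undefined_labels(self) -> list:
        return sorted(self.pending)


asm = ChainAssembler(50000)
asm.emit_jump("X5")    # forward reference, operand at 50001
asm.emit_jump("X5")    # forward reference, operand at 50004
asm.define("X5")       # X5 = 50006, both operands are patched
asm.emit_jump("X5")    # backward reference, resolved at once
asm.emit_jump("L99")   # never defined
print(asm.read16(50001), asm.read16(50004), asm.read16(50007))
print(asm.undefined_labels())   # ['L99']
```

Because the links live inside the output area itself, that area has to
be writable, which fits the remark that code cannot be assembled over
the ROM. Labels still chained when assembly ends are the ones that would
be reported as undefined.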