MULTISEARCH
(from Your Spectrum 12, Mar.1985)

After a brief sojourn writing commercial software, we welcome programming guru Simon Goodwin back to the pages of YS with his first major utility since ZIP! Multisearch might be somewhat smaller than its predecessor but, as a fully relocatable 'search and replace' utility in just 225 bytes, it too is dedicated to the art of speeding up your Basic programs. Don't limit yourself to any other utility - make more of Multisearch!

How many times have you laboriously gone through a ZX Basic program, replacing one item with another? Well, despair no more, Multisearch will quickly and automatically find and replace almost any selected item. This routine is easy to use and is only 225 bytes long. It'll run anywhere in memory (so it doesn't interfere with other utilities) and, what's more, turns out to have lots of useful and unexpected applications.

POWERFUL POSSIBILITIES

The possibilities of Multisearch aren't limited to changing one message for another. You can use it to edit long program lines, to replace keywords or to document programs (replacing line number references with names). Multisearch will also work the other way, replacing names with numbers - which is very useful if you intend to compile a Basic program into machine code. Most interesting of all is the possibility of writing programs which edit themselves; Multisearch can easily be called while a program runs.

In this article we will investigate the internal format of ZX Basic and show how you can use Multisearch to make programs faster, more concise, or to protect them against people who want to fiddle with them (Troubleshootin' Pete, please note).

INSPIRATION

The idea of Multisearch came when YS reviewed a job lot of 'programmers' toolkits' a number of months ago. These are designed to make life easier for Basic programmers, but they all turn out to have a common flaw - they won't let you replace numbers in a program automatically.

Some of the toolkits had a 'search and replace' facility, but they all had annoying limitations - for example, Super Toolkit would only replace single keywords. The suggested use was to change LPRINT into PRINT or vice versa, but in fact that's pretty pointless because you can get the same effect on any Spectrum with a standard (but undocumented) command:

    OPEN #2,"p"

This sends the output of PRINT statements to the printer until you cancel it with:

    OPEN #2,"s"

If you want to work the other way, you can use:

    OPEN #3,"s"

to send the results of every LPRINT statement to the screen. When you want to use the printer again, the command:

    OPEN #3,"p"

will set things back to normal.

It's a bit more useful to be able to replace text in a program - perhaps you might want to Americanise the word 'colour' by replacing it with 'color', or enforce some similar indignity. But by far the most useful application baffles every single toolkit - the problem of changing numeric values within a program.

INSIDE BASIC

The accompanying figure shows the rather complicated way the Spectrum stores a simple Basic program:

    10 PRINT 2+VAL "2"
    20 GO TO 10

Most of the data is ASCII code - for instance, 34 is the code of inverted commas and 236 is the code of the keyword GO TO. A full list of the keyword values is in Appendix A of the Spectrum manual. Now take a look at the strange way the Spectrum stores numbers.

Most numbers in a program are also stored in a hidden 'binary form' which takes up six extra bytes. This is meant to make programs run more quickly, by removing the need for the computer to convert numbers from text to binary whenever they are found. In practice, VAL "2323" can be handled almost as fast as the number 2323, and the first version uses three fewer bytes, because the string value doesn't have a hidden 'binary form'. In the figure, you can see that VAL "2" needs three fewer bytes than '2' on its own.
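
As a rough illustration of the layout just described, the Python sketch below builds the bytes of the two-line program by hand. Python stands in here only because it is easy to run on a modern machine. The helper names (number, line) are made up for this example, the PRINT and VAL codes are assumptions taken from the character set table (only 34 and 236 are quoted in the article), and the five hidden bytes follow the 'small integer' layout the ROM is understood to use for whole numbers up to 65535. Treat it as a sketch, not a dump from a real Spectrum.

```python
# GO TO (236) and the quote mark (34) are quoted in the article; the PRINT
# and VAL codes are assumptions taken from the Spectrum character set table.
PRINT, VAL, GO_TO = 245, 176, 236
ENTER, NUMBER_MARKER, QUOTE = 13, 14, 34


def number(n):
    """Visible digits, the code-14 marker, then the five-byte binary form.

    Assumes n is a whole number from 0 to 65535, stored in the assumed
    'small integer' layout: 0, sign byte, low byte, high byte, 0.
    """
    if not 0 <= n <= 65535:
        raise ValueError("sketch only handles 0..65535")
    hidden = bytes([0, 0, n & 0xFF, n >> 8, 0])
    return str(n).encode("ascii") + bytes([NUMBER_MARKER]) + hidden


def line(line_no, body):
    """Line number (high byte first), length (low byte first), text, ENTER."""
    text = body + bytes([ENTER])
    return line_no.to_bytes(2, "big") + len(text).to_bytes(2, "little") + text


val_2 = bytes([VAL, QUOTE]) + b"2" + bytes([QUOTE])        # VAL "2"
line_10 = line(10, bytes([PRINT]) + number(2) + b"+" + val_2)
line_20 = line(20, bytes([GO_TO]) + number(10))
program = line_10 + line_20

print(list(line_10))
print(list(line_20))
print(len(number(2)) - len(val_2), "bytes saved by using VAL")
```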

The number '2' is followed by a 'marker' byte (code 14) which tells the LIST routine to skip the next five bytes - the binary form of the number. When the program RUNs, the text is ignored and the binary form is used. The binary is in a rather odd format - one which is explained in Dr Ian Logan's excellent book, Understanding Your Spectrum (published by Melbourne House). Luckily, with the aid of Multisearch, you don't need to understand the format to manipulate it.

The upshot is that numbers in ZX Basic programs need careful treatment, as they can gobble up memory at an alarming rate. Some expressions for numbers are even more concise than the 'VAL' version, because they use the keyword PI instead of a number. PI only occupies one byte in a program. The accompanying table lists a few common values and the expressions to replace them, along with the number of bytes saved ('n' represents any number).

You could use variables with preset values instead of numbers to get a similar saving in space, but beware - ZX Basic is rather slow at finding the value of variables; expressions like SGN PI may be worked out more quickly, especially if your code uses lots of variables anyway.

Interestingly, values expressed using the BIN function are also stored in two forms, so that BIN 1 soaks up eight bytes - one for the keyword, one for the digit, and an extra six for the genuine binary form.

The line numbers at the start of each line are stored in a more sensible 'packed' format - each number occupying just two bytes. They are converted into decimal by the LIST routine in the ROM. The two bytes after each line number hold the length of the line, so that Basic can skip quickly from one line to the next. An 'ENTER' character is at the end of every line. This format is briefly explained in the Spectrum manual, on page 166.

The first program given is a simple loader which will store the machine code for Multisearch at address 30000.
To use it, simply RUN the program and if you've made no typing mistakes, the correct code will be stored. If there's a mistake in the data, an appropriate message should appear. It's wise to SAVE the program as soon as it has apparently run correctly, just in case an error has slipped through. If you save the code you can then load it again - without the Basic - at any address.

MULTISEARCH ON THE RUN

The routine is very easy to use, and all you need to do is load the code into any free area of memory. It's 225 bytes long, so if you've already got another machine code routine from address 53246 onwards, you might CLEAR 53020 and load the code at 53021. Multisearch will work happily on a 16K computer. If you're really pushed for space you could load it into the printer buffer at 23296, so long as you don't use the printer until you've finished with Multisearch.

Wherever it ends up, you call the routine by jumping to its start - with RANDOMIZE USR 53021, for example. But before you do this you must tell Multisearch the text you want to alter. You do this by setting the Basic variables S$ and R$. Logically enough, S$ should contain the text you want to search for, and R$ should contain the replacement. This is the essence of the power of Multisearch - the text can be program-generated, so you're not just limited to what you can type in. You can enter keywords in strings by typing THEN (Symbol Shift 'G'), followed by the keyword, and then stepping back to scrub out the THEN before you press Enter.

If you load Multisearch into the printer buffer you could try it out with this simple program:

    10 LET S$="OLD TEXT"
    20 LET R$="NEW TEXT"
    30 RANDOMIZE USR 23296

When you RUN the program and LIST it you'll find that S$ and R$ now refer to the same text. Of course, S$ and R$ don't have to be the same length. The only restrictions are that both strings must be fewer than 256 characters long, and S$ mustn't be empty (!).
In either case, Multisearch detects the problem before it tries to alter anything, and reports a 'Parameter error'. If S$ or R$ is not set, you'll receive a 'Variable not found' message and the program will be unchanged.

Multisearch is very fast, but it can take a few seconds to make major changes to a long program. You can break into it while it's working by pressing the Space key. The routine stops once it's made the current change and spits out a 'Break into program' message. If the routine runs out of room to make changes it'll do as much as it can and then report 'Out of memory'.

It's important to realise that Multisearch doesn't check the syntax of lines as it alters them - this would make it slow and much less versatile. However it means that you can thoroughly mess up a program by, say, changing all the LET keywords into POKEs. If you corrupt a program in this way you'll get a 'Nonsense in Basic' error when you try to RUN it. Be careful if you change the keywords back automatically - you could end up changing genuine POKEs into 'nonsense' LETs. The moral of the story is to be careful before you use Multisearch ... if in doubt, SAVE your Basic before you mangle it.

TRICKY DIGITS

This business of using strings is all very well, but it doesn't help us replace numbers in program lines. We can't store a number in a string without putting it in quotes (or using STR$). LET A$="1" is OK, but LET A$=1 gives an error, and we've already discovered that numbers outside quotes have a special format. To illustrate this, try out the following program:

    10 LET S$="40"
    20 LET R$="60"
    30 RANDOMIZE USR 23296
    40 PRINT "Hello";
    50 GO TO 40
    60 STOP

When you RUN this program it'll replace the text '40' in line 50 with the text '60'. However, it won't replace the hidden binary form; the program still prints out 'Hello' over and over again, because ZX Basic uses the binary form of the line number (still 40), and ignores the text completely.
You end up with a line that reads GO TO 60 and performs a GO TO 40!

This is a very useful trick to discourage people from editing your programs - you can jumble up the text of the line numbers but the program will still work correctly because the binary forms are unchanged. The hidden binary is removed when a line is edited (to stop it getting in the way as you move along the line) and the binary is re-calculated from the text when you press Enter. This means that the jumbled values are taken literally after a line is edited, changing the way the program works and hence discouraging fiddlers.

You can save a little memory by replacing the text of each number by a single digit. However you can't dispense with the text altogether - there must be some numeric text between the GO TO and the CHR$ 14, or Basic will spot the subterfuge and give the game away with a 'Nonsense in Basic' error.

BINARY CHOICE

We still can't alter numbers properly. The routine so far will only change text within a program ... it can't replace the binary form of numbers. The solution is to distinguish between numbers and strings, and use a small Basic program to work out the binary form of a number. An appropriate routine is given, which should be MERGEd with your Basic program once the Multisearch code is loaded.

Rather than use a complicated routine to generate binary forms, this program 'cheats' by storing the required number in a variable and then PEEKing the contents of the variable area (which always contains binary values in the same form as that used within programs).

To use the program type GO TO 9990 and press 'T' or 'N' to indicate whether you want to search for text or a number. Then type the data required, exactly as it appears in the program. If you select 'N', the program adds the numeric form to S$. Next you specify the replacement, which may (once again) be text or a number. The program STOPs once the requested changes have been made.
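
The difference between a text-only change and a full numeric change can be modelled in a few lines of Python. This is only a sketch under stated assumptions: hidden_form, numeric_pattern and target_of are names invented for the example, the five-byte layout is the 'small integer' form assumed for whole numbers from 0 to 65535, and Python's bytes.replace stands in for Multisearch itself. It shows why the 'N' option has to add the marker and binary form to the search and replacement strings.

```python
NUMBER_MARKER = 14
GO_TO = 236  # keyword code quoted in the article


def hidden_form(n):
    """Five-byte form of a whole number 0..65535 (assumed 'small integer' layout)."""
    if not 0 <= n <= 65535:
        raise ValueError("sketch only handles 0..65535")
    return bytes([0, 0, n & 0xFF, n >> 8, 0])


def numeric_pattern(n):
    """Digits, marker and binary together, as the 'N' option is described as building."""
    return str(n).encode("ascii") + bytes([NUMBER_MARKER]) + hidden_form(n)


def target_of(statement):
    """The value Basic would act on: low and high bytes of the hidden form."""
    at = statement.index(NUMBER_MARKER)
    return int.from_bytes(statement[at + 3:at + 5], "little")


statement = bytes([GO_TO]) + numeric_pattern(40)      # GO TO 40

# S$="40", R$="60": only the visible digits change.
text_only = statement.replace(b"40", b"60")

# Numeric search and replacement: digits and hidden binary change together.
both = statement.replace(numeric_pattern(40), numeric_pattern(60))

print(text_only[1:3], target_of(text_only))   # listing reads 60, jump still goes to 40
print(both[1:3], target_of(both))             # listing and jump agree on 60
```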

This approach of prompting for text or a number is not ideal, but it does allow numbers to be changed properly without denying you the ability to alter numeric text and leave binary forms unchanged. If you need to process a pattern which contains a number, you'll need to add other characters around the search or replacement string, using the normal Spectrum string handling commands.

You can use the 'binary form' program as a subroutine if you replace the STOP in line 9902 with a RETURN and get rid of the CLEAR statement in line 9900. However you must make sure that V is the first variable encountered when your program is RUN. The routine finds the binary form of a number by storing it in variable V, and then PEEKing the first entry in the variable table. If V isn't the first entry you'll get incorrect results.

ASSEMBLER LISTING

Multisearch uses a number of interesting routines and could form the basis of a complete Basic toolkit. The assembly code of the routine, produced by the whizzo new Microdrive version of the Picturesque Editor Assembler, is a little more repetitious than it need be, since it's written in relocatable code. This means it'll run anywhere in memory without modification, but also that it can't use any internal subroutine calls, since the location of each subroutine is not fixed.

Broadly speaking, the program can be divided into two sections. The first part (up to the label LINE) is used to find the variables S$ and R$ and check that they contain correct values. The code to find S$ is duplicated to locate R$ - the only difference is the letter of the name and the extra check to make sure that S$ contains at least one character.

At FINDS, the program points HL into the variable area and then looks for a capital 'S'. This indicates the start of the storage allocated to S$, as explained on page 168 of the Spectrum manual.
The ROM routine F_VAR is used to step from one entry to the next until the required letter is found, or the end of the table is reached - in which case a 'Variable not found' error is generated.

Strings stored in the variable area are preceded by their length, recorded in two bytes in normal Z80 fashion - low byte first. Multisearch can't cope with strings of more than 255 bytes (the code is kept simple!) so it generates a 'Parameter error' if the most significant byte of either string length is not zero. If all goes well IX is left pointing to the text of S$.

From NEXT2 onwards the routine looks for R$. The address of the string (a pointer to the length, in this case) is stored at R_LEN, at the end of a Basic work area called MEMBOT. DE is pointed just before the start of the Basic program (as if the Enter at the end of a previous line had just been reached) and the main loop through the program begins at LINE.

At LINE the routine expects the end of a line and the start of a new one. It skips over three bytes - the Enter and line number - and stores a pointer to the line length in L_LEN. We need to know where the line length is recorded since we may need to alter it if we add or delete characters in the line.

FIND is the point at which Multisearch tries to locate the search string. DE is saved, so that we know where the match did (or didn't) occur, and then the loop at MATCH is used to see if the characters from DE onwards match those from IX onwards. Register B contains the length of S$. If the comparison fails before B reaches zero, the program leaps off to GO_ON, but if all goes well, the length of R$ is fetched and compared with that of S$. If the two are the same, execution continues at NO_OK (pronounced 'number OK'!) - otherwise some characters must be inserted or deleted so that the replacement text fits in the line.
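
For readers who would rather follow the flow in a high-level language, here is a loose Python model of the main loop, including the fix-ups and special cases covered in the paragraphs that follow. It is not a translation of the assembler: the function and variable names are invented, a bytearray holding only the program lines stands in for the Spectrum's memory, slice assignment plays the part of the ROM's insert and delete routines, and the search string is assumed not to contain an ENTER. The PRINT code in the demonstration is likewise an assumption from the character set table.

```python
ENTER, NUMBER_MARKER = 13, 14


def multisearch(program, search, replace):
    """Replace each match of search in the text of a tokenised program, in place.

    program is a bytearray of lines: two bytes of line number, two of length
    (low byte first), the text, then ENTER.
    """
    if not 0 < len(search) < 256 or len(replace) >= 256:
        raise ValueError("Parameter error")
    pos = 0
    while pos < len(program):
        len_at = pos + 2                  # remember where this line's length lives
        pos += 4                          # step over line number and length
        while program[pos] != ENTER:
            if program[pos] == NUMBER_MARKER:
                pos += 6                  # skip the marker and five binary bytes
            elif program[pos:pos + len(search)] == search:
                program[pos:pos + len(search)] = replace    # grow or shrink the line
                length = int.from_bytes(program[len_at:len_at + 2], "little")
                length += len(replace) - len(search)
                program[len_at:len_at + 2] = length.to_bytes(2, "little")
                pos += len(replace)       # carry on after the new text
            else:
                pos += 1                  # no match: try one byte further along
        pos += 1                          # step over ENTER to the next line
    return program


def line(line_no, body):
    """Helper for the demonstration: wrap text in a line header and ENTER."""
    text = body + bytes([ENTER])
    return line_no.to_bytes(2, "big") + len(text).to_bytes(2, "little") + text


PRINT, GO_TO = 245, 236                   # PRINT code is an assumption
program = bytearray(
    line(10, bytes([PRINT]) + b'"colour"')
    + line(20, bytes([GO_TO]) + b"40" + bytes([NUMBER_MARKER, 0, 0, 40, 0, 0]))
)
multisearch(program, b"colour", b"color")   # line 10 becomes one byte shorter
multisearch(program, b"(", b"[")            # byte 40 inside the hidden binary is skipped
print(list(program))
```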

The job of adding or removing characters is not trivial, since any change in the program size also alters the location of variables, and other useful pieces of information. Luckily, ROM routines exist to adjust the program size and make sure that nothing gets lost. SHRNK and XPAND remove or add BC characters at the location pointed to by HL. XPAND produces an 'Out of memory' error if there's no room for the extra characters.

If S$ and R$ are different lengths then Multisearch must adjust the line length (as explained earlier) and alter the pointers to S$ and R$. Any movement of the program also sends the variables skidding around memory, since they're stored at the end of the program. This took a little while to puzzle out when we tested the machine code!

A couple of extra jumps are located between the Delete and Insert instructions - the main loop is too long to be traversed in a single relative jump (it can only cross 126 bytes at one mighty bound) so FINDX and LINEX are used as 'staging posts' on the way to FIND and LINE respectively.

Various paths meet at NO_OK. At this point a correct match has been found and the address on the stack points to the place where R$ must be stored. An LDIR is used to copy the new text into the program. This leaves DE pointing to the character after the new data, from whence the search can re-start.

If S$ didn't match the program we have to advance DE and start again one byte further through the program. This step is performed at GO_ON.

Whether or not a match was found, we end up at NEXT, where the Break key is polled in case the user has decided to give up. The routine stops with a BREAK error if bit zero at port address 32766 (the Space key) is reset.

At CONT the contents of the system variable VARS are compared with the address in DE. If DE is pointing into the variable area we've finished, and the routine RETurns. Otherwise we must look further through the program, although before that we check for a couple of 'special cases'.
If DE points to an 'ENTER' character we've reached the end of a line, so we should pick up the new line length by looping back to LINE. If DE points at a number marker - CHR$ 14 - we must skip over the binary data since it could contain values which appear to be text or keywords, but aren't really. This doesn't stop us finding numbers, since those will always start with an ASCII character (probably a digit). By the time we've reached the CHR$ 14 we've gone too far to match that number, so nothing is lost by skipping.

POSSIBLE IMPROVEMENTS

There are lots of ways in which Multisearch could be improved, but the existing code works and it doesn't take long to type in! It might be useful to make it return a count of the number of replacements made, and perhaps a list of the lines in which changes were made. It would be convenient (but perhaps rather difficult) to re-code the 'binary form' program in machine code.

As it stands, Multisearch is a simple but very effective routine with a multiplicity of uses. There can't be many short routines which can be used to make ZX Basic edit-proof, faster, more concise, more readable, and more versatile. Do let me know what you make of Multisearch.