Chapter 21 - Substrings

Given a string, a substring of it consists of some consecutive characters from it, taken in sequence. Thus "STRING" is a substring of "BIGGER STRING", but "B STRING" & "BIG REG" are not.

There is a notation called slicing for describing substrings, & this can be applied to arbitrary string expressions. The general form is

string expression (start TO finish)

so that, for instance,

"ABCDEF" (2 TO 5) = "BCDE"

If you omit the start, then 1 is assumed; if you omit the finish, then the length of the string is assumed. Thus

"ABCDEF" ( TO 5) = "ABCDEF" (1 TO 5) = "ABCDE"

"ABCDEF" (2 TO ) = "ABCDEF" (2 TO 6) = "BCDEF"

&

"ABCDEF" ( TO ) = "ABCDEF" (1 TO 6) = "ABCDEF"

(you can also write this last one as "ABCDEF" (), for what it's worth.)

A slightly different form misses out the TO & just has one number:

"ABCDEF" (3) = "ABCDEF" (3 TO 3) = "C"
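In Python terms (used here only as an illustration; none of this is ZX81 code), a ZX81 slice s(start TO finish) counts positions from 1 & includes the finish, so it corresponds to the Python slice s[start-1:finish]. The sketch below models the general form, the omitted bounds & the one-number form. The names zx_slice & zx_char are our own, & the error rules are deliberately left out.

```python
from typing import Optional


def zx_slice(s: str, start: Optional[int] = None,
             finish: Optional[int] = None) -> str:
    """Model of s(start TO finish) with either bound optionally omitted.

    Error rules (out-of-range or negative bounds) are not modelled here.
    """
    if start is None:
        start = 1          # omitted start: 1 is assumed
    if finish is None:
        finish = len(s)    # omitted finish: the length of the string is assumed
    # Shift the start down by one for Python's 0-based indexing; the finish
    # needs no change because Python's slice end is exclusive.
    return s[start - 1:finish]


def zx_char(s: str, n: int) -> str:
    """Model of the one-number form s(n), which means s(n TO n)."""
    return zx_slice(s, n, n)


print(zx_slice("ABCDEF", 2, 5))       # BCDE
print(zx_slice("ABCDEF", finish=5))   # ABCDE
print(zx_slice("ABCDEF", 2))          # BCDEF
print(zx_slice("ABCDEF"))             # ABCDEF
print(zx_char("ABCDEF", 3))           # C
```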

Although normally both start & finish must refer to existing parts of the string, this rule is overridden by one other: if the start is more than the finish, then the result is the empty string. So

"ABCDEF" (5 TO 7)

gives error 3 (subscript error) because the string only contains 6 characters, & 7 is too many, but

"ABCDEF" (8 TO 7) = ""

&

"ABCDEF" (1 TO 0) = ""

The start & finish must not be negative, or you get error B.
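The error rules can be added to the same kind of Python model. This is a sketch, not the ZX81's own code: ZXError is a hypothetical stand-in for a ZX81 error report, & we assume the check for negative bounds (error B) is made before the others.

```python
class ZXError(Exception):
    """Hypothetical stand-in for a ZX81 error report such as 3 or B."""

    def __init__(self, code: str) -> None:
        super().__init__(f"error {code}")
        self.code = code


def zx_slice(s: str, start: int, finish: int) -> str:
    """Model of s(start TO finish), including the error rules."""
    if start < 0 or finish < 0:
        raise ZXError("B")      # negative bounds (assumed to be checked first)
    if start > finish:
        return ""               # this rule overrides the range check
    if start < 1 or finish > len(s):
        raise ZXError("3")      # subscript error: outside the string
    return s[start - 1:finish]


for bounds in [(2, 5), (8, 7), (1, 0), (5, 7)]:
    try:
        print(bounds, repr(zx_slice("ABCDEF", *bounds)))
    except ZXError as err:
        print(bounds, err)
```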

This next program makes B$ equal to A$, but omitting any trailing spaces.

10 INPUT A$

20 FOR N=LEN A$ TO 1 STEP -1

30 IF A$(N)<>" " THEN GOTO 50

40 NEXT N

50 LET B$=A$( TO N)

60 PRINT """";A$;"""","""";B$;""""

70 GOTO 10

Note how if A$ is entirely spaces, then in line 50 we have N = 0 & A$( TO N) = A$(1 TO 0) = "".
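For comparison, here is the same scan written in Python, following lines 20 to 50 step by step (the function name strip_trailing_spaces is our own). Python's built-in str.rstrip(" ") does the same job in one call; the loop is kept only to mirror the BASIC.

```python
def strip_trailing_spaces(a: str) -> str:
    """Follow lines 20-50 of the program: B$ is A$ without trailing spaces."""
    n = len(a)
    # FOR N=LEN A$ TO 1 STEP -1 ... IF A$(N)<>" " THEN GOTO 50
    while n >= 1 and a[n - 1] == " ":
        n -= 1
    # LET B$=A$( TO N): when n is 0 this is A$(1 TO 0), the empty string
    return a[:n]


for a in ["HELLO   ", "    ", ""]:
    print('"' + a + '"', '"' + strip_trailing_spaces(a) + '"')
```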

For string variables, we can not only extract substrings, but also assign to them. For instance, type

LET A$="LOR LOVE A DUCK"

& then

LET A$(5 TO 8)="******"

&

PRINT A$

Notice that, since the substring A$(5 TO 8) is only 4 characters long, only the first four stars have been used. This is a characteristic of assigning to substrings: the substring has to be exactly the same length afterwards as it was before. To make sure this happens, the string that is being assigned to it is cut off on the right if it is too long, or filled out with spaces if it is too short. This is called Procrustean assignment, after the innkeeper Procrustes, who used to make sure that his guests fitted the bed by either stretching them out on a rack or cutting their feet off.
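Procrustean assignment can be sketched in Python as "cut off on the right, then pad with spaces". This is a model only, & it assumes start is no more than finish & both lie inside the string; procrustean_assign is a name of our own.

```python
def procrustean_assign(a: str, start: int, finish: int, value: str) -> str:
    """Model of LET A$(start TO finish)=value; returns the new A$."""
    width = finish - start + 1
    # Cut the value off on the right if too long, pad with spaces if too short,
    # so the substring keeps exactly its old length.
    fitted = value[:width].ljust(width)
    return a[:start - 1] + fitted + a[finish:]


print(procrustean_assign("LOR LOVE A DUCK", 5, 8, "******"))  # LOR **** A DUCK
```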

If you now try

LET A$()="COR BLIMEY"

&

PRINT A$;"."

you will see that the same thing has happened again (this time with spaces put in) because A$() counts as a substring.

LET A$="COR BLIMEY"

will do it properly.
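The difference between the two assignments can be modelled in Python (assign_whole_substring is our own name, & this is a sketch rather than ZX81 code). A$() behaves like a substring as long as the whole of A$, so it keeps its length, whereas a plain LET gives A$ the new value's own length.

```python
def assign_whole_substring(a: str, value: str) -> str:
    """Model of LET A$()=value: A$() is a substring as long as A$ itself."""
    width = len(a)
    return value[:width].ljust(width)   # cut off or pad with spaces


a = "LOR **** A DUCK"                        # 15 characters
a = assign_whole_substring(a, "COR BLIMEY")  # still 15: 5 spaces added
print(a + ".")

a = "COR BLIMEY"                             # plain LET A$=...: length 10
print(a + ".")                               # COR BLIMEY.
```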

Slicing may be considered as having priority 12, so, for instance

LEN "ABCDEF"(2 TO 5) = LEN("ABCDEF"(2 TO 5)) = 4

Complicated string expressions will need brackets round them before they can be sliced. For example,

"ABC"+"DEF"(1 TO 2) = "ABCDE"

("ABC"+"DEF")(1 TO 2) = "AB"
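Python happens to follow the same convention: subscripting binds more tightly than +, so the same brackets are needed there. The two expressions above translate as follows (ZX81 positions 1 TO 2 become the Python slice [0:2]).

```python
unbracketed = "ABC" + "DEF"[0:2]   # only "DEF" is sliced, giving "DE"
bracketed = ("ABC" + "DEF")[0:2]   # the joined string is sliced

print(unbracketed)  # ABCDE
print(bracketed)    # AB
```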

Summary

Slicing, using TO. Note that this notation is non-standard.

Exercises

1. Some BASICs (not the ZX81 BASIC) have functions called LEFT$, RIGHT$, MID$ & TL$.

LEFT$(A$,N) gives the substring of A$ consisting of the first N characters.

RIGHT$(A$,N) gives the substring of A$ consisting of the characters from the Nth on.

MID$(A$,N1,N2) gives the substring of A$ consisting of N2 characters starting at the N1th.

TL$(A$) gives the substring of A$ consisting of all its characters except the first.

How would you write these in ZX81 BASIC? Would your answers work with strings of length 0 or 1?
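One possible set of answers is A$( TO N), A$(N TO ), A$(N1 TO N1+N2-1) & A$(2 TO ). To check how they behave on short strings, the sketch below models ZX81 slicing in Python & builds the four functions on it. The function names & SubscriptError are our own, & error B for negative bounds is not modelled.

```python
from typing import Optional


class SubscriptError(Exception):
    """Hypothetical stand-in for ZX81 error 3 (subscript error)."""


def zx_slice(s: str, start: Optional[int] = None,
             finish: Optional[int] = None) -> str:
    """Model of s(start TO finish) under the rules of this chapter."""
    start = 1 if start is None else start
    finish = len(s) if finish is None else finish
    if start > finish:
        return ""                      # overrides the range check
    if start < 1 or finish > len(s):
        raise SubscriptError(f"({start} TO {finish}) on length {len(s)}")
    return s[start - 1:finish]


def left(a: str, n: int) -> str:           # LEFT$(A$,N)    -> A$( TO N)
    return zx_slice(a, None, n)


def right(a: str, n: int) -> str:          # RIGHT$(A$,N)   -> A$(N TO )
    return zx_slice(a, n, None)


def mid(a: str, n1: int, n2: int) -> str:  # MID$(A$,N1,N2) -> A$(N1 TO N1+N2-1)
    return zx_slice(a, n1, n1 + n2 - 1)


def tl(a: str) -> str:                     # TL$(A$)        -> A$(2 TO )
    return zx_slice(a, 2, None)


# Length 0 & 1: A$(2 TO ) becomes A$(2 TO 0) or A$(2 TO 1), both empty.
print(repr(tl("")), repr(tl("A")))
```

Note that TL$ & RIGHT$ come out safely empty on short strings because start is then more than finish, whereas LEFT$ with N larger than the length still gives a subscript error.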

2. Try this sequence of commands:

LET A$="X*+*Y"

LET A$(2)=CHR$ 11  [the string quote character]

LET A$(4)=CHR$ 11

PRINT A$

A$ is now a string with string quotes inside it! So there is nothing to stop you doing this if you are persevering enough, but clearly if you had originally typed

LET A$="X"+"Y"

the part to the right of the equals sign would have been treated as an expression, giving A$ the value "XY".

Now type

LET B$="X""+""Y"

You will find that although A$ & B$ look the same when printed out, they are not equal - try

PRINT A$=B$

Whereas B$ contains mere quote image characters (with code 192), A$ contains genuine string quote characters (with code 11).

3. Run this program:

10 LET A$="LEN ""ABDC"""

100 PRINT A$;" = ";VAL A$

This will fail because VAL does not treat the quote image "" as a string quote.

Insert some extra lines between 10 & 100 to replace the quote images in A$ by string quotes (which you must call CHR$ 11), & try again.

Make the same modifications to the program in chapter 9, exercise 3, & experiment with it.

4. This subroutine blots out every occurrence of the string "CARTHAGO" in A$ by overwriting it with asterisks.

1000 FOR N=1 TO LEN A$-7

1020 IF A$(N TO N+7)="CARTHAGO" THEN LET A$(N TO N+7)="********"

1030 NEXT N

1040 RETURN

Write a program that gives A$ various values (e.g. "DELENDA EST CARTHAGO.") & applies the subroutine.
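To see what the subroutine should do before trying it on the ZX81, here is a Python model of lines 1000 to 1040 (the name blot_out is our own). It uses a list because Python strings, unlike ZX81 string variables, cannot be assigned to in place.

```python
WORD = "CARTHAGO"


def blot_out(a: str) -> str:
    """Model of lines 1000-1040: overwrite each CARTHAGO in a with stars."""
    chars = list(a)          # Python strings are immutable, so edit a list
    k = len(WORD)            # 8, hence the -7 & +7 in the BASIC
    # FOR N=1 TO LEN A$-7: every start position where 8 characters still fit
    for n in range(1, len(chars) - k + 2):
        if "".join(chars[n - 1:n - 1 + k]) == WORD:
            chars[n - 1:n - 1 + k] = "*" * k   # LET A$(N TO N+7)="********"
    return "".join(chars)


print(blot_out("DELENDA EST CARTHAGO."))  # DELENDA EST ********.
```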
