mirror of https://github.com/GNOME/gimp.git
ScriptFu: tests of TinyScheme string-ports and string escapes
Add a document about string-like objects in TinyScheme. Add tests of several outstanding issues in TinyScheme, in preparation for fixing them.
parent
5ac6cf6e6c
commit
1e104769d7
|
@ -0,0 +1,288 @@
|
|||
# String-like objects in ScriptFu's TinyScheme
|
||||
|
||||
!!! Work in progress. This documents what should be, and not what is actually implemented. There are bugs in the current implementation.
|
||||
|
||||
## About
|
||||
|
||||
This is a language reference for the string-like features of
|
||||
ScriptFu's TinyScheme language.
|
||||
|
||||
This may differ from other Scheme languages. TinyScheme is a subset of R5RS Scheme. ScriptFu's TinyScheme extends the original TinyScheme with support for unichars and bytes. String-ports, bytes, and unichars are recent additions to Scheme and are not standardized among Schemes.
|
||||
|
||||
This is not a tutorial, but a technical document intended to be testable.
|
||||
|
||||
Terminology. We use "read method" to denote the function whose name is "read".
|
||||
We occasionally use "read" to mean "one of the functions read, read-byte, or read-char."
|
||||
|
||||
## The problem of specification
|
||||
|
||||
TinyScheme is a loose subset of the R5RS specification.
|
||||
|
||||
These are not part of R5RS, but are optional extensions:
|
||||
- string-ports
|
||||
- unichar
|
||||
- byte operations
|
||||
|
||||
Racket is a Scheme language that also implements the above.
|
||||
See Racket for specifications, examples, and tests,
|
||||
but ScriptFu's TinyScheme may differ.
|
||||
|
||||
SRFI-6 is one specification for string-port behavior.
|
||||
SRFI-6:
|
||||
- has a reference implementation.
|
||||
- does not describe testable behavior.
|
||||
- does not discuss unichar or byte operations
|
||||
|
||||
## Overview
|
||||
|
||||
ScriptFu's TinyScheme has these string-like objects:
|
||||
|
||||
- string
|
||||
- string-port
|
||||
|
||||
Both are:
|
||||
|
||||
- sequences of chars
|
||||
- practically infinite
|
||||
- UTF-8 encoded, i.e. the chars are unichars, each of which may occupy multiple bytes
|
||||
|
||||
They are related. You can:
|
||||
|
||||
- initialize an input string-port from a string
|
||||
- get a string from an output string-port
|
||||
|
||||
However, the passed string is not owned by a string-port but is a separate object with its own lifetime.
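A minimal sketch of the two relations above, assuming the behavior this document specifies (the current implementation may differ). The names `in-port`, `out-port`, `obj`, and `text` are illustrative only.

```scheme
; Initialize an input string-port from a string, then read one object from it.
(define in-port (open-input-string "foo"))
(define obj (read in-port))                  ; the symbol foo

; Write to an output string-port, then get a string from it.
(define out-port (open-output-string))
(write obj out-port)
(define text (get-output-string out-port))   ; a separate string object
```

Here `text` is expected to be a new string with the three characters of foo; it is not the string that was passed to open-input-string.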
|
||||
|
||||
Differences are:
|
||||
|
||||
- string-ports implement the port API (open, read, write, and close)
|
||||
- string-ports have byte methods but strings do not
|
||||
- strings have a length method but string-ports do not
|
||||
- strings have indexing and substring methods but string-ports do not
|
||||
- write to a string-port is less expensive than an append to a string
|
||||
|
||||
Note that read and write methods to a string-port traffic in objects, not chars.
|
||||
In other words, they serialize or deserialize, representing objects by text, and parsing text into objects.
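The round trip described above can be sketched as follows. This assumes the specified behavior; the names are hypothetical and chosen only for the example.

```scheme
; Serialize: write represents an object as text.
(define out-port (open-output-string))
(write '(1 "two" three) out-port)
(define text (get-output-string out-port))   ; the characters: (1 "two" three)

; Deserialize: read parses text back into an object.
(define in-port (open-input-string text))
(define obj (read in-port))                  ; a list of three elements
```

Under these assumptions, `obj` is a list that is `equal?` to the list that was written, even though it is a different object.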
|
||||
|
||||
Symbols also have string representations, but no string-like methods besides conversion to and from strings.
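For illustration, the conversions between symbols and strings use the standard R5RS functions. The variable names are made up for this sketch.

```scheme
(define name (symbol->string 'foo))     ; the string "foo"
(define sym  (string->symbol "bar"))    ; the symbol bar
(define len  (string-length name))      ; 3, a string method applied after conversion
```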
|
||||
|
||||
## ScriptFu's implementation is in C
|
||||
|
||||
This section does not describe the language but helps to ground the discussion.
|
||||
|
||||
ScriptFu does not use the reference implementation of SRFI-6.
|
||||
The reference implementation is in Scheme.
|
||||
The reference implementation is on top of strings.
|
||||
|
||||
ScriptFu's implementation of *string-ports is not on top of strings.*
|
||||
ScriptFu's implementation is in C language.
|
||||
(The reason is not recorded, but probably for performance.)
|
||||
|
||||
The reference implementation of SRFI-6 requires that READ and WRITE be redefined to use a redefined READ-CHAR etc. that
|
||||
are string-port aware.
|
||||
TinyScheme does something similar: inbyte() and backbyte() dispatch on the kind of port.
|
||||
|
||||
Internally, TinyScheme terminates all strings and string-ports with the NUL character (the 0 byte.) This is not visible in Scheme.
|
||||
|
||||
## Allocations
|
||||
|
||||
A main concern of string-like objects is how they are allocated and their lifetimes.
|
||||
|
||||
All string-like objects are allocated from the heap and with one cell from TinyScheme's cell pool (which is separately allocated from the heap.)
|
||||
|
||||
A string-port and any string used with a string-port are separate objects with separate lifetimes.
|
||||
|
||||
The length of string-like objects is limited by the underlying allocator (malloc) and the OS.
|
||||
|
||||
### String allocations
|
||||
|
||||
Strings and string literals are allocated.
|
||||
|
||||
They are allocated exactly.
|
||||
|
||||
Any append to a string causes a reallocation.
|
||||
|
||||
Any substring of a string causes a new allocation and returns a new string instance.
|
||||
|
||||
String literals are allocated but are immutable.
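As a small sketch of the statements above: the standard functions substring and string-append each return a newly allocated string and leave their arguments unchanged. The names `s`, `head`, and `both` are illustrative.

```scheme
(define s "foobar")
(define head (substring s 0 3))         ; a new string "foo"
(define both (string-append s "!"))     ; a new string "foobar!"
; s still holds "foobar"
```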
|
||||
|
||||
### String-port allocations
|
||||
|
||||
String-ports of kind output have an allocated, internal buffer.
|
||||
A buffer has a "reserve" or free space.
|
||||
|
||||
The buffer can sometimes accommodate writes without a reallocation.
|
||||
Writes to an output string-port can be less expensive (higher performing)
|
||||
than appends to a string, which always reallocate.
|
||||
But note that writes are not the same as appending (see below.)
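The two ways of accumulating characters can be sketched as below. Both helpers are hypothetical and not part of ScriptFu; the sketch uses write-char so that the port receives the characters themselves rather than the representation of an object.

```scheme
; Accumulate chars in an output string-port, then get the string once.
(define (chars->string-via-port chars)
  (let ((port (open-output-string)))
    (for-each (lambda (c) (write-char c port)) chars)
    (get-output-string port)))

; Accumulate chars by repeated appends; each append allocates a new string.
(define (chars->string-via-append chars)
  (let loop ((rest chars) (acc ""))
    (if (null? rest)
        acc
        (loop (cdr rest) (string-append acc (string (car rest)))))))
```

Both are expected to return the same string for the same list of chars; the difference is only in how many allocations occur along the way.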
|
||||
|
||||
A write can exceed the size that is pre-allocated for the buffer (256 bytes).
|
||||
|
||||
A string-port of kind input is not a buffer.
|
||||
It is allocated once.
|
||||
Its size is fixed when opened.
|
||||
|
||||
## Byte, char, and object methods
|
||||
|
||||
### String methods
|
||||
|
||||
Strings are composed of characters.
|
||||
The method string-ref accesses a character component.
|
||||
|
||||
Strings have no byte methods.
|
||||
Characters can be converted to integers and then to bytes.
|
||||
See "Support for byte type" at the GIMP developer web site.
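A brief sketch of that conversion for an ASCII character. It assumes the ScriptFu functions integer->byte and byte->integer, which are the ones used in the tests accompanying this document; other Schemes may not provide them.

```scheme
(define code (char->integer #\H))       ; 72, the codepoint of H
(define b    (integer->byte code))      ; a byte, suitable for write-byte
(define back (byte->integer b))         ; 72 again
```

For characters outside the ASCII range the codepoint does not fit in one byte, so this direct conversion applies only to single-byte characters.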
|
||||
|
||||
Strings have no object methods (read and write).
|
||||
|
||||
### Port methods
|
||||
|
||||
Ports, and thus string-ports, have byte and char methods:
|
||||
|
||||
read-byte, read-char
|
||||
write-byte, write-char
|
||||
|
||||
Ports also have methods trafficking in objects:
|
||||
|
||||
read, write
|
||||
|
||||
|
||||
### Methods and port kinds
|
||||
|
||||
String-ports are of two kinds:
|
||||
- input
|
||||
- output
|
||||
|
||||
*There is also an input-output kind of string-port in TinyScheme,
|
||||
but this document does not describe it, and any use of it is not supported in ScriptFu.*
|
||||
|
||||
You should only use the "read" methods on a string-port of kind input.
|
||||
You should only use the "write" methods on a string-port of kind output.
|
||||
A call of a read method on a string-port of kind output returns an error, and vice versa.
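To avoid such errors, a script can check the kind of a port before calling a method. A minimal sketch using the standard predicates; the port names are illustrative.

```scheme
(define in-port  (open-input-string "foo"))
(define out-port (open-output-string))

(define in-is-input   (input-port? in-port))     ; #t
(define in-is-output  (output-port? in-port))    ; #f
(define out-is-output (output-port? out-port))   ; #t
(define out-is-input  (input-port? out-port))    ; #f
```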
|
||||
|
||||
|
||||
### Mixing byte, char, and object methods
|
||||
|
||||
*You should not mix byte methods with char methods, unless you are careful.*
|
||||
You must understand UTF-8 encoding to do so.
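As a sketch of what "careful" means here: the character λ (codepoint 0x3bb) is encoded in UTF-8 as the two bytes 0xce 0xbb, so both bytes must be consumed before a following read-char is aligned on a character boundary. This shows the intended behavior; the accompanying test for this case is marked FIXME, so the current implementation may not behave this way.

```scheme
(define port (open-input-string "λa"))
(define b1 (byte->integer (read-byte port)))   ; expected 206, i.e. 0xce
(define b2 (byte->integer (read-byte port)))   ; expected 187, i.e. 0xbb
(define c  (read-char port))                   ; expected #\a
```

Reading only one byte and then calling read-char would start in the middle of the encoding of λ, and the result of that is not described here.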
|
||||
|
||||
*You should not mix char methods with read/write methods, unless you are careful.*
|
||||
You must understand parsing and representation of Scheme to do so.
|
||||
|
||||
### The NUL character and byte
|
||||
|
||||
Internally, TinyScheme terminates all strings and string-ports
|
||||
with the NUL character (the 0 byte.)
|
||||
|
||||
*You should not write the NUL character to strings or string-ports. The result can be surprising, and is not described here.*
|
||||
You must understand the role of NUL bytes in C strings.
|
||||
|
||||
You cannot read the NUL character or byte from a string or string-port since the interpreter always sees it as a terminator.
|
||||
|
||||
Note that a string escape sequence for NUL, which is itself a string without a NUL character, can be inside a string or string-port.
|
||||
|
||||
You can read and write the NUL character to file ports that you are treating as binary files and not text files.
|
||||
|
||||
## Length methods
|
||||
|
||||
### Strings
|
||||
|
||||
The length of a string is in units of characters. Remember that each character in UTF-8 encoding may comprise several bytes (up to four).
|
||||
|
||||
(string-length "foo") => 3
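A few more cases, as a sketch of the difference between characters and bytes. The byte counts in the comments follow from the UTF-8 encoding; there is no method that returns them.

```scheme
(define len-ascii (string-length "foo"))   ; 3 characters, 3 bytes
(define len-uni   (string-length "λ"))     ; 1 character, 2 bytes in UTF-8
(define len-mixed (string-length "λa"))    ; 2 characters, 3 bytes
```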
|
||||
|
||||
### Ports
|
||||
|
||||
Ports have no methods for obtaining the length, either in characters or byte units. Some other Schemes have such methods.
|
||||
|
||||
## String-port and initial strings
|
||||
|
||||
### Input ports
|
||||
|
||||
The method open-input-string takes an initial string.
|
||||
|
||||
The initial string can be large, limited only by malloc.
|
||||
|
||||
TinyScheme copies the initial string to the port.
|
||||
Subsequently, these have no effect on the port:
|
||||
|
||||
- the initial string going out of scope
|
||||
- an append to the initial string
|
||||
|
||||
Subsequent calls to read methods progress along the port contents,
|
||||
until finally EOF is returned.
|
||||
|
||||
The initial string can be the empty string and then the first read will return the EOF object.
|
||||
|
||||
There are no methods for appending to an input string-port after it is opened.
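Progressing along the port contents until EOF can be sketched with read-char. The helper `port->char-list` is hypothetical and written only for this example.

```scheme
; Read chars until the EOF object is returned, collecting them in order.
(define (port->char-list port)
  (let loop ((c (read-char port)) (acc '()))
    (if (eof-object? c)
        (reverse acc)
        (loop (read-char port) (cons c acc)))))

(define chars (port->char-list (open-input-string "foo")))   ; (#\f #\o #\o)
(define none  (port->char-list (open-input-string "")))      ; the empty list
```

With the empty string as the initial string, the very first read-char returns the EOF object, so the loop ends immediately.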
|
||||
|
||||
### Output ports
|
||||
|
||||
*The method open-output-string optionally takes an initial string but it is ignored.*
|
||||
|
||||
In some other Schemes (for example Racket), the optional argument is the name of the port.
|
||||
|
||||
In version 2 of ScriptFu's TinyScheme, the initial string was, loosely speaking, the buffer for the string-port.
|
||||
This document does not describe the version 2 behavior.
|
||||
|
||||
An output string-port is initially empty; it does not contain the initial string.
|
||||
|
||||
The initial string may go out of scope without effect on an output string-port.
|
||||
|
||||
You can write more to an output port than the length of the initial string.
|
||||
|
||||
## The get-output-string method
|
||||
|
||||
A string-port of kind output is a stream that can be written to
|
||||
but that can be read only by getting its entire contents.
|
||||
|
||||
(get-output-string port)
|
||||
|
||||
Returns a string that is the accumulation of all prior writes to the output string-port. This is a loose definition. Chars, bytes, and objects can be written, and it is the representation of objects that accumulates.
|
||||
|
||||
The port must be open at the time of the call.
|
||||
|
||||
A get-output-string call on a newly opened empty port returns the empty string.
|
||||
|
||||
Consecutive calls to get-output-string return two different string objects, but they are equivalent.
|
||||
|
||||
The returned string is a distinct object from the port.
|
||||
These subsequent events have no effect on the returned string:
|
||||
|
||||
- writes to the port
|
||||
- closing the port
|
||||
- the port subsequently going out of scope
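The independence of the returned string from the port can be sketched as below, assuming the behavior specified in this section. The variable names are illustrative.

```scheme
(define port (open-output-string))
(write 'foo port)
(define snapshot (get-output-string port))   ; the string "foo"
(define again    (get-output-string port))   ; an equivalent string, per the text above a different object

(write 'bar port)                            ; a later write to the port
(define later (get-output-string port))      ; the string "foobar"
; snapshot still holds "foo"
```

Note that consecutive writes accumulate without any separator, so the two symbols run together in `later`.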
|
||||
|
||||
Again, *you should not mix write-byte, write-char, and write to an output string-port, without care.*
|
||||
|
||||
|
||||
## Writing and reading strings to a string-port
|
||||
|
||||
These are different:
|
||||
- write a string to an output string-port
|
||||
- append a string to a string
|
||||
|
||||
Writing a string to an output string-port writes the string's representation, including its quotes, into the string-port:
|
||||
|
||||
```
|
||||
> (define aPort (open-output-string))
|
||||
> (write "foo" aPort)
|
||||
> (get-output-string aPort)
|
||||
"\"foo\""
|
||||
```
|
||||
|
||||
That is, writing a string to an output string-port
writes five characters: three for foo and two double quotes.
```
"foo"
12345
```
The result of get-output-string is a string, which the REPL prints surrounded by quotes, with the inner quotes escaped by backslashes:
|
||||
```
|
||||
"\"foo\""
|
||||
```
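For contrast, a sketch of the difference between writing a string as an object and writing its characters one at a time. The port names are illustrative, and the behavior shown is the one specified in this document.

```scheme
; write puts the representation of the string, quotes included, into the port.
(define obj-port (open-output-string))
(write "foo" obj-port)
(define as-object (get-output-string obj-port))    ; string=? to "\"foo\""

; write-char puts only the characters themselves into the port.
(define char-port (open-output-string))
(for-each (lambda (c) (write-char c char-port)) (string->list "foo"))
(define as-chars (get-output-string char-port))    ; string=? to "foo"

; An append yields the characters only, like the write-char case.
(define appended (string-append "" "foo"))         ; string=? to "foo"
```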
|
||||
|
||||
|
||||
|
||||
|
|
@ -7,6 +7,8 @@ if not stable
|
|||
]
|
||||
|
||||
test_scripts = [
|
||||
# test binding to the PDB
|
||||
|
||||
'tests' / 'PDB' / 'image' / 'image-new.scm',
|
||||
'tests' / 'PDB' / 'image' / 'image-precision.scm',
|
||||
'tests' / 'PDB' / 'image' / 'image-indexed.scm',
|
||||
|
@ -76,6 +78,7 @@ if not stable
|
|||
# comprehensive, total test
|
||||
'tests' / 'PDB' / 'pdb.scm',
|
||||
|
||||
# test TinyScheme embedded interpreter
|
||||
|
||||
'tests' / 'TS' / 'sharp-expr.scm',
|
||||
'tests' / 'TS' / 'sharp-expr-char.scm',
|
||||
|
@ -84,7 +87,10 @@ if not stable
|
|||
'tests' / 'TS' / 'cond-expand.scm',
|
||||
'tests' / 'TS' / 'atom2string.scm',
|
||||
'tests' / 'TS' / 'integer2char.scm',
|
||||
'tests' / 'TS' / 'string-port.scm',
|
||||
'tests' / 'TS' / 'string-escape.scm',
|
||||
'tests' / 'TS' / 'string-port-input.scm',
|
||||
'tests' / 'TS' / 'string-port-output.scm',
|
||||
# WIP 'tests' / 'TS' / 'string-port-unichar.scm',
|
||||
'tests' / 'TS' / 'testing.scm',
|
||||
'tests' / 'TS' / 'vector.scm',
|
||||
'tests' / 'TS' / 'no-memory.scm',
|
||||
|
|
|
@ -46,6 +46,19 @@
|
|||
(assert '(equal? (integer->char 13) #\return))
|
||||
(assert '(equal? (integer->char 32) #\space))
|
||||
|
||||
; Misspelled sharp constants
|
||||
; Any sequence of chars starting with #\n, up to a delimiter,
|
||||
; that does not match "newline"
|
||||
; is parsed as the sharp constant for the lower case ASCII n char.
|
||||
; Similarly for tab, return, space
|
||||
(test! "misspelled sharp char constant for newline")
|
||||
; 110 is the codepoint for lower case n
|
||||
(assert '(equal? (integer->char 110) #\n))
|
||||
(assert '(equal? (integer->char 110) #\newlin))
|
||||
(assert '(equal? (integer->char 110) #\newlines))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
; sharp constant character
|
||||
|
@ -84,7 +97,7 @@
|
|||
; by a sharp constant hex e.g. #\x1f for 31
|
||||
|
||||
|
||||
; Edge codepoint tests
|
||||
(test! "Edge codepoint tests")
|
||||
; Tests of edge cases, near a code slightly different
|
||||
|
||||
; Codepoint US Unit Separator, edge case to 32, space
|
||||
|
@ -153,7 +166,7 @@
|
|||
#\x0))
|
||||
|
||||
|
||||
; sharp constants for delimiter characters
|
||||
(test! "sharp constants for delimiter characters")
|
||||
|
||||
; These test the sharp constant notation for characters space and parens
|
||||
; These are in the ASCII range
|
||||
|
|
|
@ -0,0 +1,204 @@
|
|||
; test string escape sequences
|
||||
|
||||
; An "escape sequence" is a sequence of characters that,
|
||||
; when parsing a string, yields a single character.
|
||||
; All escape sequences start with the backslash.
|
||||
|
||||
; TS is unicode: lengths are in unichars, not bytes
|
||||
|
||||
; 0xff is the C language notation for a hex constant
|
||||
|
||||
; Many tests are lax using string-length
|
||||
|
||||
|
||||
|
||||
; We can't test certain errors since they terminate
|
||||
; - Doublequote without trailing doublequote
|
||||
; - buffer overflow
|
||||
; - short hex escapes (<2 hex digits)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
(test! "escaped doublequote")
|
||||
(assert `(= (string-length "\"") 1))
|
||||
|
||||
; escaped newline, tab, carriage return
|
||||
(assert `(= (string-length "\n") 1))
|
||||
(assert `(= (string-length "\t") 1))
|
||||
(assert `(= (string-length "\r") 1))
|
||||
|
||||
(test! "escaped backslash")
|
||||
; escaped backslash, stands for itself
|
||||
(assert `(= (string-length "\\") 1))
|
||||
|
||||
(test! "escaped other chars, ASCII")
|
||||
; any other escaped char, that is not an octal digit, stands for itself
|
||||
(assert `(= (string-length "\a") 1))
|
||||
|
||||
(test! "escaped other chars, unichar")
|
||||
(assert `(= (string-length "\λ") 1))
|
||||
|
||||
|
||||
; !!! Note that readable sequences for sharp constants for control chars
|
||||
; are not suitable in strings.
|
||||
; #\tab is not a sharp constant expression, and \tab is not a string escape
|
||||
(assert `(= (string-length "\tab") 3))
|
||||
|
||||
|
||||
|
||||
; octal escape sequences
|
||||
; FUTURE obsolete these: we don't need to support both hex and octal.
|
||||
|
||||
(test! "octal escapes")
|
||||
|
||||
(test! "octal NUL")
|
||||
; one digit octal sequence
|
||||
; NUL character, a zero byte, yields a string, but empty
|
||||
(assert `(= (string-length "\0") 0))
|
||||
|
||||
; two digit octal sequence
|
||||
; 0o11 is tab
|
||||
(assert `(string=? "\11" "\t"))
|
||||
|
||||
(test! "octal escaped characters match non-escaped ASCII characters")
|
||||
; A is 65 is 0o101
|
||||
(assert `(string=? "\101" "A"))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
; Three digit octal sequences that don't fit in a byte.
|
||||
; Comments in the code say it should yield an error.
|
||||
; So values up to 255, which is 0o377, should work.
|
||||
|
||||
(test! "octal 377")
|
||||
; Yields a sequence of bytes that is not proper UTF-8 encoded, string length 0
|
||||
(assert `(= (string-length "\377") 0))
|
||||
|
||||
|
||||
; (test! "octal 400 yields error")
|
||||
; In v2 the max value is 255, that fits in a byte.
|
||||
; FIXME: the code comments say 0o400 should yield an error
|
||||
;(assert-error `(string-length "\400")
|
||||
; "Error: Error reading string")
|
||||
|
||||
; !!! But in UTF-8, codepoint 0o377 == 255 is encoded in two bytes
|
||||
; and yields LATIN SMALL LETTER Y WITH DIAERESIS
|
||||
; !!! length in chars is 1, length in bytes is 2.
|
||||
; FUTURE (assert `(= (string-length "\377") 1))
|
||||
|
||||
; FUTURE: if we don't obsolete octal escapes altogether,
|
||||
; then three or four octal digits should be allowed.
|
||||
;(test! "octal 777")
|
||||
; 0o777 is 0x1ff
|
||||
; 1 char, encoded as 2 bytes.
|
||||
; (assert `(= (string-length "\777") 1))
|
||||
; TODO test the string is two-bytes
|
||||
; we don't have string-length-bytes function
|
||||
|
||||
; four octal digits yield two chars and three bytes.
|
||||
; (assert `(= (string-length "\3777") 2))
|
||||
; TODO test the second char is '7'
|
||||
|
||||
|
||||
|
||||
|
||||
(test! "hex escapes")
|
||||
|
||||
(test! "hex NUL")
|
||||
; NUL character, a zero byte, yields a string, but empty
|
||||
(assert `(= (string-length "\x0000") 0))
|
||||
|
||||
;(test! "short hex escape")
|
||||
; TODO Can't be tested, aborts interpreter, parsing fails
|
||||
; maybe wrapping it in a string-port
|
||||
; require at least two hex digits
|
||||
;(assert-error `(string-length "\x")
|
||||
; "Error: Error reading string")
|
||||
;(assert-error `(string-length "\x0")
|
||||
; "Error: Error reading string")
|
||||
|
||||
(test! "2 digit hex escape, ASCII")
|
||||
; yields A
|
||||
(assert `(= (string-length "\x41") 1))
|
||||
|
||||
(test! "2 digit hex escape, non-ASCII > 127")
|
||||
; FIXME, fails string length 0
|
||||
; See scheme.c line 1957 *p++=c is pushing one byte
|
||||
;
|
||||
; Yields LATIN SMALL LETTER Y WITH DIAERESIS
|
||||
; Yields one character of two UTF-8 bytes.
|
||||
(assert `(= (string-length "\xff") 1))
|
||||
|
||||
; Uppercase \XFF also accepted
|
||||
; yields LATIN SMALL LETTER Y WITH DIAERESIS
|
||||
;(assert `(= (string-length "\XFF") 1))
|
||||
|
||||
|
||||
(test! "3 digit hex escape x414 yields two characters")
|
||||
; This is the current behavior.
|
||||
; SF parses only two hex digits as part of the hex escape,
|
||||
; and the third hex digit is parsed as itself.
|
||||
; FUTURE parse a max of four bytes of hex, like say Racket
|
||||
; yields A4
|
||||
(assert `(= (string-length "\x414") 2))
|
||||
|
||||
; FUTURE: Now does not accept
|
||||
;(test! "3 digit hex escape")
|
||||
; yields one unnamed char, 3 bytes
|
||||
;(assert `(= (string-length "\xfff") 1))
|
||||
|
||||
;(test! "4 digit hex escape")
|
||||
; yields unnamed char, 3 bytes
|
||||
;(assert `(= (string-length "\xffff") 1))
|
||||
|
||||
;(test! "5 digit hex escape")
|
||||
; yields 2 chars, the unnamed char, 3 bytes,
|
||||
; and the char LOWER CASE F, 1 byte
|
||||
;(assert `(= (string-length "\xfffff") 2))
|
||||
|
||||
|
||||
|
||||
; Every four digit hex value is a valid codepoint
|
||||
; meaning it will encode in UTF-8.
|
||||
; Whether it displays a visible glyph depends on other factors.
|
||||
|
||||
|
||||
|
||||
|
||||
(test! "consecutive escape sequences")
|
||||
|
||||
|
||||
(test! "consecutive hex escapes")
|
||||
; two A chars
|
||||
(assert `(= (string-length "\x41\x41") 2))
|
||||
|
||||
; FIXME fails
|
||||
;(test! "consecutive hex escapes")
|
||||
; two CENT chars
|
||||
;(assert `(= (string-length "\xa2\xa2") 2))
|
||||
|
||||
; FIXME fails
|
||||
; (test! "consecutive octal escapes")
|
||||
; two CENT chars
|
||||
; (assert `(= (string-length "\242\242") 2))
|
||||
|
||||
(test! "consecutive escaped backslash and hex escape")
|
||||
; yields 3 characters: BACKSLASH, A, BACKSLASH,
|
||||
(assert `(= (string-length "\\\x41\\") 3))
|
||||
|
||||
; FIXME fails
|
||||
;(test! "consecutive escaped backslash and hex escape")
|
||||
; yields 3 characters: BACKSLASH, CENT, BACKSLASH,
|
||||
;(assert `(= (string-length "\\\xa2\\") 3))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,212 @@
|
|||
; Test cases for string ports of kind input
|
||||
|
||||
; See general discussion of string ports at string-port-output.scm
|
||||
|
||||
|
||||
; You read objects from a string.
|
||||
; A read has the side effect of advancing a cursor in the string.
|
||||
|
||||
; read-byte is discouraged on an input string-port.
|
||||
; Complicated by fact that strings are UTF-8 encoded.
|
||||
|
||||
; read-char is a method also
|
||||
|
||||
; !!! The input port object does not own a string object.
|
||||
; The "string" internally is a C pointer to a Scheme cell for a Scheme string.
|
||||
; The port does not have a cell referring to the cell for the string.
|
||||
; It does NOT survive garbage collection.
|
||||
; Closing a port frees memory to C, but few cells to Scheme.
|
||||
|
||||
; Closing a port leaves the symbol defined until it goes out of scope,
|
||||
; but the symbol no longer is bound to a port object
|
||||
; i.e. operations on it fail.
|
||||
|
||||
|
||||
|
||||
|
||||
; setup
|
||||
; Some tests use new ports, not the setup one.
|
||||
|
||||
; Note initial contents is a sequence of alphabetic chars,
|
||||
; which reads as one symbol object.
|
||||
(define aStringPort (open-input-string "foo"))
|
||||
|
||||
|
||||
|
||||
; tests
|
||||
|
||||
(test! "open-input-string yields a port")
|
||||
(assert `(port? ,aStringPort))
|
||||
|
||||
(test! "open-input-string yields a port of kind input")
|
||||
(assert `(input-port? ,aStringPort))
|
||||
|
||||
(test! "open-input-string yields a port NOT of kind output")
|
||||
(assert `(not (output-port? ,aStringPort)))
|
||||
|
||||
(test! "write always fails on an input string-port")
|
||||
(assert-error `(write "bar" ,aStringPort)
|
||||
"write: argument 2 must be: output port")
|
||||
|
||||
(test! "write-char always fails on an input string-port")
|
||||
(assert-error `(write-char #\a ,aStringPort)
|
||||
"write-char: argument 2 must be: output port")
|
||||
|
||||
(test! "write-byte always fails on an input string-port")
|
||||
(assert-error `(write-byte (integer->byte 72) ,aStringPort)
|
||||
"write-byte: argument 2 must be: output port")
|
||||
|
||||
(test! "get-output-string always fails on an input string-port")
|
||||
(assert-error `(get-output-string ,aStringPort)
|
||||
"get-output-string: argument 1 must be: output port")
|
||||
|
||||
|
||||
|
||||
; read
|
||||
|
||||
; refresh the port
|
||||
(define aStringPort (open-input-string "foo"))
|
||||
|
||||
(test! "string read from input-string equals initial contents of port, one symbol")
|
||||
; yields a symbol whose repr is "foo"
|
||||
; ??? This seems to fail sometimes, possibly due to gc, see below?
|
||||
(assert `(string=?
|
||||
(symbol->string (read ,aStringPort))
|
||||
"foo"))
|
||||
|
||||
(test! "next read from input-string equals EOF")
|
||||
(assert `(eof-object? (read ,aStringPort)))
|
||||
|
||||
; Note now the port is empty and for testing we must make another
|
||||
|
||||
|
||||
|
||||
|
||||
; port with unichar contents
|
||||
|
||||
(define aStringPort (open-input-string "λ"))
|
||||
|
||||
; FIXME issue #11040
|
||||
; This is now returning EOF where it should return a unichar char as a symbol
|
||||
(test! "read from input-string with unichar content equals that unichar as symbol")
|
||||
; yields a symbol whose repr is "λ"
|
||||
(assert `(string=?
|
||||
(symbol->string (read ,aStringPort))
|
||||
"λ"))
|
||||
|
||||
|
||||
|
||||
; port with escape sequence for NUL char
|
||||
(define aStringPort (open-input-string "a\x00b"))
|
||||
|
||||
(test! "read from input-string with escape sequence for NUL is truncated")
|
||||
; yields a symbol whose repr is "a"
|
||||
(assert `(string=?
|
||||
(symbol->string (read ,aStringPort))
|
||||
"a"))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
; read multiple objects
|
||||
; TODO
|
||||
|
||||
|
||||
; garbage collection
|
||||
|
||||
(define aStringPort (open-input-string "foo"))
|
||||
; using aStringPort whose contents read as a symbol "foo"
|
||||
|
||||
(test! "input string-port with literal contents MAY NOT survive garbage collection")
|
||||
; !!! We wrote "foo" but assert that "foo" is NOT THE CONTENTS
|
||||
; This test corrupts the port.
|
||||
; This test is of a random result and may fail.
|
||||
; After gc, a C pointer of the port implementation
|
||||
; is pointing to the garbage collected string,
|
||||
; some memory whose contents are undefined.
|
||||
; Usually a symbol is returned and it is not "foo".
|
||||
; But it could still be "foo".
|
||||
(assert `(not
|
||||
(string=?
|
||||
(symbol->string
|
||||
(begin
|
||||
(gc)
|
||||
(read ,aStringPort)))
|
||||
"foo")))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
; read-char and read-byte
|
||||
|
||||
|
||||
(define aStringPort (open-input-string "foo"))
|
||||
|
||||
(test! "read-char on input string-port, ASCII")
|
||||
; read-char works, but discouraged from mixing with read
|
||||
; since read parses a Scheme object, and the read char might
|
||||
; be syntax.
|
||||
(assert `(equal?
|
||||
(read-char ,aStringPort)
|
||||
#\f ))
|
||||
|
||||
|
||||
(define aStringPort (open-input-string "λ"))
|
||||
|
||||
; FIXME fails for same reason as above
|
||||
(test! "read-char on input string-port, unichar")
|
||||
(assert `(equal?
|
||||
(read-char ,aStringPort)
|
||||
#\λ ))
|
||||
|
||||
|
||||
; Example code for getting char from byte from read-byte
|
||||
; (integer->char (byte->integer (read-byte port)))
|
||||
|
||||
|
||||
|
||||
; read-byte
|
||||
;
|
||||
; read-byte should not be mixed with read-char or read, without care.
|
||||
|
||||
(define aStringPort (open-input-string "foo"))
|
||||
|
||||
(test! "read-byte to EOF on input-string, ASCII chars")
|
||||
; The first byte is the single byte UTF-8 encoding of f char,
|
||||
; then two o chars, then EOF
|
||||
(assert `(eof-object?
|
||||
(begin
|
||||
(read-byte ,aStringPort)
|
||||
(read-byte ,aStringPort)
|
||||
(read-byte ,aStringPort)
|
||||
(read-byte ,aStringPort))))
|
||||
|
||||
|
||||
(define aStringPort (open-input-string "λa"))
|
||||
|
||||
; FIXME fails for same reason as above
|
||||
(test! "read-byte then read-char on input-string, two-byte UTF-8 encoded char")
|
||||
; The first byte of the lambda char is 0xce 206, the next 0xbb 187, code point is 0x3bb
|
||||
; Expect this leaves the port in condition for a subsequent read-char or read
|
||||
(assert `(= (byte->integer (read-byte ,aStringPort))
|
||||
206))
|
||||
(assert `(= (byte->integer (read-byte ,aStringPort))
|
||||
187))
|
||||
(assert `(equal? (read-char ,aStringPort)
|
||||
#\a))
|
||||
|
||||
|
||||
|
||||
; closing
|
||||
|
||||
(define aStringPort (open-input-string "foo"))
|
||||
|
||||
(test! "closing a port")
|
||||
(assert `(close-port ,aStringPort))
|
||||
|
||||
(test! "a closed port cannot be read")
|
||||
(assert-error `(read ,aStringPort)
|
||||
"read: argument 1 must be: input port")
|
||||
|
|
@ -0,0 +1,243 @@
|
|||
; Test cases for string ports of kind output
|
||||
|
||||
; A port has a bifurcated API:
|
||||
; input API
|
||||
; output API.
|
||||
; Some ports support both.
|
||||
; The input API has a read method, but not write; the output API has write, but not read.
|
||||
|
||||
; Some ports support byte and char operations.
|
||||
|
||||
; A string port is-a port.
|
||||
|
||||
; !!! write and read methods take or return Scheme objects
|
||||
; i.e. strings, symbols, atoms, etc.
|
||||
|
||||
; A string port is of kind: input or output.
|
||||
; A string port does not have all the methods of the port API:
|
||||
; kind output has write method, but not read.
|
||||
; kind input has read method, but not write.
|
||||
|
||||
; A string output port stores its contents in memory (unlike device ports).
|
||||
; A get-output-string returns contents previously written.
|
||||
; A string port is practically infinite.
|
||||
|
||||
; A string port is like a string.
|
||||
; A sequence of writes are like a sequence of appends to a string,
|
||||
; except the things written are objects, not just strings.
|
||||
|
||||
; You can only get the entire string.
|
||||
; A get does not have the side effect of advancing a cursor in the string.
|
||||
|
||||
; write-byte is discouraged on an output string-port.
|
||||
; Complicated by fact that strings are UTF-8 encoded.
|
||||
|
||||
; !!! The port object does not own a string object.
|
||||
; The "string" internally is in a UTF-8 encoded C allocated chunk
|
||||
; of memory, but not in a Scheme cell for a Scheme string.
|
||||
; It survives garbage collection.
|
||||
; Closing a port frees memory to C, but few cells to Scheme.
|
||||
|
||||
; Closing a port leaves the symbol defined until it goes out of scope,
|
||||
; but the symbol no longer is bound to a port object
|
||||
; i.e. operations on it fail.
|
||||
|
||||
|
||||
|
||||
|
||||
; setup
|
||||
; Some tests use new ports, not the setup one.
|
||||
|
||||
; This port is unlimited, should grow
|
||||
(define aStringPort (open-output-string))
|
||||
|
||||
|
||||
|
||||
; tests
|
||||
|
||||
(test! "open-output-string yields a port")
|
||||
(assert `(port? ,aStringPort))
|
||||
|
||||
(test! "open-output-string yields a port of kind output")
|
||||
(assert `(output-port? ,aStringPort))
|
||||
|
||||
(test! "open-output-string yields a port NOT of kind input")
|
||||
(assert `(not (input-port? ,aStringPort)))
|
||||
|
||||
(test! "read method fails on an output string-port")
|
||||
(assert-error `(read ,aStringPort)
|
||||
"read: argument 1 must be: input port")
|
||||
|
||||
(test! "read-byte method fails on an output string-port")
|
||||
(assert-error `(read-byte ,aStringPort)
|
||||
"read-byte: argument 1 must be: input port")
|
||||
|
||||
|
||||
|
||||
|
||||
(test! "string get from port equals string write to port")
|
||||
; !!! with escaped double quote
|
||||
(assert `(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
(write "foo" aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
"\"foo\""))
|
||||
|
||||
(test! "string get from port equals string repr of symbol written to port")
|
||||
; !!! without escaped double quote
|
||||
(assert `(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
; !!! 'foo is-a symbol whose repr is three characters: foo
|
||||
; write to a port writes the repr
|
||||
(write 'foo aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
(symbol->string 'foo)))
|
||||
|
||||
(test! "get-output-string called twice returns the same string")
|
||||
; Can get the same string twice
|
||||
(assert `(string=?
|
||||
(begin
|
||||
(write "foo" ,aStringPort)
|
||||
(get-output-string ,aStringPort)
|
||||
(get-output-string ,aStringPort))
|
||||
"\"foo\""))
|
||||
|
||||
(test! "port contents survive garbage collection")
|
||||
; using aStringPort whose contents are "foo"
|
||||
(assert `(string=?
|
||||
(begin
|
||||
(gc)
|
||||
(get-output-string ,aStringPort))
|
||||
"\"foo\""))
|
||||
|
||||
|
||||
|
||||
; tests of the form (open-output-string <initial string>)
|
||||
|
||||
; Some Schemes have an optional argument a string that is the initial contents?
|
||||
; Guile does not. Racket does not, but takes a name for the port. MIT does not.
|
||||
;
|
||||
; The initial string is always overwritten, and is just an allocation.
|
||||
; Only the size of the initial string matters, not the contents.
|
||||
; Also, see test9.scm, which tests this using a string whose scope is larger
|
||||
; and so does not get garbage collected.
|
||||
|
||||
(define aLimitedStringPort (open-output-string "initial"))
|
||||
|
||||
(test! "initial contents string is just an allocation")
|
||||
; !!! only 7 bytes are allocated.
|
||||
; get-output-string returns empty string, not the initial contents.
|
||||
(assert `(string=? (get-output-string ,aLimitedStringPort)
|
||||
""))
|
||||
|
||||
(test! "writing to output string-port with initial contents may truncate")
|
||||
; Only 7 chars are written, and a double quote char takes one
|
||||
(assert `(string=?
|
||||
(begin
|
||||
(write "INITIALPLUS" ,aLimitedStringPort)
|
||||
(get-output-string ,aLimitedStringPort))
|
||||
"\"INITIA"))
|
||||
|
||||
;(test! "port contents survive garbage collection")
|
||||
; using aLimitedStringPort whose contents are "INITIAL"
|
||||
; TODO this may be crashing
|
||||
;(assert `(string=?
|
||||
; (begin
|
||||
; TODO need to open a port and write to it
|
||||
; (gc)
|
||||
; (get-output-string ,aLimitedStringPort))
|
||||
; "\"INITIAL\""))
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
; write bytes
|
||||
|
||||
; initial contents "foo"
|
||||
|
||||
(test! "write-byte on output-string, ASCII char")
|
||||
(assert `(write-byte (integer->byte 72) ,aStringPort))
|
||||
; write is effective when byte is an ASCII char, valid UTF-8 encoding
|
||||
; Note that the yield is a repr of a string followed by a repr of a char.
|
||||
(assert `(string=? (get-output-string ,aStringPort)
|
||||
"\"foo\"H"))
|
||||
|
||||
; This test corrupts aStringPort.
|
||||
; It tests what an author should NOT do: write a byte that is not a valid UTF-8 encoding.
|
||||
(test! "write-byte on output-string, non ASCII char")
|
||||
(assert `(write-byte (integer->byte 172) ,aStringPort))
|
||||
; write yields strange results when byte is not a proper UTF-8 encoding.
|
||||
; Note that the yield is same as before, and doesn't show the written byte.
|
||||
(assert `(string=? (get-output-string ,aStringPort)
|
||||
"\"foo\"H"))
|
||||
|
||||
|
||||
|
||||
; closing
|
||||
|
||||
(test! "closing a port")
|
||||
(assert `(close-port ,aStringPort))
|
||||
|
||||
(test! "a closed port cannot be get-output-string")
|
||||
(assert-error `(get-output-string ,aStringPort)
|
||||
"get-output-string: argument 1 must be: output port")
|
||||
|
||||
(test! "a closed port cannot be written")
|
||||
(assert-error `(write 'foo ,aStringPort)
|
||||
"write: argument 2 must be: output port ")
|
||||
|
||||
|
||||
; closing does not affect previously gotten contents
|
||||
(test! "closing output port does not affect prior gotten contents")
|
||||
; setup
|
||||
(define aStringPort (open-output-string))
|
||||
(write "foo" aStringPort)
|
||||
(define contents (get-output-string aStringPort))
|
||||
(close-port aStringPort)
|
||||
(gc)
|
||||
(assert `(string=? ,contents
|
||||
"\"foo\""))
|
||||
|
||||
|
||||
|
||||
|
||||
; What is read equals the string written.
|
||||
; Edge case: writing more than 256 characters in two tranches
|
||||
; where second write crosses end boundary of 256 char buffer.
|
||||
|
||||
; issue #9495
|
||||
; Failing
|
||||
;(assert '(string=?
|
||||
; (let* ((aStringPort (open-output-string)))
|
||||
; (write (string->symbol (make-string 250 #\A)) aStringPort)
|
||||
; (write (string->symbol (make-string 7 #\B)) aStringPort)
|
||||
; (get-output-string aStringPort))
|
||||
; (string-append
|
||||
; (make-string 250 #\A)
|
||||
; (make-string 7 #\B))))
|
||||
|
||||
|
||||
|
||||
|
||||
; read/write are opposites
|
||||
|
||||
; !!! Note in this case lack of escaped quotes on what is read
|
||||
|
||||
; FIXME, this fails
|
||||
(test! "reads of a get-output-string return what was written before")
|
||||
; setup
|
||||
(define aOutStringPort (open-output-string))
|
||||
(write "foo" aOutStringPort)
|
||||
(write "bar" aOutStringPort)
|
||||
(define aInStringPort (open-input-string (get-output-string aOutStringPort)))
|
||||
(close-port aOutStringPort)
|
||||
(gc)
|
||||
; aInStringPort is open having contents "\"foo\"\"bar\""
|
||||
; test the original strings can be read consecutively
|
||||
(assert `(string=? (read ,aInStringPort)
|
||||
"foo"))
|
||||
(assert `(string=? (read ,aInStringPort)
|
||||
"bar"))
|
||||
|
||||
|
|
@ -0,0 +1,108 @@
|
|||
; test string ports with unichar
|
||||
|
||||
; WIP: requires changes to string escapes: four hex digits
|
||||
; Not currently in the test suite
|
||||
|
||||
; This tests unichars in strings work nicely with string-ports.
|
||||
; Algebraic tests combining two different areas of the code.
|
||||
; The concern is that bytes and chars are counted correctly.
|
||||
|
||||
; Also tests hex escape sequences, which are required to express
|
||||
; edge case unichars otherwise not possible to put in a text.
|
||||
|
||||
; The edge cases of unichar:
|
||||
; - simple non-ASCII unichar character named LAMBDA
|
||||
; - edge of the Basic Multilingual Plane, U+FFFF (the largest code point with a 3-byte UTF-8 encoding)
|
||||
; - edge of NUL-terminated strings: ASCII character named NUL
|
||||
|
||||
|
||||
; TODO testing of string-ports in string-port.scm could contain this
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
; input-string
|
||||
|
||||
(test! "string containing unichar as symbol parses")
|
||||
|
||||
; open-input-string yields string without double quotes
|
||||
; read from that string is-a list containing a symbol
|
||||
(assert `(list? (read (open-input-string "'(λ)"))))
|
||||
; car of that list is-a symbol
|
||||
(assert `(symbol? (car (read (open-input-string "'(λ)")))))
|
||||
|
||||
|
||||
(test! "string containing unichar as character parses into symbol")
|
||||
; \x3bb is the hex escape for the LAMBDA character
|
||||
; but it is in the input string as a symbol
|
||||
(assert `(symbol? (read (open-input-string "\x3bb"))))
|
||||
|
||||
(test! "string containing string embedding a unichar parses into string")
|
||||
; string inside string via escaped double quotes
|
||||
(assert `(string? (read (open-input-string "\"\x3bb\""))))
|
||||
|
||||
(test! "string containing string embedding an unprintable unichar parses into string")
|
||||
; string inside string via escaped double quotes
|
||||
(assert `(string? (read (open-input-string "\"\xFFFF\""))))
|
||||
|
||||
|
||||
; output-string
|
||||
|
||||
; open-output-string takes an optional "initial contents string"
|
||||
|
||||
(test! "get-output-string from output port equals string written to port")
|
||||
(test! "simple LAMBDA")
|
||||
; !!! with escaped double quote
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
(write "λ" aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
"\"λ\""))
|
||||
|
||||
(test! "U+FFFF")
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
(write "\xFFFF" aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
"\"\xFFFF\""))
|
||||
|
||||
(test! "writing embedded NUL to output port shortens the string")
|
||||
; !!! NUL is written as \x0000 because in "\x00after" the leading "af" would be read as hex digits (\x00af)
|
||||
|
||||
; I don't think this leaks memory, the internal C string
|
||||
; is correctly managed if the code always uses gunichar functions
|
||||
; to calculate the internal strlength in unichars ?
|
||||
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
(write "before\x0000after" aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
"\"before\""))
|
||||
|
||||
(test! "initial contents to a new output port, with unichar")
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-output-string "λ")))
|
||||
(get-output-string aStringPort))
|
||||
"λ"))
|
||||
|
||||
|
||||
; input-output-string
|
||||
|
||||
; open-input-output-string is non-standard Scheme, not even in R5RS.
|
||||
; And it is poorly documented.
|
||||
; Presumably it is meant to act as a pipe, where read consumes,
|
||||
; and get-output-string returns the contents without consuming.
|
||||
|
||||
; !!! read is different from get-output-string
|
||||
; read forms a Scheme object
|
||||
; Here, the Scheme object is a symbol.
|
||||
|
||||
; FIXME: this fails, but is it because of the unichar, or does it also fail with ASCII?
|
||||
|
||||
(test! "string read from input-output port equals string written to port")
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-input-output-string "foo")))
|
||||
(write "λ" aStringPort)
|
||||
(symbol->string (read aStringPort)))
|
||||
"\"fooλ\""))
|
|
@ -1,56 +0,0 @@
|
|||
; Test cases for string ports
|
||||
|
||||
; a string port is-a port (having read and write methods).
|
||||
; a string port stores its contents in memory (unlike device ports).
|
||||
; A read returns contents previously written.
|
||||
; A string port is practically infinite.
|
||||
|
||||
; a string port is like a string
|
||||
; a sequence of writes are like a sequence of appends to a string
|
||||
|
||||
|
||||
; Note that each assert is in its own environment,
|
||||
; so we can't define a global port outside????
|
||||
; Why shouldn't this work?
|
||||
; (define aStringPort (open-output-string))
|
||||
; (assert `(port? aStringPort))
|
||||
|
||||
|
||||
; open-output-string yields a port
|
||||
(assert '(port? (open-output-string)))
|
||||
|
||||
; string read from port equals string written to port
|
||||
; !!! with escaped double quote
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
(write "foo" aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
"\"foo\""))
|
||||
|
||||
; string read from port equals string repr of symbol written to port
|
||||
; !!! without escaped double quote
|
||||
(assert '(string=?
|
||||
(let* ((aStringPort (open-output-string)))
|
||||
; !!! 'foo is-a symbol whose repr is three characters: foo
|
||||
; write to a port writes the repr
|
||||
(write 'foo aStringPort)
|
||||
(get-output-string aStringPort))
|
||||
(symbol->string 'foo)))
|
||||
|
||||
; What is read equals the string written.
|
||||
; For edge case: writing more than 256 characters in two tranches
|
||||
; where second write crosses end boundary of 256 char buffer.
|
||||
|
||||
; issue #9495
|
||||
; Failing
|
||||
;(assert '(string=?
|
||||
; (let* ((aStringPort (open-output-string)))
|
||||
; (write (string->symbol (make-string 250 #\A)) aStringPort)
|
||||
; (write (string->symbol (make-string 7 #\B)) aStringPort)
|
||||
; (get-output-string aStringPort))
|
||||
; (string-append
|
||||
; (make-string 250 #\A)
|
||||
; (make-string 7 #\B))))
|
||||
|
||||
|
||||
|
|
@ -15,7 +15,11 @@
|
|||
(testing:load-test "atom2string.scm")
|
||||
(testing:load-test "integer2char.scm")
|
||||
|
||||
(testing:load-test "string-port.scm")
|
||||
(testing:load-test "string-escape.scm")
|
||||
(testing:load-test "string-port-output.scm")
|
||||
(testing:load-test "string-port-input.scm")
|
||||
; WIP
|
||||
; (testing:load-test "string-port-unichar.scm")
|
||||
|
||||
(testing:load-test "sharp-expr.scm")
|
||||
(testing:load-test "sharp-expr-char.scm")
|
||||
|
|