
An Introduction to Extended Pascal
by Tony Hetherington
Copyright 1993 Prospero Software Ltd.
The programming language Pascal was originally designed by Professor Niklaus Wirth, and named after a French mathematician and philosopher widely admired for the clear and direct nature of his ideas. Wirth set out two principal aims for the design of Pascal: that it should be systematic and coherent, so far as possible avoiding arbitrary restrictions, and that it should be suitable for efficient implementation on the currently available machines. A "User Manual and Report" defining Pascal was produced by Jensen and Wirth during the 1970s; the most recent third edition corresponds exactly to the first standard mentioned below.
Closely-related but not identical standards were published in 1983 by ISO (originally a BSI standard) and ANSI. While the definition they contained was very precise, and in this respect was something of a landmark, it did not attempt to add much to Wirth's original specification. One consequence was that while many implementations provided the standard language, they also devised their own extensions to provide users with features which had been found to be in demand.
After publication of the first standard, the ANSI/IEEE joint project committee continued to develop extensions, and from 1984 worked closely with the ISO group to produce the definition of an Extended Pascal language which would be upward-compatible with the earlier versions, equally rigorously defined but covering areas which experience had proved to be desirable. Extended Pascal standards which are identical in technical content were published in 1990 and 1991. After finalising the standard, the technical committees have continued to develop new features in the areas of Object Oriented programming and exception handling.
The Extended Pascal language is somewhat larger than
the earlier standard (which will generally be referred to here as "classic"
Pascal), and an introduction of this kind can give only a general survey.
However, the first complete implementation has been produced for
microcomputers running DOS, so it is fair to conclude that Wirth's
original aims have still been kept in mind.
Before considering major new features, it is useful to start a survey of
the Extended Pascal language with some enhancements which make the
facilities inherited from classic Pascal easier to use (and which apply
also in the major extensions described later). A number of these
enhancements may be familiar as local extensions to users of current
Pascal implementations.
Constant expressions can be used where classic Pascal requires
constants, for example in declarations. These constant expressions can
employ most of the predefined functions, as well as operators.
CONST linefeed = chr(10);
TYPE buffer = ARRAY[0..bufflen-1] OF char;
Relaxed ordering. The declarations and definitions (CONST,
TYPE, VAR etc.) can be repeated, and can appear in any order,
provided that there are no forward references.
Functions can return results of any assignable type, including
arrays and records, and the processes of dereference (^),
indexing and field selection can be applied when appropriate to the type.
The result variable may be given a name (different from the function name)
that can be referenced in contexts that require a variable without causing
a recursive call of the function.
Subrange bounds may be constant expressions, in line with the
wider relaxation described above, and such subranges define conventional
static types. The bounds may also be general expressions, involving
run-time variables, and introducing nonstatic types; this topic is
discussed separately later.
CASE enhancements. In both variant record declarations and CASE
statements, ranges are permitted in constant-lists, and an OTHERWISE
clause may be introduced to complete the list.
CASE ch OF
'0'..'9': digit;
'A','E','I','O','U': vowel;
OTHERWISE other;
END {case};
Short-circuit operators. Variants of the Boolean operators AND
and OR are provided which guarantee that an expression is
evaluated no further than is necessary to determine the result. The new
variants are called AND_THEN and OR_ELSE, and they can
be used as in the following example to simplify conditional code.
IF (p <> NIL) AND_THEN (p^.f2 = 2) THEN ..... ELSE .....
If p is nil, control passes immediately to
the else part, and the illegal dereference p^.f2 (which would
result in a protection violation if the program were running in a
protected memory environment) is not executed.
Nondecimal numbers may be introduced into source programs using
the notation base#number, for instance 16#FF or 8#377
(hexadecimal FF = octal 377 = decimal 255).
Characteristics. Just as maxint gives implementation-specific
information about the integer type, there is a predefined constant maxchar
(the character with largest ordinal value), and constants maxreal,
minreal and epsreal which give the characteristics of
the real type.
Numeric input from textfiles accepts the representation laid down
in the data-interchange standard (ISO 6093), which permits a decimal point
in leading or trailing position.
Zero fieldwidth. The optional fieldwidth parameter in textfile
output may take the value zero.
Inverse ord. An enhancement to the succ and pred
functions allows them to take an optional second parameter of integer
type. This is more versatile than a simple inverse of ord, but among other
things gives succ that capability; for example, given a
conventional day-of-week enumeration, succ(Monday,3) yields
Thursday.
Underscores are permitted as significant characters within
identifiers, though not in leading or trailing positions.
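Several of the enhancements listed above can be seen working together in one small program. The following is a sketch only: the program name, the type names day, link and node, and the variable names are invented for illustration, and the program assumes a processor that accepts the full Extended Pascal language.

```pascal
PROGRAM enhance (output);
{Illustrative sketch; all identifiers declared here are hypothetical.}
CONST
  mask = 16#FF;                      {nondecimal notation, decimal 255}
TYPE
  day  = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
  link = ^node;
  node = RECORD
           f2: integer;
           next: link
         END;
VAR
  p: link;
  ch: char;
BEGIN
  {the same value written in two different bases}
  IF mask = 8#377 THEN writeln('16#FF = 8#377 = ', mask:1);

  {succ with an optional second parameter}
  IF succ(Monday, 3) = Thursday THEN writeln('succ(Monday,3) is Thursday');

  {ranges and OTHERWISE in a CASE statement}
  ch := '7';
  CASE ch OF
    '0'..'9':            writeln('digit');
    'A','E','I','O','U': writeln('vowel');
    OTHERWISE            writeln('other')
  END;

  {short-circuit evaluation: p^.f2 is never examined when p is nil}
  p := NIL;
  IF (p <> NIL) AND_THEN (p^.f2 = 2) THEN writeln('found')
  ELSE writeln('p is nil, f2 not examined')
END.
```

With ch set to '7' the digit branch of the CASE statement is taken, and because p is nil the final test falls through to the ELSE part without dereferencing p.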
In classic Pascal, all variables are created in an undefined state (that is, containing an undefined value). In Extended Pascal, an initial state can be defined which is automatically given to a variable when it is created. At the outer level an implementation may preset the initial state, but the facility applies also to local variables of procedures and to variables created on the heap. Furthermore, a type definition may carry an initial state, which is given to every variable of the type unless the variable declaration itself overrides it.
TYPE intz = integer VALUE zero;
     trec = RECORD
              a,b: intz;
              c: char VALUE '*';
            END;
VAR recp: ^trec;
With these declarations, a call of new(recp)
causes a record to be created on the heap with fields a and
b initialised to zero and c initialised to asterisk.
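A complete program built around these declarations might look as follows. This is a sketch under two assumptions: that zero is a constant defined by the programmer (the fragment above does not show its definition), and that the program name initdemo is free to choose.

```pascal
PROGRAM initdemo (output);
{Illustrative sketch; the constant zero is assumed to be user-defined.}
CONST
  zero = 0;
TYPE
  intz = integer VALUE zero;         {every intz variable starts at zero}
  trec = RECORD
           a,b: intz;
           c: char VALUE '*'
         END;
VAR
  recp: ^trec;
BEGIN
  new(recp);                         {fields receive their initial states}
  writeln(recp^.a:1, ' ', recp^.b:1, ' ', recp^.c);
  dispose(recp)
END.
```

No assignment is made to the fields before they are written, so the values displayed come entirely from the initial states attached to the types.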
The set constructors of classic Pascal form the basis of constructors to define other structured values. When all the ingredients are constant, a constructed value can be named as a constant or used to define initial states. Within statements, the constructor may include run-time values.
An array constructor may list specific indexes, with the values those elements are to be given, and/or may provide a default to be given to any elements not individually listed. A record constructor names record fields together with the values to be given; the whole record must be defined. (For some purposes it may be more convenient to specify values of individual fields in the record declaration.)
[3..6,9:5.5; 10:10.5; OTHERWISE 0] {array}
[f1,f4:10; f2:'$'; f3:'Message'] {record}
The processes of indexing and field selection can be
applied to structured constants (including string literals) just as to
variables; furthermore, within statements an index may be a run-time
value, so that different elements of the constant are accessed on
different occasions.
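As an illustration, the array constructor shown above can be given a name and then indexed with a run-time value. In this sketch the type name vec, the constant name table and the program name are invented, and the constructor is prefixed with the type name, which is the form the author understands the standard to require in a constant definition.

```pascal
PROGRAM condemo (output);
{Illustrative sketch; vec and table are hypothetical names.}
TYPE
  vec = ARRAY [1..10] OF real;
CONST
  {elements 3..6 and 9 are 5.5, element 10 is 10.5, the rest are 0.0}
  table = vec[3..6,9: 5.5; 10: 10.5; OTHERWISE 0.0];
VAR
  i: integer;
BEGIN
  {the constant is indexed by a run-time value}
  FOR i := 1 TO 10 DO
    writeln(i:2, table[i]:6:1)
END.
```

The TYPE part precedes the CONST part here, which also relies on the relaxed ordering of declarations described earlier.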
Classic Pascal does not include any facility for separate compilation of parts of a program. Besides limiting the scope of programs which can be produced on small machines, this has the important disadvantage that there is no standard form for the preparation of precompiled libraries. Almost every implementation of Pascal introduced an extension of some kind to overcome this limitation, and it was seen as one of the most important tasks of Extended Pascal to define a form of separate compilation which would not forfeit type security.
Besides the main program, Extended Pascal programs may include components known as modules. A module can export constants, types, variables, procedures and functions through named interfaces, and these interfaces may be imported by other modules or by the main program. By default, an interface is imported complete, with the names of all its constituents accessible, but there are several options to meet the difficulties which can arise in practice when importing from modules which were not designed in conjunction with one another. Instead of importing the whole of an interface, just selected items may be chosen; the names of constituents may be kept apart and referred to by giving the interface name; and constituents may be renamed on import.
A module has two parts: a heading and a block. The module heading contains declarations and definitions of any items which are to be exported, in particular the headings (but no more) of procedures and functions. The block includes the definitions of any exported procedures or functions, together with items which do not need to be known outside the module. There may also be initialization and finalization code.
The heading and block may be combined or separate. When separate, the possibility arises of alternative implementations of the same heading (with and without diagnostic code, for example), or of an implementation coded in some other language such as assembler.
A module heading may import from another module, and may re-export these
imported items, allowing for example composite library interfaces to be
constructed. A module block may independently import interfaces from other
modules. The export and import of an interface set up what is known as a "supplying"
relationship, in which the exporting module supplies the importing module
or program. The supply network puts some constraints on the sequence in
which modules can be compiled; in particular, it must not contain any
loops (which would imply that a module was indirectly attempting to supply
itself). Any initialization code for a module is executed before that of
any component which it supplies.
Modular construction is normally appropriate to larger programs, and small
examples inevitably appear trivial. However, the three related modules
which follow demonstrate a number of the possibilities.
Module one exports an interface named i1, containing two constants named lower and upper. A variable dummy is declared but not exported. Module one has a minimal module block.
MODULE one;
EXPORT i1 = (lower,upper);
CONST lower = 0;
upper = 11; {must be prime}
VAR dummy: Boolean;
END {of module-heading};
END {of module-block}.
Module two imports the constants lower and upper
from one, uses them to define a type, and also re-exports them.
It exports two interfaces named i2 and j2. Interface
i2 contains the type subr; j2 contains the
constants lower and upper. Module two also
has a minimal module block.
MODULE two;
EXPORT i2 = (subr); {has just one constituent}
j2 = (lower,upper);
IMPORT i1; {import all (both) constituents}
TYPE subr = lower..upper;
END {of module-heading};
END {of module-block}.
Module three demonstrates qualified import and renaming. It
exports one interface i3 containing a function, a type, and two
constants. It imports interfaces i1 and i2 qualified,
so references to the constituents within the module are prefixed with the
interface-names; further, the type subr is renamed lim_range
on import, so it is referred to as i2.lim_range. The constants
lower and upper are renamed on export as lim_lower
and lim_upper. The function-heading of limited is
declared in the module-heading, and the function-definition in the
module-block. Note that the parameter-list and result-type of limited
are not repeated in the definition; this arrangement is similar to
forward-declared procedures in classic Pascal.
MODULE three;
EXPORT i3 = (limited,i2.lim_range, {function and type}
i1.lower=>lim_lower,i1.upper=>lim_upper);
IMPORT i1 QUALIFIED; {lower, upper to be referenced
as i1.lower and i1.upper}
i2 QUALIFIED ONLY (subr=>lim_range);
FUNCTION limited(x: integer): i2.lim_range;
END {of module-heading};
FUNCTION limited;
BEGIN
IF x < i1.lower THEN limited := i1.lower
ELSE
IF x > i1.upper THEN limited := i1.upper
ELSE limited := x
END {limited};
END {of module-block}.
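A main program that uses these modules could be as simple as the following sketch. The program name usethree is invented; the interface i3 is imported complete, so its constituents limited, lim_lower and lim_upper are referenced without qualification.

```pascal
PROGRAM usethree (output);
{Illustrative sketch of a client of module three.}
IMPORT i3;                    {all constituents of i3 become visible}
BEGIN
  writeln(limited(42):1);     {above the range, so lim_upper is returned}
  writeln(limited(-5):1);     {below the range, so lim_lower is returned}
  writeln(limited(7):1);      {within the range, returned unchanged}
  writeln(lim_lower:1, ' ', lim_upper:1)
END.
```

The program never mentions modules one or two: everything it needs reaches it through the single interface exported by module three.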
Restricted types provide a means of hiding the details
of a type when it is exported. The originator of a module may declare a "restricted"
version of a type, and export only the restricted form. An importer can
declare variables of such a type, and pass them as parameters to
procedures imported from the originating module, but can only treat them
as black boxes with no knowledge of their internal structure. Within the
module, the restricted parameters are of the unrestricted original type.
In classic Pascal, the only string facilities are associated with packed
arrays of char. This is another area in which a variety of local
extensions have arisen. Extended Pascal includes provision for dynamic
string types, and unifies them with classic Pascal strings and with
characters.
String variables are declared with a maximum capacity, for instance:
VAR s1,s2: string(20);
    fname: PACKED ARRAY [1..20] OF char;
String values have a length (number of characters). A dynamic string
variable such as s1 can hold a value of any length from zero up
to its capacity, and the object code keeps track of the current length.
With a fixed string such as fname, as found in classic Pascal,
the length of the contents is equal to the capacity; when a shorter value
is assigned to fname, it is padded on the right with spaces
until it fits. A variable of type char has a capacity of 1.
Variables of these three kinds, together with string literals and
character constants, produce general string values. In addition,
individual characters or substrings of string variables can be referenced
by indexing, for instance s1[i] or fname[1..8].
String values can be concatenated using the + operator, and
constants can be defined by constant expressions of string type, e.g. 'ABC'+chr(13).
There are predeclared functions for the commonly-required string
operations such as locating a substring within a longer string.
Strings can be written to or read from textfiles, and versions of the
textfile read and write procedures are provided which
take a string variable in place of the file, making all the conversion and
editing processes available internally.
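The following sketch brings several of these string facilities together. The function and procedure names length, index, substr, readstr and writestr are those the author understands the standard to define; the program and variable names are invented for illustration.

```pascal
PROGRAM strdemo (output);
{Illustrative sketch of the string facilities.}
VAR
  s1: string(20);
  n: integer;
BEGIN
  s1 := 'Extended' + ' ' + 'Pascal';   {concatenation}
  writeln(length(s1):1);               {current length: 15 characters}
  writeln(index(s1, 'Pascal'):1);      {substring starts at position 10}
  writeln(substr(s1, 1, 8));           {the first 8 characters}

  readstr('42', n);                    {read an integer from a string}
  writestr(s1, 'n = ', n:1);           {write into a string variable}
  writeln(s1)
END.
```

After the call of writestr the current length of s1 is that of the newly written text, not the capacity of 20 with which the variable was declared.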
A string may be declared with capacity fixed at compile time, as in the
example above, or defined by a run-time variable expression. There are
also provisions for formal parameters which adjust themselves to the
actual parameter at each call.
PROCEDURE p (VAR s: string)
Dynamic strings of different capacities may be passed
to this procedure with each call; the code within the procedure can
discover the capacity of each actual parameter by reference to s.capacity.
If a variable n has the value 10, a string declared as string(n)
has the same type as one declared as string(10) for
compatibility purposes (though the type checking cannot be performed until
run-time). As will be seen in the next section, this rule and the
adaptable formal parameters are both particular cases of facilities that
apply to all schematic types, and arise from string being
formally defined to be a predeclared "schema" with additional
special properties.
It has been a characteristic of almost all versions of Pascal that data
types are static, that is, are ultimately defined at compile time. The ISO
version of classic Pascal included an optional feature called conformant
array parameters, a specialised parameter form for which actual arrays of
different sizes can be supplied in different calls of the same procedure.
This feature has been included in a number of implementations, and
provides a measure of flexibility which suits, for example, mathematical
procedures which manipulate arrays. All that is required is that actual
parameters "conform" to the formal in the sense of having the
same number of dimensions and final element type.
In the context of classic Pascal, conformant arrays give a degree of
flexibility when ready-made procedures are included in a program in source
form, but the actual parameters must ultimately all be static, with sizes
defined at compile time. Conformant arrays continue to be an optional
feature of Extended Pascal, but there is in addition a more far-reaching
variety of nonstatic types based on schemata.
A schema is a template describing a family of related types, from which
individual types can be produced by substituting either compile-time or
run-time values, typically to define subrange bounds or to select record
variants. These "schematic" types can be used in almost all
respects just like conventional static types: they can be used in the
declaration of variables, record fields and formal parameters; they can be
used as domain types of pointers, and returned by functions. It was
observed earlier that subranges can have their bounds determined at
run-time; such subranges are similar to individual schematic types without
the benefit of a family connection.
TYPE s(a,b: integer) = ARRAY [a..b] OF real;
VAR x: s(0,n-1);
The schema s defines a family of array types. The variable x has a type produced from the schema by substituting the values 0 and n-1 for a and b. If n is a variable, the size of the array is determined at run-time. The index bounds of array x can be referenced as x.a and x.b, as in the statement
FOR i := x.a TO x.b DO writeln(x[i]);
Formal parameters may be declared with the original schema name, and
will adapt themselves to the actual parameter at each call, as described
earlier for the particular case of strings. In this respect they are
similar to conformant array parameters, but require that each actual is of
a type produced from the same schema. A pointer may also be declared to
have the schema name as its domain type, and an additional form of the
procedure new is provided which includes actual values to select
a type from the schema.
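These two facilities, the adaptable formal parameter and the extended form of new, might be combined as in the sketch below. The schema name vec, its parameters lo and hi, and the function total are all invented for illustration.

```pascal
PROGRAM schemademo (output);
{Illustrative sketch; vec, lo, hi and total are hypothetical names.}
TYPE
  vec(lo,hi: integer) = ARRAY [lo..hi] OF real;
VAR
  p: ^vec;                        {pointer with the schema as domain type}
  i: integer;

FUNCTION total (VAR v: vec): real;  {accepts any type produced from vec}
VAR
  k: integer;
  sum: real;
BEGIN
  sum := 0.0;
  FOR k := v.lo TO v.hi DO sum := sum + v[k];
  total := sum
END {total};

BEGIN
  new(p, 1, 4);                   {select the type vec(1,4) at run-time}
  FOR i := p^.lo TO p^.hi DO p^[i] := i;
  writeln(total(p^):6:1);         {sum of 1.0, 2.0, 3.0 and 4.0}
  dispose(p)
END.
```

The function never needs to be told the bounds of its argument; it reads them from the parameter itself as v.lo and v.hi.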
A schema can define a family of record types in which a variant is
selected by a parameter of the schema. The selection may be decided at
run-time, and unlike the form of new inherited from classic
Pascal, it does not require a constant selector. As with all schematic
types, such records can be local variables, parameters, fields of other
records, and so on. The choice of variant produces a specific type, which
cannot subsequently be changed; but as with any schema a parameter may be
declared with the schema name which will accept as actual arguments any of
the produced types. The use of variant records is made safer and more
flexible by these arrangements.
TYPE sub = 1..4;
     rec(m: sub; n: integer) = RECORD
       a,b: integer;
       CASE m OF
         1:   (f1: real);
         2,3: (f2: string(n));
         4:   ( );
     END;
     rec2_20 = rec(2,20);
These definitions show both the selection of a record variant by parameter m and (when m is 2 or 3) how the capacity of string f2 can also be specified by parameter n. The type rec2_20 is one type produced from the schema rec.
PROCEDURE show_cap (r: rec);
BEGIN
IF r.m = 2 THEN writeln(r.f2.capacity);
END {show_cap};
If an actual parameter of type rec2_20 is
passed to procedure show_cap, the value 20 is displayed.
Type inquiry, written TYPE OF, is of use primarily in conjunction with schema types, and allows for example a local work variable to be declared with the same type as an actual parameter, when this type is not known until run-time and may differ from call to call. For example, in procedure show_cap above, a variable v could be declared as
VAR v: TYPE OF r;
This variable acquires the type of the actual
parameter at each activation.
Extended Pascal provides a method of binding a variable within the
program to an external entity; the most common example is the binding of a
file variable to an operating-system file. There is a predeclared record
type called BindingType which holds binding information;
procedures bind and unbind perform the actions, and a
function binding returns the current state of a variable. File
binding can be carried out by a sequence of operations which is relatively
independent of the environment; some other bindings (such as to a screen
image, or a clock) may be available in specific implementations.
Classic Pascal provided only sequential file processing. Extended Pascal
adds the capability to extend a sequential file, and also allows file
variables to be declared with an index type. Such variables can provide
direct access to individual file elements, by specifying an index value.
Direct-access files allow updating as well as reading and writing.
This example displays the string which is element i of the file:
VAR f: FILE [0..9999] OF string(20);
...
SeekRead(f,i);
writeln(f^);
A complex data type is provided. It is intentionally opaque, to permit implementations to choose the most appropriate representation; there are functions to obtain the real and imaginary parts (a Cartesian view), the magnitude and argument (a polar view), and also to construct a complex value from either pair of inputs. The mathematical operators and functions of classic Pascal can also take complex arguments and return complex results.
z2 := cos(z1 * 5.5);
writeln(re(z2),im(z2)); {Cartesian view}
writeln(abs(z2),arg(z2)); {polar view}
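A self-contained sketch of the two views follows. The constructor names cmplx (from Cartesian parts) and polar (from magnitude and argument) are those the author understands the standard to define; the program and variable names are invented.

```pascal
PROGRAM cdemo (output);
{Illustrative sketch of the complex type.}
VAR
  z: complex;
BEGIN
  z := cmplx(3.0, 4.0);                  {built from real and imaginary parts}
  writeln(re(z):6:2, im(z):6:2);         {Cartesian view}
  writeln(abs(z):6:2);                   {magnitude of 3+4i is 5}

  z := polar(2.0, 0.0);                  {built from magnitude and argument}
  writeln(re(z):6:2, im(z):6:2)          {lies on the real axis}
END.
```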
Two exponentiation operators are included. POW raises a value to
an integer power, and ** accepts a real exponent. In either
case, the left-hand operand can be integer, real or complex. An integer
operand of ** (as with the / operator) is cast to real
before the operation.
A new operator >< is defined, which takes the symmetric difference of two set values; there is a new predefined function card which returns the cardinality of a set (the number of members present); and the FOR statement allows a new form in which the control variable is given in turn the values defined by a set.
FOR n IN setvalue DO ...
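The three set facilities can be tried out together in a few lines. In this sketch the type name small and the variable names are invented; the order in which the FOR statement visits the members should not be relied upon.

```pascal
PROGRAM setdemo (output);
{Illustrative sketch of the set extensions.}
TYPE
  small = 0..15;
VAR
  a,b,d: SET OF small;
  n: small;
BEGIN
  a := [1,2,3,4];
  b := [3,4,5];
  d := a >< b;              {members in exactly one of a and b: 1, 2 and 5}
  writeln(card(d):1);       {three members}
  FOR n IN d DO             {n takes each member of d in turn}
    writeln(n:1)
END.
```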
A predeclared record type TimeStamp is defined, which contains fields for year, month, day, hour, minute and second. (It is envisaged that an implementation might add further details such as millisecond or time zone which would be processed transparently by the predefined routines.) A procedure GetTimeStamp sets the current values in a TimeStamp record; functions Date and Time take such a record as a parameter and return strings in display format. This division of tasks allows the display functions to be used independently of system dates:
VAR ts: TimeStamp;
...
ts.year := 1993;
ts.month := 1; {=January}
ts.day := 1;
writeln(date(ts)); {display in local format}
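When the current system date and time are wanted instead, a sketch along the following lines may serve. It assumes that the TimeStamp record also carries two Boolean fields, DateValid and TimeValid, which is the author's understanding of the standard but is not stated above; the program name is invented.

```pascal
PROGRAM now (output);
{Illustrative sketch; DateValid and TimeValid are assumed fields.}
VAR
  ts: TimeStamp;
BEGIN
  GetTimeStamp(ts);                        {fill in the current values}
  IF ts.DateValid THEN writeln(date(ts));  {date in local display format}
  IF ts.TimeValid THEN writeln(time(ts))   {time in local display format}
END.
```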
Protection may be given to a variable in two contexts.
The first is on export of the variable from a module; such a variable can
be modified by code within the module, but importers must treat it as
read-only. The originator of the module thereby ensures the security of
that variable. The second context for protection is in parameter lists. A
parameter may be declared to be protected; the code within the procedure
or function must then not contain statements which might change the
parameter. A caller passing a variable to a protected VAR
parameter, for example, knows there is no risk of it being modified, and
does not need to make a copy first; in the case of a large structure this
may represent a significant saving. Declaring a protected value parameter
indicates to an implementation or to the reader of the program text that
the actual parameter is "safe" and will not be modified during
execution of the procedure.
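The parameter form of protection might be used as in this sketch, in which a large array is passed by reference without any risk to the caller. The type name table, the function largest and the test data are invented for illustration.

```pascal
PROGRAM protdemo (output);
{Illustrative sketch of a protected VAR parameter.}
TYPE
  table = ARRAY [1..1000] OF integer;
VAR
  t: table;
  i: integer;

FUNCTION largest (PROTECTED VAR a: table): integer;
VAR
  k, m: integer;
BEGIN
  {a is passed by reference, so no copy is made, but it may not be changed}
  m := a[1];
  FOR k := 2 TO 1000 DO
    IF a[k] > m THEN m := a[k];
  largest := m
END {largest};

BEGIN
  FOR i := 1 TO 1000 DO t[i] := i MOD 37;
  writeln(largest(t):1)       {largest remainder on division by 37 is 36}
END.
```

An assignment such as a[1] := 0 inside largest would be rejected, which is what allows the caller to pass t itself instead of a copy.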
Pascal, at least in its standard form, has the reputation of being a safe but limited language. The purpose of this introduction to the features of Extended Pascal is to show that the range of the language has been greatly increased without compromising its security.
To err is human, and people (even programmers) make mistakes. In the development and maintenance of software, these are expensive if not worse, and the contribution that the programming language can make to the avoidance of mistakes is very significant. Classic Pascal has features which encourage, and indeed sometimes require, a secure programming style; it also encourages readability, which greatly benefits long-term maintenance. Extended Pascal gives much extra flexibility without sacrificing these advantages. Also, any programmer familiar with classic Pascal can adopt the new features of the extended language gradually, achieving a smooth transition as familiarity grows.
Portability across platforms is important to the serious developer, and the use of a standardised language provides an assurance of continuity both vertically across levels of machine and horizontally in time.
To fill one of the most significant gaps in classic Pascal, the language standard provides a framework in which libraries can be developed and distributed. An implementor with proprietary source code can, if he wishes, supply processed interfaces and compiled object files, his code still retaining the advantages of portability. On the other hand, the standard rightly does not set out to specify what individual libraries should contain. Areas such as graphics or numerical computing are essentially language-independent: desirable facilities are specified once and can then be "bound" to different languages, as for example in the emerging set of Language Independent Arithmetic standards. All languages can then benefit from the care and attention given by experts in each particular field.