The Eiffel Compiler / Interpreter (tecomp)

---------------------------- --------- DRAFT ------------ ----------------------------

Entities in Eiffel are

- constant attributes (e.g. Pi: REAL = 3.14159265358979323846)
- variable attributes
- formal arguments
- local variables
- Current (always attached to the current object)
- Result (a special local variable which represents the returned object of a function)

Every entity has a type; there are no untyped entities in Eiffel. An entity of type T can be attached to objects whose type conforms to T or converts to T (conformance and conversion are explained later). The attachment happens either by assignment or by argument passing. Nothing can be attached to a constant attribute or to Current: Current is always attached to the current object, and a constant attribute is always attached to its constant value.

All objects have a type. The simplest types are the basic types.

This chapter explains the basic data types, the operators and expressions. Since complex data types can be built arbitrarily from the basic types it is important to understand the basic types first.

Many languages define builtin types and build more complex types on top of them. In Eiffel, the standard types are not outside the type system. Each of the standard types like INTEGER, REAL, etc. is represented by an Eiffel class written in a class file (e.g. integer_32.e) in the kernel library.

However, the compiler must "know" these types because it must represent them by corresponding types of the concrete machine. Most features (operations like "+", "-", etc.) of these standard types are builtin features because they cannot be written in Eiffel. The Eiffel compiler, interpreter or virtual machine must make sure that it translates the builtin features/operations into specific machine operations which meet the semantics described below.

The standard types consist of the basic types (INTEGER, REAL, ... ) and some other types (like STRING) with clearly defined standardized semantics.

The basic types available in Eiffel are

- BOOLEAN
- CHARACTER, CHARACTER_8, CHARACTER_32
- INTEGER, INTEGER_8, INTEGER_16, INTEGER_32, INTEGER_64
- NATURAL, NATURAL_8, NATURAL_16, NATURAL_32, NATURAL_64
- REAL, REAL_32, REAL_64

The basic types are all `expanded`. That is, an entity of type INTEGER
represents an integer value (i.e. an integer object) and not a reference to an
integer object. All `expanded` types have copy semantics, i.e. assignment
copies the value instead of just assigning a reference.

The basic types are not just `expanded`, they are also immutable. It is not
possible to change the value of an INTEGER: for a variable i of type
INTEGER there is no operation like i.increment. The only way to
change the value of a variable of a basic type is to assign a new value to it,
e.g. i := i + 1.
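Copy semantics and immutability can be seen in a few lines of Eiffel. The following is a minimal sketch, assuming a hypothetical root class COPY_DEMO with creation procedure `make` and the `print`/`out` features of an EiffelStudio-style kernel library:

```eiffel
class
    COPY_DEMO
create
    make
feature
    make
            -- Show that assignment of a basic type copies the value.
        local
            i, j: INTEGER
        do
            i := 5
            j := i          -- `j' receives a copy of the value 5
            i := i + 1      -- `i' gets a new value; `j' is not affected
            print (i.out + " " + j.out + "%N")  -- prints "6 5"
        end
end
```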

All basic types have default values, so variables of these types need not be initialized explicitly. The default value is zero for INTEGERs, NATURALs and REALs, False for BOOLEANs and the character with code zero for CHARACTERs.
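A minimal sketch that prints the default values of uninitialized local variables. DEFAULTS_DEMO is a hypothetical root class, and `out`, `code` and `print` are assumed to behave as in EiffelStudio's kernel library:

```eiffel
class
    DEFAULTS_DEMO
create
    make
feature
    make
            -- Print the default values of locals of basic types.
        local
            i: INTEGER
            n: NATURAL
            b: BOOLEAN
            c: CHARACTER
        do
            print (i.out + "%N")        -- 0
            print (n.out + "%N")        -- 0
            print (b.out + "%N")        -- False
            print (c.code.out + "%N")   -- 0, the character with code zero
        end
end
```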

CHARACTER, INTEGER, NATURAL and REAL have the sized variants listed above. CHARACTER, INTEGER, NATURAL and REAL are not individual types; each of them is just a synonym for one of its sized variants. The possibilities are

- CHARACTER: CHARACTER_8 or CHARACTER_32
- INTEGER: INTEGER_32 or INTEGER_64
- NATURAL: NATURAL_32 or NATURAL_64
- REAL: REAL_32 or REAL_64

The sized variant can be chosen by a compiler option. The usual default is CHARACTER_8, INTEGER_32, NATURAL_32 and REAL_32.

A BOOLEAN can hold the truth values True and False.

An INTEGER_n represents a signed integer value in the range -2^(n-1) .. 2^(n-1) - 1, where n is either 8, 16, 32 or 64. I.e.

- INTEGER_8: -128 .. 127
- INTEGER_16: -32768 .. 32767
- INTEGER_32: -2147483648 .. 2147483647
- INTEGER_64: -9223372036854775808 .. 9223372036854775807

The NATURALs are unsigned and represent values in the range 0 .. 2^n - 1, i.e.

- NATURAL_8: 0 .. 255
- NATURAL_16: 0 .. 65535
- NATURAL_32: 0 .. 4294967295
- NATURAL_64: 0 .. 18446744073709551615
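The range bounds can also be queried from the kernel classes. This sketch assumes that the sized types provide the constants `min_value` and `max_value`, as EiffelStudio's kernel library does (tecomp's names may differ); RANGES_DEMO is a hypothetical root class:

```eiffel
class
    RANGES_DEMO
create
    make
feature
    make
            -- Print some of the bounds listed above.
        local
            lo8, hi8: INTEGER_8
            max_n8: NATURAL_8
            hi64: INTEGER_64
        do
            lo8 := {INTEGER_8}.min_value
            hi8 := {INTEGER_8}.max_value
            max_n8 := {NATURAL_8}.max_value
            hi64 := {INTEGER_64}.max_value
            print (lo8.out + " .. " + hi8.out + "%N")   -- -128 .. 127
            print ("0 .. " + max_n8.out + "%N")         -- 0 .. 255
            print (hi64.out + "%N")                     -- 9223372036854775807
        end
end
```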

The REALs are floating point numbers in IEEE 754 format: REAL_32 is a 32 bit and REAL_64 a 64 bit floating point number.

BOOLEAN constants are True and False.

An INTEGER constant is any sequence of decimal digits within the range of INTEGER (remember INTEGER is either a synonym for INTEGER_32 or INTEGER_64). For better readability underscores can be used to group the digits (recommendation: groups of three decimal digits).

Examples of decimal INTEGERs:

1234 1_000_000_000 -256

INTEGER constants can be given in hexadecimal (base 16, prefix 0x), octal (base 8, prefix 0c) and binary (base 2, prefix 0b) as well. Eiffel uses prefixes to indicate the number base.

```
0xFF            -- decimal value 255
0xa             -- decimal value 10
0x8000_0000     -- decimal value -2147483648 (with INTEGER = INTEGER_32)
0xffff_ffff     -- decimal value -1 (with INTEGER = INTEGER_32)
0c40            -- decimal value 32 = 4*8
0c77            -- decimal value 63 = 7*8 + 7
0b1_0000_0000   -- decimal value 256 = 2^8
0b1111          -- decimal value 15 = 2^4 - 1
```

Since integer constants are of type INTEGER (usually INTEGER_32), they must be within its range. If INTEGER is a synonym for INTEGER_32, numbers outside that range cannot be written as plain constants. To represent them properly, prefix them with a type that has an appropriate range, e.g.

```
{INTEGER_64} -9223372036854775808    -- value exceeds INTEGER_32 range
{NATURAL} 4294967295                 -- value exceeds INTEGER_32 range
{NATURAL_64} 18446744073709551615    -- value exceeds even INTEGER_64 range
```
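The number bases and typed constants combine as in the following sketch. NUMBER_BASES_DEMO is a hypothetical root class; the expected output assumes `out` prints integers in plain decimal, as in EiffelStudio:

```eiffel
class
    NUMBER_BASES_DEMO
create
    make
feature
    make
            -- Write the same kind of value in different bases and with type prefixes.
        local
            hex, octal, binary: INTEGER
            big: INTEGER_64
            unsigned: NATURAL_32
        do
            hex := 0xFF                      -- 255
            octal := 0c77                    -- 63 = 7*8 + 7
            binary := 0b1111                 -- 15 = 2^4 - 1
            big := {INTEGER_64} 9_223_372_036_854_775_807
            unsigned := {NATURAL_32} 4_294_967_295
            print (hex.out + " " + octal.out + " " + binary.out + "%N")  -- 255 63 15
            print (big.out + "%N")           -- 9223372036854775807
            print (unsigned.out + "%N")      -- 4294967295
        end
end
```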

A CHARACTER constant is written as a single printable character within single quotes, e.g.

'a' '#' '"' '@' '0'

Non-printable characters and characters with a special meaning can be represented by the escape sequences

```
'%A'   -- At-sign @
'%B'   -- Backspace BS *
'%C'   -- Circumflex ^
'%D'   -- Dollar $
'%F'   -- Formfeed FF *
'%H'   -- BackslasH \
'%L'   -- TiLde ~
'%N'   -- Newline NL *
'%Q'   -- BackQuote `
'%R'   -- CarriageReturn CR *
'%S'   -- Sharp #
'%T'   -- HorizontalTab HT *
'%U'   -- NUll NUL *
'%V'   -- Vertical bar |
'%%'   -- Percent % *
'%''   -- Single quote ' *
'%"'   -- Double quote "
'%('   -- Opening bracket [
'%)'   -- Closing bracket ]
'%<'   -- Opening brace {
'%>'   -- Closing brace }
```

It is also possible to define a character constant by its character code in the form '%/code/'. The character code can be given in decimal, hexadecimal, octal or binary form. E.g.

```
'%/32/'          -- character 32, i.e. blank, in decimal,
'%/0x20/'        -- in hexadecimal,
'%/0c40/'        -- in octal,
'%/0b10_0000/'   -- and in binary notation
```
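A minimal sketch relating character constants to their codes. It uses only the decimal '%/code/' form; CHARACTER_CODES_DEMO is a hypothetical root class, and the `code` query is assumed as in EiffelStudio's CHARACTER_8:

```eiffel
class
    CHARACTER_CODES_DEMO
create
    make
feature
    make
            -- Compare character constants with their character codes.
        local
            blank, newline, letter: CHARACTER
        do
            blank := '%/32/'
            newline := '%N'
            letter := '%/65/'
            print (blank.code.out + "%N")      -- 32
            print (newline.code.out + "%N")    -- 10
            print ((blank = ' ').out + "%N")   -- True
            print (letter.out + "%N")          -- A
        end
end
```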

Valid REAL constants are

1. 1.0 1e4 .5 0.5

A string constant, or string literal, is a sequence of zero or more characters surrounded by double quotes, as in

```
"I am a string"
```

or

"" -- the empty string

The quotes are not part of the string, but serve only to delimit it. The same escape sequences used in character constants apply in strings; %" represents the double quote character. Example of a string with embedded escape sequences:

```
"A string with double quote %" and non printables like %T"
```

A long string can be wrapped across several source lines by ending a line with % and starting the continuation line with %, e.g.

```
"hello, %
%world"
```

is equivalent to

```
"hello, world"
```
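The equivalence can be checked with a short sketch. STRING_CONTINUATION_DEMO is a hypothetical root class; it compares the two strings with `is_equal`, and it assumes (as in EiffelStudio) that the continuation line may be indented before its leading %:

```eiffel
class
    STRING_CONTINUATION_DEMO
create
    make
feature
    make
            -- Compare a line-wrapped string with its single-line form.
        local
            wrapped, plain: STRING
        do
            wrapped := "hello, %
                %world"
            plain := "hello, world"
            print (wrapped.is_equal (plain).out + "%N")  -- True
        end
end
```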

Another possibility is to use verbatim strings. The verbatim string

```
"{
    Hello,
    world.
    Don't forget me!
}"
```

is equivalent to

```
"    Hello,%N    world.%N    Don't forget me!%N"
```

Since the lines delimited by "{ and }" are taken verbatim, the blanks in front of the text on each line are kept as well. This is often not wanted. A variant with the delimiters "[ and ]" strips off the blanks and tabs common to the beginning of all lines. The verbatim string

```
"[
    Hello,
    world.
    Don't forget me!
]"
```

is equivalent to

```
"Hello,%Nworld.%NDon't forget me!%N"
```

Only the indentation common to all lines is stripped off. If one or more lines are indented relative to the others, that indentation is kept. E.g.

```
"[
    Hello,
     world.
    Don't forget me!
]"
```

is equivalent to

```
"Hello,%N world.%NDon't forget me!%N"
```

All entities (except Result and Current) must be declared. The following example shows typical declarations.

```
class C
    ...
feature
    Pi: REAL = 3.14159265358979323846   -- a real constant
    Name: STRING = "Joe Cartwright"      -- a string constant
    ival1, ival2, ival3: INTEGER         -- variable attributes
    rval: REAL

    some_function (i, j, k: INTEGER; r: REAL): INTEGER
        local
            m, n: INTEGER
            s: REAL
        do
            -- some_function has access to all class level entities
            -- (Pi, Name, ival1, ival2, ival3, rval),
            -- to all formal arguments (i, j, k, r),
            -- to all local variables (m, n, s),
            -- to the entity Result (of type INTEGER)
            -- and to the entity Current (of type C)
        end
    ...
end
```

The order of the feature declarations is not relevant. The attributes Pi, Name, ival1, ... could have been declared before or after the routines using them.

Nearly all semicolons in Eiffel are optional. They are inserted for better readability if more than one declaration or statement is placed on one line. The routine declaration

```
...
some_function (i, j, k: INTEGER r: REAL): INTEGER
...
```

is legal without the semicolon. But this is not the recommended style.

Symbolic constants can be declared only at the class level (constant attributes). There are no local and no global symbolic constants. If a class wants access to symbolic constants, it either has to declare them as constant attributes in its class text or inherit them as constant attributes from a parent class.

A class is a namespace. All features in a class must have different names. The names of formal arguments and local variables have routine scope. They must be different from the names of all features (attributes or routines) and different from each other. Since the scope of formal arguments and local variables is local to a routine, their names can be reused in another routine.

The features of a class are the features declared in the class plus the inherited features. Therefore it is not good practice to give features very short names (like "i" or "n"), because they are likely to clash with the names of formal arguments and local variables (which are usually short). The style guide recommends descriptive feature names (e.g. count, capacity, put, etc.).

In Eiffel operators are just aliases for feature names. The expression

a + b * c

is a shorthand for

a.plus (b.product (c))

An operator alias is declared like

```
class INTEGER
feature
    ...
    plus alias "+" (other: like Current): like Current
        do
            ...
        end

    product alias "*" (other: like Current): like Current
        do
            ...
        end
    ...
end
```
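Operator aliases work the same way in user-defined classes. The following sketch uses a hypothetical class POINT with `plus alias "+"` and `scaled alias "*"`, plus a hypothetical root class POINT_DEMO (each class would normally go into its own file, point.e and point_demo.e). Because `*` has higher precedence than `+`, `p + q * 2` means `p.plus (q.scaled (2))`:

```eiffel
class
    POINT
create
    make
feature
    x, y: INTEGER

    make (a_x, a_y: INTEGER)
            -- Initialize with coordinates `a_x' and `a_y'.
        do
            x := a_x
            y := a_y
        end

    plus alias "+" (other: POINT): POINT
            -- Component-wise sum of `Current' and `other'.
        do
            create Result.make (x + other.x, y + other.y)
        end

    scaled alias "*" (factor: INTEGER): POINT
            -- `Current' with both coordinates multiplied by `factor'.
        do
            create Result.make (x * factor, y * factor)
        end
end

class
    POINT_DEMO
create
    make
feature
    make
        local
            p, q, r: POINT
        do
            create p.make (1, 2)
            create q.make (3, 4)
            r := p + q * 2       -- same as p.plus (q.scaled (2))
            print (r.x.out + ", " + r.y.out + "%N")   -- 7, 10
        end
end
```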

Operators allow us to write expressions in a more natural manner. Furthermore operators have precedences which allow us to avoid a lot of parentheses and make the source code more readable.

Any class can use any operator for an alias of its features as long as there
is no name clash (i.e. different features must have different names **and**
different aliases). The precedence of the operators cannot be changed, the
precedence is defined by the language.

In the following we discuss the use of operators in the basic types.

The class INTEGER provides the binary arithmetic operators +, - and *, the integer division //, the real division / and the power operator ^.

Integer division truncates the fractional part (e.g. 5//2 = 2). The expression

x \\ y

produces the remainder when x is divided by y, and thus is zero when y divides x exactly.

E.g., a year is a leap year if it is divisible by 4 but not by 100, except that years divisible by 400 are leap years. Therefore

```
local
    year: INTEGER
do
    ...
    if year \\ 4 = 0 and year \\ 100 /= 0 or year \\ 400 = 0 then
        print (year.out + " is a leap year%N")
    else
        print (year.out + " is not a leap year%N")
    end
    ...
end
```

This example already shows that the binary arithmetic operators take
precedence over the relational operators (=, /=, ~, /~, ...). The relational
operators take precedence over the boolean binary operators (and, or, ...), and
`and` takes precedence over `or` (see the detailed precedence table below).

Real division / applied to INTEGERs returns a REAL (e.g. 1/2 = 0.5).

Division by zero (all numeric types) results in an exception.

For negative operands, the direction of truncation of the integer division a // b and the sign of the result of a \\ b are undefined. However, the consistency relation

a = a//b * b + a\\b

is guaranteed.
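A minimal sketch that prints quotient and remainder for all sign combinations and checks the consistency relation. DIVISION_DEMO and `check_relation` are hypothetical names; the `check` instruction is only evaluated when assertion monitoring is enabled. No output is listed because the signs for negative operands are undefined:

```eiffel
class
    DIVISION_DEMO
create
    make
feature
    make
            -- Try all sign combinations of 7 and 2.
        do
            check_relation (7, 2)
            check_relation (-7, 2)
            check_relation (7, -2)
            check_relation (-7, -2)
        end

    check_relation (a, b: INTEGER)
            -- Print `a // b' and `a \\ b' and verify a = a // b * b + a \\ b.
        require
            b_not_zero: b /= 0
        do
            print (a.out + " // " + b.out + " = " + (a // b).out)
            print (", " + a.out + " \\ " + b.out + " = " + (a \\ b).out + "%N")
            check
                consistency: a = a // b * b + a \\ b
            end
        end
end
```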

Overflow during arithmetic operations is not detected by the runtime. Addition and subtraction are done with circular arithmetic (i.e. Largest_integer + 1 = Smallest_integer). Multiplication on n-bit INTEGER/NATURALs is done as if it were done with 2n bits and the result is truncated to n bits (i.e. the most significant n bits are removed).
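Both rules can be observed on INTEGER_32. The sketch below assumes a runtime that follows the circular semantics just described and a kernel library offering `min_value`/`max_value` as EiffelStudio's does; OVERFLOW_DEMO is a hypothetical root class. 65536 * 65536 = 2^32 needs 33 bits, and its low 32 bits are all zero:

```eiffel
class
    OVERFLOW_DEMO
create
    make
feature
    make
            -- Illustrate circular addition and truncated multiplication.
        local
            i, k, p, lowest: INTEGER_32
        do
            lowest := {INTEGER_32}.min_value
            i := {INTEGER_32}.max_value
            i := i + 1                        -- wraps around to the smallest value
            print ((i = lowest).out + "%N")   -- True
            k := 65_536                       -- 2^16
            p := k * k                        -- 2^32 truncated to 32 bits
            print (p.out + "%N")              -- 0
        end
end
```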

The INTEGERs/NATURALs have a power operator ^ to do the exponentiation a^b. The exponent must not be negative. The exponentiation a^b returns the same result as the repeated multiplication a*a*...*a (b times, with b>=0).
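The repeated multiplication that defines a^b can be written out as a loop. This is a sketch of the semantics only, not of how the builtin ^ is implemented; POWER_DEMO and `int_power` are hypothetical names:

```eiffel
class
    POWER_DEMO
create
    make
feature
    make
        do
            print (int_power (2, 10).out + "%N")   -- 1024
            print (int_power (7, 0).out + "%N")    -- 1 (empty product)
        end

    int_power (a, b: INTEGER): INTEGER
            -- `a' multiplied by itself `b' times; 1 if `b' = 0.
        require
            exponent_not_negative: b >= 0
        local
            i: INTEGER
        do
            Result := 1
            from
                i := 1
            until
                i > b
            loop
                Result := Result * a
                i := i + 1
            end
        end
end
```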

The REALs have an exponentiation operator as well. The exponentiation a^b with
REALs evaluates to a^b = exp(b*log(a)), where exp(x) is the exponential
function and log(x) is the natural logarithm. For `a <= 0` the runtime
throws an arithmetic exception.

The operators // and \\ are not defined for REALs.

The relational operators are

< <= > >= = /= ~ /~

They all have the same precedence and are not associative. Expressions like

```
a < b < c -- invalid expression
```

or

```
a = b = c -- invalid expression
```

are invalid and rejected by the parser.

To test whether a = b and c = d are either both True or both False, you have to write

( a = b ) = ( c = d )

The boolean operators are

```
not                          -- unary
and  or  xor                 -- binary, strict
and then  or else  implies   -- binary, semistrict
```

The binary operators `and`, `or` and `xor` are strict. In `exp1 and exp2`, both expressions `exp1` and `exp2` are evaluated, and then the boolean value of `exp1 and exp2` is computed.

The operators `and then`, `or else` and `implies` are
semistrict. Evaluation stops as soon as the truth or falsehood of the result
is known. Therefore in some cases only the first operand is evaluated by
the runtime. We get the semantics

```
a and then b   -- evaluate a; if a is false, the result is false
               -- if a is true, the result is the value of b
a or else b    -- evaluate a; if a is true, the result is true
               -- if a is false, the result is the value of b
a implies b    -- evaluate a; if a is false, the result is true
               -- if a is true, the result is the value of b
```
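A typical use of semistrict evaluation is guarding an operation that would otherwise fail. In this sketch (SEMISTRICT_DEMO and `quotient_exceeds` are hypothetical names) `and then` ensures that `a // d` is never evaluated when `d = 0`; with the strict `and`, the division by zero would raise an exception:

```eiffel
class
    SEMISTRICT_DEMO
create
    make
feature
    make
        do
            print (quotient_exceeds (100, 20, 3).out + "%N")  -- True (100 // 20 = 5 > 3)
            print (quotient_exceeds (100, 0, 3).out + "%N")   -- False, no division attempted
        end

    quotient_exceeds (a, d, limit: INTEGER): BOOLEAN
            -- Is `d' nonzero and `a // d' greater than `limit'?
        do
                -- The right operand is evaluated only if `d /= 0' holds.
            Result := d /= 0 and then a // d > limit
        end
end
```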

You may have already noted the equivalence

( a implies b ) = ( not a or else b ) -- definition of implication

Note: the parentheses are necessary because the relational operator = has higher precedence than the boolean operators.
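The equivalence can be verified exhaustively over all four combinations of truth values. IMPLIES_DEMO is a hypothetical root class; the `check` instruction is only evaluated when assertion monitoring is enabled, while the printed truth table appears in any case:

```eiffel
class
    IMPLIES_DEMO
create
    make
feature
    make
            -- Print the truth table of `implies' and check it against `not a or else b'.
            -- Expected output:
            --   False implies False = True
            --   False implies True = True
            --   True implies False = False
            --   True implies True = True
        local
            i, j: INTEGER
            a, b: BOOLEAN
        do
            from i := 0 until i > 1 loop
                from j := 0 until j > 1 loop
                    a := i = 1
                    b := j = 1
                    print (a.out + " implies " + b.out + " = " + (a implies b).out + "%N")
                    check
                        definition: (a implies b) = (not a or else b)
                    end
                    j := j + 1
                end
                i := i + 1
            end
        end
end
```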

The relative precedence of the boolean operators is

```
not                  -- highest
and  and then
or  xor  or else
implies              -- lowest
```

All binary boolean operators associate left to right. This is in line with general practice in most modern programming languages. The only unusual thing might be that

a implies b implies c

is equivalent to

( a implies b ) implies c

because `implies` is an operator which is not available in most other
programming languages.

The interval operator is

..

TBD

In Eiffel you can define free operators like e.g.

!-! @ |> <| -|-> <-|- ==> <== ++

You can form free operators by a sequence of the operator symbols

: \ ? = ~ / ! # $ % & * + - < > @ ^ ` |

but a free operator must not clash with a sequence which already has a defined meaning. Some examples of invalid free operators:

```
--    -- initiates a comment
-->   -- initiates a comment
?     -- ? alone is a placeholder for agents; combinations like ?/? are valid
+     -- + is a standard operator and not a free operator
<=    -- <= is the standard operator for "less or equal"
=     -- = is the standard identity operator
/=    -- /= is the standard non-identity operator
->    -- -> is already used for constraints of formal generics
```
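A free operator is declared with `alias` exactly like a standard one. This sketch borrows `|>` from the examples of valid free operators above and attaches it to a hypothetical class SCORE, where `a |> b` yields the higher of two scores; SCORE_DEMO is a hypothetical root class. The explicit parentheses avoid relying on the associativity of free binary operators:

```eiffel
class
    SCORE
create
    make
feature
    value: INTEGER

    make (a_value: INTEGER)
            -- Initialize with `a_value'.
        do
            value := a_value
        end

    better alias "|>" (other: SCORE): SCORE
            -- The higher of `Current' and `other'.
        do
            if value >= other.value then
                Result := Current
            else
                Result := other
            end
        end
end

class
    SCORE_DEMO
create
    make
feature
    make
        local
            a, b, c: SCORE
        do
            create a.make (3)
            create b.make (9)
            create c.make (5)
            print (((a |> b) |> c).value.out + "%N")   -- 9
        end
end
```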

The following table summarizes all precedence and associativity rules. Note that the rules are not complicated and in line with common practice. In order to minimize parentheses and maximize readability it is worthwhile to know these rules.

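| Precedence | Associativity | Operators |
|---|---|---|
| 10 | | `old`, `not`, `+` `-` (unary), all free unary operators |
| 9 | | all free binary operators |
| 8 | right to left | `^` |
| 7 | left to right | `*` `/` `//` `\\` |
| 6 | left to right | `+` `-` (binary) |
| 5 | | `..` |
| 4 | not associative | `=` `/=` `~` `/~` `<` `>` `<=` `>=` |
| 3 | left to right | `and` `and then` |
| 2 | left to right | `or` `xor` `or else` |
| 1 | left to right | `implies` |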
