Paper 3. Sec 3.1) Data Representation with Majid Tahir

Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Syllabus Content: 3.1 Data representation 3.1.1 User-defined data types  why user-defined types are necessary, define and use non-composite types: enumerated, pointer  define and use composite data types: set, record and class/object  choose and design an appropriate user-defined data type for a given problem 3.1.2 File organisation and access  file organisation: serial, sequential (using a key field) and random (using record key)  file access: sequential access (serial & sequential files) – direct access (sequential & random files)  select an appropriate method of file organisation and file access for a given problem 3.1.3 Real numbers and normalised floating-point representation  describe the format of binary floating-point real numbers  convert binary floating-point real numbers into denary and vice versa  normalise floating-point numbers & show understanding of the reasons for normalization  effects of changing the allocation of bits to mantissa and exponent in a floating-point representation  how underflow and overflow can occur  consequences of a binary representation only being an approximation to the real number it represents (in certain cases)  show understanding that binary representations can give rise to rounding errors

(Sec 3.1.1) User defined Data Type You have already met a variety of built-in data types with integers, strings, chars and more. But often these limited data types aren't enough and a programmer wants to build their own data types. Just as an integer is restricted to "a whole number from -2,147,483,648 through 2,147,483,647", user-defined data types have limits placed on their use by the programmer. A user defined data type is a feature in most high level programming languages which allows a user (programmer) to define data type according to his/her own requirements

There are two categories of user defined data types.:

1. Non-composite data type a. Enumerated data type b. Pointer data type 2. Composite a. Record data type b. Set data type c. Objects and classes

Non-composite user-defined data type:

1 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

An enumerated data type defines a list of possible values. The following pseudocode shows two examples of type definitions:

Pseudocode

Variables can then be declared and assigned values, for example:

It is important to note that the values of the enumerated type look like string values but they are not. They must not be enclosed in quote marks. The values defined in an enumerated data type are ordinal. This means that they have an implied order of values. This makes the second example much more useful because the ordering can be put to many uses in a program. For example, a comparison statement can be used with the values and variables of the enumerated data type:

Enumerated data type in vb.net

Enum Example

When you are in a situation to have a number of constants that are logically related to each other, you can define them together these constants in an enumerator list. An enumerated type is declared using the enum keyword.

Syntax:

2 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Enum declaration

Enum Temperature Low Medium High End Enum

An enumeration has a name, an underlying data type, and a set of members. Each member represents a constant. It is useful when you have a set of values that are functionally significant and fixed.

Retrieve and check the Enum value

Dim value As Temperature = Temperature.Medium If value = Temperature.Medium Then Console.WriteLine("Temperature is Mediuam..") End If

3 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Pointer Data Type:

The pointer data type is unique among the FreeBasic numeric data types. Instead of containing data, like the other numeric types, a pointer contains the memory address of data.

For many beginning programmers, pointers seem like a strange and mysterious beast. However, if you keep one rule in mind, you should not have any problems using pointers in your program. The rule is very simple: a pointer contains an address, not data. If you keep this simple rule in mind, you should have no problems using pointers.

The main non-composite, derived type is the pointer, a data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. It is a primitive kind of reference. (In everyday terms, a page number in a book could be considered a piece of data that refers to another one). Pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address would cause a program to crash. To ameliorate this potential problem, pointers are considered a separate type to the type of data they point to, even if the underlying representation is the same.

As you can see there are three basic steps to using pointers.

1. Declare a pointer variable. 2. Initialize the pointer to a memory address. 3. Dereference the pointer to manipulate the data at the pointed-to memory location.

This isn't really any different than using a standard variable, and you use pointers in much the same way as standard variables. The only real difference between the two is that in a standard variable, you can access the data directly, and with a pointer you must dereference the pointer to interact with the data.

A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on the indexed page.

Pseudocode for the definition of a pointer is illustrated by:

Declaration of a variable of pointer type does not require the caret symbol ^ to be used:

4 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

A special use of a pointer variable is to access the value stored at the address pointed to. The pointer variable is said to be 'dereferenced'

Composite user-defined data types

A composite user-defined data type has a definition with reference to at least one other type. Three examples are considered here.

Record data type

A record data type is the most useful and therefore most widely used. It allows the programmer to collect together values with different data types when these form a coherent whole.

As an example, a record could be used for a program using employee data. Pseudocode for defining the type could be:

An individual data item can th en be accessed using a dot notation:

A particular use of a record is for the implementation of a data structure where one or possibly two of the variables defined are pointer variables.

5 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Records in VB:

Set data type:

A set data type allows a program to create sets and to apply the mathematical operations defined in set theory. The following is a representative list of the operations to be expected:  Union  Difference  Intersection  include an element in the set  exclude an element from the set  Check whether an element is in a set.

6 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Objects and classes

In object-oriented programming, a program defines the classes to be used - they are all user-defined data types. Then for each class the objects must be defined. Chapter 27 (Section 27.03) has a full discussion of this subject.

Why are user-defined data types necessary?

When object-oriented programming is not being used a programmer may choose not to use any user-defined data types. However, for any reasonably large program it is likely that their use will make a program more understandable and less error-prone. Once the programmer has decided because of this advantage to use a data type that is not one of the built-in types then user-definition is inevitable. The use of, for instance, an integer variable is the same for any program. However, there cannot be a built-in record type because each different problem will need an individual definition of a record.

Serial Files:

Serial file organisation is the simplest file organisation method. In serial files, records are entered in the order of their creation. As such, the file is unordered, and is at best in chronological order. Serial files are primarily used as transaction files in which the transactions are recorded in the order that they occur. Sequential Files:

The key difference between a sequential file and a serial file is that it is ordered in a logical sequence based on a key field. This key is usually the primary key, though secondary keys may be used as well. Sequential files are therefore files that are sorted based on some key values.

Sequential files are primarily used in applications where there is a high file hit rate. Hit rate is a measure of the proportion of the records that is accessed in a single run of the application. Therefore, sequential files are ideal for master files and batch processing applications such as payroll systems in which almost all records are processed in a single run of the application. Random Files:

In random file organisation, records are stored in random order within the file. Though there is no sequencing to the placement of the records, there is however, a pre-defined

7 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

relationship between the key of the record and its location within the file. In other words, the value of the record key is mapped by an established function to the address within the file where it resides. Therefore, any record within the file can be directly accessed through the mapping function in roughly the same amount of time. The location of the record within the file therefore is not a factor in the access time of the record. As such, random files are also known in some literature as direct access files.

To create and maintain a random file, a mapping function must be established between the record key and the address where the record is held. If M is the mapping function, then

M(value of record key) => address of record

Hashing

There are various mapping techniques. Some involve using the key field to directly map to the location with the file, while others refer to some lookup table for the location. However, the more common method is to employ a hash function to derive the address.

Hashing is the process of transforming the key value of a record to yield an address location where the record is stored. In some literatures, it is also known as Key-to- Address Transformation, Address-Calculation, Scatter Storage, or Randomization.

A hash function generates the record address by performing some simple operations on the key or parts of the key. A good hashing function should be

• quick to calculate

• cover the complete range of the address space.

• give an even distribution

8 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

• not generate addresses that tend to cluster within a few locations, thus resulting in frequent collisions.

Real numbers: A real number is one with a fractional part. When we write down a value for a real number in the denary system we have a choice. We can use a simple representation or we can use an exponential notation (sometimes referred to as scientific notation). In this latter case we have options.

For example, the number 25.3 might alternatively be written as:

0.253 x 102 or 2.53 x 101 or 25.3 x 10° or 253 x 10-1

For this number, the simple expression is best but if a number is very large or very small the exponential notation is the only sensible choice. Fixed-point representations:

One possibility for handling numbers with fractional parts is to add bits after the decimal point: The first bit after the decimal point is the halves place, the next bit the quarter’s place, the next bit the eighth’s place, and so on.

Place Values

Suppose that we want to represent 1.625(10). We would want 1 in the ones place, leaving us with 0.625. Then we want 1 in the halves place, leaving us with 0.625 − 0.5 = 0.125. No quarters will fit, so put a 0 there. We want a 1 in the eighths place, and we subtract 0.125 from 0.125 to get 0.

So the binary representation of 1.625 would be 1.101(2).

So how fixed number representation stores a fractional number in binary format? See explanation below

9 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

A number 40.125 is to be converted into binary. First number 40 has to be converted as a normal denary to binary conversion, which gives: 40 = 101000

40 written in complete byte 40 = 00101000

Now 0.125 has to be converted to binary. We Multiply 0.125 by 2 e.g. 0.125 x 2 = 0.25 this is less than 1 so we put 0

0.125x2 = 0.25 0 0.25 x 2 = 0.5 0 0.5 x 2 = 1 1

So fractional number 40.125 number becomes 00101000.001

Sign bit +ve

Place value 128 64 32 16 8 4 2 1 . ½ ¼ 1/8 0 0 1 0 1 0 0 0 . 0 0 1

Negative Fixed Point Representation

Suppose -52.625 has to be converted into binary: 52 = 110100 50 written in complete byte 52 = 00110100 One’s Compliment = 11001011 Two’s Compliment -52= 11001100 (by adding 1 in One’s compliment)

Now Multiply 0.625 by 2 e.g. 0.625 x 2 = 1.25 so we put 1

0.625x2 = 1.25 1 0.25 x 2 = 0.5 0 0.5 x 2 = 1 1

So fractional number -52.125 becomes 11001100.101

Sign bit -ve

10 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

In Fixed-Point Representation option, an overall number of bits are chosen with a defined number of bits for the whole number part and the remainder for the fractional part.

Some important non-zero values in this representation are shown in Table below. (The bits are shown with a gap to indicate the implied position of the binary point.)

Floating-Point Number Representation

The alternative is a floating-point representation. The format for a float in g-point number can be generalised as:

In this option a defined number of bits are used for what is called the significand or mantissa, ±M. The remaining bits are used for the exponent or exrad, E. The radix, R is not stored in the representation; it has an implied value of 2.

Conversion from +ve Real Number to Binary Number:

A number 40.125 is to be converted into binary. First number 40 has to be converted as a normal denary to binary conversion, which gives: 40 = 101000 40 written in complete byte 40 = 00101000 Now 0.125 has to be converted to binary. We Multiply 0.125 by 2 e.g. 0.125 x 2 = 0.25 this is less than 1 so we put 0

0.125x2 = 0.25 0 0.25 x 2 = 0.5 0 0.5 x 2 = 1 1

11 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

So fractional number 40.125 number becomes 00101000.001

Sign bit +ve

But in binary numbers, decimals cannot be written. Decimal has to be converted into binary number too. 0 0 1 0 1 0 0 0 . 0 0 1

Decimal moved 6 places to left so exponent = 6 Six in binary is represented as 6 = 110

So the number becomes 0 101000 001 110 so number is 0101000001110

Conversion from -ve Real Number to Binary Number: Suppose -52.625 has to be converted into binary: 52 = 110100

50 written in complete byte 52 = 00110100 One’s Compliment = 11001011(by inverting 0’s to 1 and 1’s to 0) Two’s Compliment -52= 11001100 (by adding 1 in One’s compliment)

When we multiply 0.625 by 2 e.g. 0.625 x 2 = 1.25 so we put 1 0.625x2 = 1.25 1 0.25 x 2 = 0.5 0 0.5 x 2 = 1 1

So fractional number -52.125 becomes 11001100.101

Sign bit -ve But in binary numbers, decimals cannot be written. Decimal has to be converted into binary number too. 1 1 0 0 1 1 0 0 . 1 0 1

Decimal moved 7 places to left so exponent = 7 Seven in binary is represented as 7 = 111 So the number becomes 1 1001100101 111 so number is 11001100101111

12 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Conversion from Binary Number to +ve Real Number:

Example 1: Q#1) In a computer system, real numbers are stored using floating-point representation with:  8 bits for the mantissa, followed by 8 bits for the exponent  Two’s complement form is used for both mantissa and exponent. A real number is stored as the following two bytes: Mantissa Exponent 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 1 Calculate the denary value of this number. Show your working 9608/31/O/N/15

Solution: sign bit As Exponent 0 0 0 0 0 0 1 1 = 3 denary Now Mantissa = 0 0 1 0 1 0 0 0 Decimal is actually after the sign bit So decimal would be 0.0101000 As exponent is 3 so decimal will move three places to right 0 . 0 1 0 1 0 0 0

8 4 2 1 . ½ ¼ 1/8 1/16 0 0 1 0 . 1 0 0 0

2 . 5 (Calculated as per place values)

Example 2:

Q#2) Floating Point Binary representation uses 4 bits for Mantissa and 4 bits for exponent Convert 0110 0010

Solution: As Exponent = 0 0 1 0 = 2 denary sign bit

Now Mantissa = 0 1 1 0 Decimal is actually after the sign bit

So decimal would be 0.110

As exponent is 2 so decimal will move two places to right 0 . 1 1 0

011 . 0

+ 3 . 0 (Calculated as per place values)

13 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

Conversion from Binary Number to -ve Real Number:

Example 1:

Floating Point Binary representation uses 4 bits for Mantissa and 4 bits for exponent Convert 1001 0001 Solution: sign bit As Exponent = 0 0 0 1 = 1 denary Now Mantissa = 1 0 0 1 Decimal is actually after the sign bit So decimal would be 1.001 As exponent is 1 so decimal will move one place to right 1 . 0 0 1 2 1 ½ ¼ (place Values) 1 0 . 0 1

sign bit -2 + ¼ (Calculated as per place values)

= -2 + 0.25 (-2 is –ve and +0.25 is +ve, so by adding 0.25 in -2, we get)

= - 1 . 75 (Answer)

A floating-point number is typically expressed in the scientific notation, with a fraction (F) or Mantissa (M), and exponent (E) of a certain radix (R), in the form of

+M×RE or -M×RE

Denary numbers use radix 10 (M×10E); while binary numbers use radix of 2 (M×2E).

Representation of floating point number is not unique. For example, the number 55.66 can be represented as 5.566×101, 0.5566×102, 0.05566×103, and so on.

The fractional part can be normalized. In the normalized form, there is only a single non- zero digit before the radix point.

For example, decimal number 123.4567 can be normalized as 1.234567×102; binary number 1010.1011 can be normalized as 1.0101011×23.

Both formats use essentially the same method for storing floating-point binary numbers, so

14 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

we will use the Short Real as an example in this tutorial. The bits in an IEEE Short Real are arranged as follows, with the most significant bit (MSB) on the left:

Fig . 1

Decimal Factored As... Binary Real Fraction 1/2 1/2 .1

1/4 1/4 .01

3/4 1/2 + 1/4 .11

1/8 1/8 .001

7/8 1/2 + 1/4 + 1/8 .111

3/8 1/4 + 1/8 .011

1/16 1/16 .0001

3/16 1/8 + 1/16 .0011

5/16 1/4 + 1/16 .0101

The Sign

The sign of a binary floating-point number is represented by a single bit. A 1 bit indicates a negative number, and a 0 bit indicates a positive number.

The Mantissa

It is useful to consider the way decimal floating-point numbers represent their mantissa. Using -3.154 x 105 as an example, the sign is negative, the mantissa is 3.154, and

15 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

the exponent is 5. The fractional portion of the mantissa is the sum of each digit multiplied by a power of 10:

.154 = 1/10 + 5/100 + 4/1000

A binary floating-point number is similar. For example, in the number +11.1011 x 23, the sign is positive, the mantissa is 11.1011, and the exponent is 3. The fractional portion of the mantissa is the sum of successive powers of 2. In our example, it is expressed as:

.1011 = 1/2 + 0/4 + 1/8 + 1/16

Or, you can calculate this value as 1011 divided by 24. In decimal terms, this is eleven divided by sixteen, or 0.6875. Combined with the left-hand side of 11.1011, the decimal value of the number is 3.6875. Here are additional examples:

Base 10 Binary Floating-Point Base 10 Decimal Fraction

11.11 3 3/4 3.75

0.00000000000000000000001 1/8388608 0.00000011920928955078125

The last entry in this table shows the smallest fraction that can be stored in a 23-bit mantissa. The following table shows a few simple examples of binary floating-point numbers alongside their equivalent decimal fractions and decimal values:

Decimal Binary Decimal Fraction Value

.1 1/2 .5

.01 1/4 .25

.001 1/8 .125

.0001 1/16 .0625

.00001 1/32 .03125

The Exponent

16 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

IEEE Short Real exponents are stored as 8-bit unsigned integers with a bias of 127. Let's use the number 1.101 x 25 as an example. The exponent (5) is added to 127 and the sum (132) is binary 10000100. Here are some examples of exponents, first shown in decimal, then adjusted, and finally in unsigned binary:

Adjusted Exponent (E) Binary (E + 127)

+5 132 10000100

0 127 01111111

-10 117 01110101

+128 255 11111111

-127 0 00000000

-1 126 01111110

The binary exponent is unsigned, and therefore cannot be negative. The largest possible exponent is 128-- when added to 127, it produces 255, the largest unsigned value represented by 8 bits. The approximate range is from 1.0 x 2-127 to 1.0 x 2+128.

Normalizing the Mantissa

Before a floating-point binary number can be stored correctly, its mantissa must be normalized. The process is basically the same as when normalizing a floating-point decimal number. For example, decimal 1234.567 is normalized as 1.234567 x 103 by moving the decimal point so that only one digit appears before the decimal. The exponent expresses the number of positions the decimal point was moved left (positive exponent) or moved right (negative exponent).

Similarly, the floating-point binary value 1101.101 is normalized as 1.101101 x 23 by moving the decimal point 3 positions to the left, and multiplying by 23. Here are some examples of normalizations:

Binary Value Normalized As Exponent

1101.101 1.101101 3

.00101 1.01 -3

17 Computer Science 9608 Paper 3. Sec 3.1) Data Representation with Majid Tahir

1.0001 1.0001 0

10000011.0 1.0000011 7

You may have noticed that in a normalized mantissa, the digit 1 always appears to the left of the decimal point. In fact, the leading 1 is omitted from the mantissa in the IEEE storage format because it is redundant.

A floating-point number (or real number) can represent a very large (1.23×1088) or a very small (1.23×10-88) value. It could also represent very large negative number (-1.23×1088) and very small negative number (-1.23×10-88), as well as zero, as illustrated:

Refrences  Book:A-level Computing  https://www.tutorialspoint.com/vb.net/vb.net_constants.htm