University of Malta

Faculty of Engineering

Department of Communications and Computer Engineering

Final Year Project

B.Eng. (Hons.)

Investigation of the Error Performance of Tunstall Coding

by

Johann Bria

A dissertation submitted in partial fulfilment of the requirements for the award of

Bachelor of Engineering (Hons.) of the University of Malta

June

Abstract

A lossless compression algorithm takes a string of symbols and encodes it as a string of bits, such that the average number of bits required is less than that in the uncoded case, where all source symbols are represented by equal-length codewords. Compression algorithms are only possible when some strings or some symbols in the input stream are more probable than others; these are encoded in fewer bits than less probable strings or symbols, resulting in a net average gain. Most compression algorithms may be split into two major types. Fixed-to-variable length schemes, such as Huffman coding and Arithmetic coding, encode equal-length source symbols or strings with codewords of variable length. In variable-to-fixed length schemes, such as the Ziv-Lempel code, the input is divided into units of variable length, which are then transmitted with a fixed-length output code.

The Tunstall source coding algorithm is a variable-to-fixed length encoding scheme. It has frequently been hypothesised that in such schemes the error propagation and resynchronisation difficulties encountered in fixed-to-variable length schemes would be less significant, leading to a better performance in noise. The error performance of various Tunstall codes is analysed and a theoretical model proposed. Several parameters which can be varied in the encoding process are considered, with the objective of minimising the effect of errors without sacrificing compression. The possibility of choosing an error correction scheme optimised for the Tunstall algorithm is also considered, including the use of Unequal Error Protection techniques. Finally, the performance in noise of Tunstall coding is compared with that of Huffman coding.

Acknowledgements

I would like to take this opportunity to thank my supervisor, Dr. Victor Buttigieg, Ph.D. (Manch.), M.Sc. (Manch.), B.Elec.Eng. (Hons.), M.I.E.E.E., for suggesting this field of study for my final year project, and for all his invaluable assistance. Many thanks also go to Mr. Edward Gatt for explaining various details of UNIX systems, and for a lot of patience during too long simulations.

This project finds me once again indebted to my family, particularly my parents, for their patience and support throughout my studies. Their encouragement in the pursuit of knowledge is invaluable and deeply appreciated.

Finally, I would like to express my gratitude to all my friends, particularly those who have been with me through these last four years. They have helped, often without knowing, to make my life what it is today.

Johann Bria

June

"My dear Watson," said he, "I cannot agree with those who rank modesty among the virtues. To the logician all things should be seen exactly as they are, and to underestimate oneself is as much a departure from the truth as to exaggerate one's own powers."

Sherlock Holmes

The Adventure of the Greek Interpreter

Sir Arthur Conan Doyle

Contents

Introduction
  Data Compression
    Unequal Symbol Probability
    Statistically Dependent Symbols
      Lossless Reduction of Statistical Dependence
      Lossy Reduction of Statistical Dependence
  Effect of Errors in Compressed Data
  Error Control Coding
  Document Structure

The Tunstall Coding Algorithm
  Tunstall Codec
  Tunstall Algorithm
    Illustration
  Coding Efficiency
    Codeword Assignment
    Code Rate
  Complications
    Source Message too Short
    Number of Source Symbols not an Integral Power of Two
    Sources with Memory
      Modifying the Tunstall Algorithm
      Modifying the Source Message
  Shortcomings of the Tunstall Coding System

Error Performance of Tunstall Codes
  Measuring the Error Performance
    Error Span and Error Increase
  Levenshtein Distance
    Standard Algorithm
    Simplified Algorithm
    Improved Algorithm
    Validity of Simplified Algorithms
  Error Span
    Validity of Error Span

Algorithms to Minimise Error Span
  Introduction
  Random Assignment
  Sequential Assignment
  Gray Code Assignment
  Simulated Annealing
    Comments
  Reactive Tabu Search
    Comments
  Greedy Algorithm
    Illustration
    Implementation Details
    Improvements to the Basic Algorithm
    Comments
  Semi-Exhaustive Search
    Illustration
    Comments
  Comparison with Huffman Coding
  Conclusions

Performance of Tunstall Codes in a BSC
  Coding Gain for Tunstall Codes
  Mathematical Model for Calculating Coding Gain
  Comparison with Huffman Coding

Use of Error Correction
  Introduction
  Independent Channel Coding
    Comparison with Huffman Coding
  Optimised Channel Coding
    Unequal Protection of Bits Within a Codeword
    Unequal Protection of Codewords

Conclusions
  Tunstall Codec
    Difficulties of Having an Incomplete Tunstall Code
    Sources with Memory
    Sources which Cannot be Fully Encoded
    Adaptive Compression
  Minimising Error Span
    Comparison with Huffman Coding
  Use of Error Control Coding
    Optimised Error Protection for Tunstall Codes

A  Source Statistics

B  Program Documentation
  End-User Programs
  Library Functions
  Simulation Controllers
  Utility Functions

List of Figures

Model of a communication system
Tunstall tree at one level of expansion
Message transmission using Tunstall coding
Matrix used to calculate Levenshtein distance
Simulation to test validity and accuracy of Levenshtein distance algorithms
Difference in the Error Increase calculated by the approximate algorithms as compared to the standard algorithm
Comparison of Error Span and Error Increase
Error Span values for source toy
Error Span values for source pic
Error Span distribution for source pic with random codewords
Error Span distribution for source eng with random codewords
Simulated Annealing: basic algorithm
Typical simulated annealing profile
Reactive Tabu Search: basic algorithm
Evolution of the tabu search, showing the escape mechanism in action
List size dynamics for the reactive tabu search
Greedy Algorithm
Codeword allocations considered by the semi-exhaustive algorithm
Source pic with no error correction
Detail from Fig.
Source eng with no error correction
Comparison between mathematical model and simulation results
Coding gain between Tunstall codes with different codeword assignments
Coding gain between a Tunstall code and an uncompressed code
Comparison with Huffman coding for source pic with no error correction
Comparison with Huffman coding for source eng with no error correction
Effect of BER on SER for source pic with no error correction
Effect of BER on SER for source eng with no error correction
Source pic protected with BCH single-error-correcting code
Source pic protected with BCH dual-error-correcting code
Source eng protected with BCH single-error-correcting code
Comparison with Huffman coding for source pic protected with BCH single-error-correcting code
Comparison with Huffman coding for source eng protected with BCH single-error-correcting code
Effect of BER on SER for source pic protected with BCH single-error-correcting code
Contribution of different bits in the codeword to the Error Span (pic)
Contribution of different bits in the codeword to the Error Span (eng)
Contribution of different codewords to the Error Span (pic)
Contribution of different codewords to the Error Span (eng)

List of Tables

Tunstall code for source toy with sequentially assigned codewords
Timings for different Levenshtein distance algorithms
Code rates achievable on source pic
Code rates achievable on source eng
Error Spans achievable with Tunstall and Huffman coding

A  Source statistics for source toy
A  Source statistics for source toy
A  Source statistics for colour image pic
A  Source statistics for English source eng

Glossary

A_f  The source message, being a sequence of f source symbols a_1 a_2 … a_f

a_i  The i-th symbol in the source message, where a_i ∈ S_m

B_g  The decoded message, being a sequence of g source symbols b_1 b_2 … b_g

b_i  The i-th symbol in the decoded message, where b_i ∈ S_m

C_h  The transmitted compressed message, being a sequence of h Tunstall codewords c_1 c_2 … c_h

c_i  The i-th codeword in the transmitted message, where c_i ∈ T_n

D_h  The received compressed message, which for a complete Tunstall code is a sequence of h Tunstall codewords d_1 d_2 … d_h

d_i  The i-th codeword in the received message, where d_i ∈ T_n for a complete Tunstall code

d(i, j)  The Levenshtein distance between the initial i symbols of message A_f and the initial j symbols of message B_g

Es(x)  The Error Span of Tunstall code x

E_b/N_0  The channel Signal to Noise Ratio (SNR)

E  The Symbol Error Rate of the decoded message

f  The length of the source message, in symbols

g  The length of the decoded message, in symbols

h  The length of the transmitted message, in codewords

Ld(s_1, s_2)  The Levenshtein distance between the source symbol sequences s_1 and s_2

ℓ(s)  The length in bits of the sequence of source symbols s

m  The number of symbols in the source alphabet S_m

n  The number of defined codewords in the Tunstall code, equivalent to the number of leaf nodes in the Tunstall tree

p(s)  The probability of occurrence of the source string s

p  The crossover probability for a Binary Symmetric Channel

q  The probability that a bit is transmitted without error through the Binary Symmetric Channel

R  The code rate

S_m  The source alphabet, consisting of m different symbols {s_1, s_2, …, s_m}

s_i  A source symbol, 1 ≤ i ≤ m

s  A sequence of one or more source symbols

T_n  The Tunstall code, consisting of n different codewords {t_1, t_2, …, t_n}

t_i  A Tunstall codeword, which is used to encode a particular sequence of source symbols, 1 ≤ i ≤ n

w(a_i, b_j)  The cost of substituting symbol a_i with symbol b_j

w_d  The cost of a deletion

w_i  The cost of an insertion

w_s  The cost of a substitution

x_i  The sequence of source symbols represented by the codeword with binary value i

x(t)  The sequence of source symbols represented by the Tunstall codeword t ∈ T_n

x  The set of sequences of source symbols represented by all Tunstall codewords, x = {x(t_1), x(t_2), …, x(t_n)}

x̄  The mathematical mean of a set of values or a distribution

γ_i  The contribution of codeword i to the Error Span

β_i  The contribution of bit i, over the set of codewords, to the Error Span

ε  The Error Span of a Tunstall code

η  The coding efficiency

μ  The number of bits required to encode a source symbol, ⌈log2 m⌉

λ  The number of bits required to encode a Tunstall codeword, ⌈log2 n⌉

σ  The standard deviation of a distribution

Abbreviations

BCH  Bose-Chaudhuri-Hocquenghem codes

BER  Bit Error Rate

BPSK  Binary Phase Shift Keying

BSC  Binary Symmetric Channel

DCT  Discrete Cosine Transform

ECC  Error Correcting Codes

FEC  Forward Error Correction

JPEG  Joint Photographic Experts Group

LPC  Linear Predictive Coding

LZ  Lempel-Ziv coding

LZW  Lempel-Ziv-Welch compression

MPEG  Moving Pictures Experts Group

SER  Symbol Error Rate

SNR Signal to Noise Ratio

UEP Unequal Error Protection

VLSI Very Large Scale Integration

Chapter 1

Introduction

Data Compression

Data compression techniques seek to reduce redundancy in the source message, thereby encoding the same amount of information in a lower number of bits. In practice this means that, with the use of compression, data storage devices can hold more information, as in computer magnetic disk applications. Alternatively, a larger number of channels can be transmitted through a given bandwidth, as used in mobile telephony. Just as redundancy can come in many forms, depending on the application, so there are many different compression techniques.

Unequal Symbol Probability

At the lowest level, certain individual symbols being transmitted may have a larger probability of occurrence than others. Compression algorithms which operate at this level encode such symbols with fewer bits than less probable symbols, such that on average the number of bits required to transmit a symbol will be less than before.

Such techniques, known as entropy encoders, assume a source with statistically independent (uncorrelated) symbols. While some algorithms, including the Tunstall algorithm, actually require the source symbols to be uncorrelated to operate at maximum efficiency, none of the other algorithms in this class will make use of higher-level redundancy in the source to improve compression.


Algorithms which work at this level include (Farrell):

- Shannon-Fano coding
- Huffman coding
- Arithmetic coding
- Tunstall coding

Statistically Dependent Symbols

Another common form of source redundancy, which may be present together with unequal symbol probabilities, arises where the source symbols are statistically dependent. For example, in the English language the letter Q is almost always followed by the letter U. Many practical information sources, such as digital speech and images, are highly inefficient because of this inter-symbol dependence or influence. In practice, the compression algorithms used with such sources can be divided into two categories: lossless and lossy techniques.

Lossless Reduction of Statistical Dependence

These compression techniques are termed lossless because their operation is completely reversible; that is, the output of the decoder is a bit-for-bit copy of the original source. This feature is particularly important for sources such as text, computer programs, and any other case where even an error in a single bit cannot be afforded.

It should be noted that for some of these algorithms, in particular run-length coding, the algorithm only seeks to reduce the statistical dependence of the source symbols by transforming the source. It then relies on another compression method to reduce the redundancy due to unequal symbol probabilities. Compression algorithms which fall into this category include (Nelson):

- Run-length coding
- Lempel-Ziv coding (LZ77, LZ78, LZSS)
- Lempel-Ziv-Welch coding (LZW)


Lossy Reduction of Statistical Dependence

Lossy techniques are usually applied to sources such as digital video, images and audio, where an exact replica of the source is not required, but a good approximation is generally sufficient. Thus, by sacrificing accuracy, a much higher degree of compression can be achieved, even an order of magnitude or more better than lossless techniques. Lossy techniques are now an accepted and generally preferred method for compressing digitally stored representations of analog phenomena. Such techniques include (Nelson; Wallace; Pancha and El Zarki):

- Joint Photographic Experts Group (JPEG), for still images
- Moving Pictures Experts Group (MPEG), for live video
- μ-law and A-law coding of audio samples
- Linear Predictive Coding (LPC), commonly applied to speech

The technique adopted in these types of compression algorithms is to transform the source into a sequence of symbols which has a lot of redundancy in terms of symbol probabilities (i.e. some symbols have a very high probability of occurrence), while having as little correlation between symbols as possible. Typically, lossy techniques are combined with lossless techniques for reducing statistical dependence, and with other techniques for compressing sources with unequal symbol probabilities. For example, the JPEG algorithm uses a DCT (Discrete Cosine Transform) to transform the spatial signal into its corresponding frequency coefficients. Then, with the assumption that human vision is not very sensitive to subtle changes in tone, the coefficients are quantised. As a result, a large number of coefficients become zero. The resulting sequence is then run-length compressed, and further encoded using a Huffman code or an Arithmetic code (Press et al.).

Effect of Errors in Compressed Data

The use of compression means that, on average, one bit of compressed data will be decompressed into a larger number of bits. Thus it is natural to conjecture that an error in a single bit of a compressed stream will result in a larger number of errors in the decompressed stream. In general, over most channels, the errors in the compressed stream are restricted to bit inversions only. However, errors in the decoded stream may also include insertion or deletion of symbols. For this reason the natural method of counting the number of errors, namely the number of bit inversions, can no longer be used. Instead, a metric known as the Levenshtein distance may be used (Kruskal). This metric calculates the minimum number of symbol insertions, deletions and/or substitutions that are necessary to transform the decoded message back into the original source message.
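As an illustration, the standard dynamic-programming form of this metric, with unit costs for insertions, deletions and substitutions, can be sketched as follows. This is a generic implementation, not the simplified or improved variants developed later in this document:

```python
def levenshtein(a, b):
    """Minimum number of symbol insertions, deletions and
    substitutions (all at unit cost) needed to turn a into b."""
    # d[j] holds the distance between a[:i] and b[:j] for the current row i
    d = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        prev, d[0] = d[0], i          # prev = entry (i-1, j-1) of the old row
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            prev, d[j] = d[j], min(d[j] + 1,       # delete a[i-1]
                                   d[j - 1] + 1,   # insert b[j-1]
                                   prev + cost)    # substitute (or match)
    return d[len(b)]
```

For example, levenshtein("kitten", "sitting") evaluates to 3: two substitutions and one insertion.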

Error Control Coding

It has already been seen that certain information sources cannot afford to have even a single bit in error. Furthermore, the whole theory of data compression is based on the assumption that the channel is noiseless. In practice, however, all channels have a certain amount of noise present, although this is usually very small. For this reason the compressed data must be protected against errors. The field of error control coding deals with the creation of structured redundancy in the message, such that the receiver can detect, and possibly correct, errors caused by the channel. This is generally applied after the source is compressed; however, it is normal for error control coding to be specific to the channel (i.e. to the type of noise normally present), while the compression technique is specific to the source information, to achieve maximum compression. This is shown in Fig.

Document Structure

Figure: Model of a communication system

Although Shannon's Separation Theorem states that we may separate source and channel coding without any loss in performance, it is hypothesised that, for a given complexity, choosing an error control technique matched to the compressed data will result in better performance in noise. Practically all compression algorithms have an entropy encoding stage. Thus the most logical error control technique is to optimise for this last stage of compression, since it is directly affected by noise on the channel. The noise acting on previous stages is characterised by the error performance of subsequent compression stages, and not only by the type of noise on the channel. In this project the Tunstall encoding algorithm is analysed in terms of its error performance. Chapter two deals with the Tunstall encoding algorithm, and includes a full description of the assumptions taken.

Chapter three is an analysis of the error performance of the Tunstall algorithm. Various definitions are given, and a metric for quantifying the error performance is developed.

In order to improve the error performance of the Tunstall codes, various algorithms for global minimisation were investigated. These are described in chapter four. Besides the Greedy algorithm and the Semi-Exhaustive Search, the chapter also describes the Simulated Annealing algorithm and the Tabu Search technique. The former is already very popular, and has found successful applications in such areas as computer VLSI design, signal processing, job scheduling, and even in coding theory. The tabu search algorithm, on the other hand, is still a relatively new technique, and many modifications to the algorithm are still being developed.

Chapter five analyses the performance of the Tunstall code when transmitted over a Binary Symmetric Channel at different signal-to-noise ratios. A mathematical model of the performance of a Tunstall code in noise is developed. The Tunstall code is then compared with another, more popular, compression technique in its same class: Huffman coding.

The addition of Error-Correcting Codes is investigated in chapter six, both using generic schemes and also with schemes optimised for the Tunstall code.

Finally, chapter seven gives some conclusions and scope for future work.

Chapter 2

The Tunstall Coding Algorithm

Tunstall Codec

In Tunstall coding (Tunstall), a variable number of source symbols is encoded into a fixed-length codeword, in such a way that:

- any codeword maps to a fixed sequence of source symbols; sequences mapped to different codewords are generally of unequal length;
- the sequences of source symbols are arranged such that the information content of each sequence is approximately equal;
- the number of codewords n is chosen as an integral power of two, such that the codewords can be transmitted with efficiency through a binary channel.

As a result, the average number of bits per source symbol is less than that required if every source symbol is simply encoded using a fixed-length codeword. This leads to the required data compression.


Tunstall Algorithm

Definition (Source message): The source alphabet S_m consists of a set of m different symbols {s_1, s_2, …, s_m}, where p(s_1) ≥ p(s_2) ≥ … ≥ p(s_m). The source message A_f is any sequence of f source symbols a_1 a_2 … a_f, where a_i ∈ S_m, 1 ≤ i ≤ f.

Definition (Transmitted message): The codebook of a binary Tunstall code T_n consists of a set of n different codewords, denoted by {t_1, t_2, …, t_n}. Note that n is an integral power of two. The compressed message C_h is any sequence of h Tunstall codewords c_1 c_2 … c_h, where c_i ∈ T_n, 1 ≤ i ≤ h.

The algorithm used to choose the sequences of source symbols from a source S_m, to be mapped to different codewords, is given below.

1. A tree is created with the root node having probability 1, and having as its branches the m source symbols. The resulting m leaf nodes have the probability of occurrence of the corresponding source symbol.

2. The node with the highest probability of occurrence, representing the sequence of source symbols s, is split up into m branches, such that the resulting leaf nodes have probabilities p(s)p(s_1), p(s)p(s_2), …, p(s)p(s_m), where p(s_i) is the probability of occurrence of the source symbol s_i, 1 ≤ i ≤ m. Note that these probabilities assume a memoryless source. Also, initially s = s_1, since the source symbol with the highest probability is s_1. After splitting the node with the highest probability, the number of leaf nodes in the tree increases by m − 1.

3. Step 2 is repeated until the total number of leaf nodes n is equal to an integral power of two. This constitutes what is termed one level of expansion of the Tunstall tree.

4. Steps 2 and 3 can be repeated further, if required, until n is a larger integral power of two. This increases the complexity of the code, and also tends to make the leaf nodes more equiprobable, hence increasing the code's efficiency.

5. Assign equal-length codewords of λ = log2 n bits to the leaf nodes. The choice of which codeword maps to which leaf node is arbitrary, and does not affect the compression. However, it does affect other properties of the code, as discussed in chapter 4.

Thus, all that is necessary to recreate the same code is:

- the table of probabilities of each source symbol;
- a predefined method for dealing with equiprobable source symbols;
- the codeword length in bits required;
- the codeword assignment algorithm.

In the encoding process, the Tunstall codewords are used to encode the corresponding message sequences generated by the source.
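The tree-building procedure above can be sketched as follows. This is a minimal illustration for a memoryless source: the symbol probabilities and the sequential codeword assignment are assumptions chosen for the example, and tie-breaking between equiprobable leaves is simply left to the language's max function.

```python
from math import log2

def tunstall(probs, n):
    """Build a Tunstall dictionary with n codewords (n a power of two)
    for a memoryless source given as {symbol: probability}."""
    assert n & (n - 1) == 0, "n must be an integral power of two"
    m = len(probs)
    assert (n - m) % (m - 1) == 0, "n must be reachable by whole splits"
    leaves = dict(probs)                  # source sequence -> probability
    while len(leaves) < n:                # each split adds m - 1 leaves
        s = max(leaves, key=leaves.get)   # most probable leaf node
        p = leaves.pop(s)
        for sym, ps in probs.items():     # split into m branches
            leaves[s + sym] = p * ps      # memoryless: multiply probabilities
    lam = int(log2(n))
    # sequential codeword assignment, most probable sequence first
    ordered = sorted(leaves, key=leaves.get, reverse=True)
    return {format(i, f"0{lam}b"): seq for i, seq in enumerate(ordered)}

code = tunstall({"0": 0.7, "1": 0.3}, 4)
# e.g. {'00': '000', '01': '1', '10': '01', '11': '001'}
```

Note how the most probable leaf ("0", then "00") keeps being split, so the resulting source sequences are of unequal length but roughly equal probability.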

Illustration

Consider a binary source where S_2 = {0, 1} and p(0) > p(1). Note that for a source with m symbols, each symbol can be simply encoded in ⌈log2 m⌉ bits. In this case m = 2, so the source symbols can be encoded in exactly 1 bit. It can be shown that for a source where m is an integral power of two, the Tunstall tree can always be split up until there are 2^λ leaf nodes, where λ is an integer. Thus the Tunstall tree with one level of expansion has m² leaf nodes. Hence, in this case, m = 2 and n = 4. This is shown in Fig.

The allocation of codewords to the leaf nodes is arbitrary, and in this case follows a strict sequential order as the tree is traversed from top to bottom. Note that every time a node is split, the new branches are placed in order of decreasing probability. Also, in this case there is still quite a large difference between the probabilities of the resulting leaf nodes. This is a result of the large differences in the probabilities of the source symbols, and can be alleviated by expanding the tree further.





Figure: Tunstall tree at one level of expansion

Coding Efficiency

The tree-splitting process which is at the core of the Tunstall algorithm attempts to produce n Tunstall codewords representing almost equiprobable source sequences. Since n = 2^λ, where λ is an integer, these sequences can be encoded with maximum efficiency by the set of binary numbers having λ bits. The coding efficiency is given by

    η = entropy / average length
      = [− Σ_{i=1}^{n} p(t_i) log2 p(t_i)] / [Σ_{i=1}^{n} p(t_i) ℓ(t_i)]

where p(t_i) is the probability of occurrence of Tunstall codeword t_i, ℓ(t_i) is the length in bits of Tunstall codeword t_i, and n is the number of Tunstall codewords.

Now, since all codewords have the same length, ℓ(t_i) = λ for all t_i ∈ T_n, the average length can be expressed by

    Σ_{i=1}^{n} p(t_i) ℓ(t_i) = λ Σ_{i=1}^{n} p(t_i) = λ

Hence the coding efficiency is given by

    η = −(1/λ) Σ_{i=1}^{n} p(t_i) log2 p(t_i)
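As a numerical check of this formula, the efficiency of a small code can be computed directly. The four leaf probabilities below correspond to one level of expansion of a hypothetical binary source with p(0) = 0.7 and p(1) = 0.3; they are illustrative assumptions, not one of the sources studied in this document.

```python
from math import log2

# probabilities p(t_i) of the n = 4 Tunstall codewords
p = [0.343, 0.3, 0.21, 0.147]

lam = log2(len(p))                          # lambda = log2 n = 2 bits
entropy = -sum(pi * log2(pi) for pi in p)   # information per codeword
eta = entropy / lam                         # coding efficiency
print(f"eta = {eta:.4f}")                   # a little under 1
```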


Table: Tunstall code for source toy, with sequentially assigned codewords (columns: Codeword, Source String)

Codeword Assignment

Since η does not depend on the value of t_i, any of the codewords may be assigned to any one of the symbol sequences without affecting the code's efficiency. The codeword assignment, however, affects other properties of the code, as discussed in chapter 4.

Code Rate

The compression factor of Tunstall codes can be expressed by the code rate, which is defined as the average number of source bits represented per encoded bit. For Tunstall codes the code rate is given by

    R = [Σ_{i=1}^{n} p(t_i) ℓ(x(t_i))] / [Σ_{i=1}^{n} p(t_i) λ]
      = (1/λ) Σ_{i=1}^{n} p(t_i) ℓ(x(t_i))

where ℓ(x(t_i)) is the length in bits of the source sequence represented by Tunstall codeword t_i.
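Continuing the same hypothetical four-codeword example (a binary source, so each source symbol costs one bit; the sequences and probabilities are illustrative assumptions), the code rate can be evaluated as:

```python
from math import log2

# source sequence x(t_i) -> probability p(t_i) for a hypothetical code
code = {"000": 0.343, "1": 0.3, "01": 0.21, "001": 0.147}

lam = log2(len(code))                  # encoded bits per codeword (lambda = 2)
# l(x(t_i)) in bits: one bit per binary source symbol
avg_source_bits = sum(p * len(seq) for seq, p in code.items())
R = avg_source_bits / lam              # source bits represented per encoded bit
print(f"R = {R:.3f}")                  # compression whenever R > 1
```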

Complications

Source Message too Short

When encoding a source message, a complication arises at the end of the message: there is a possibility that, after attempting to encode all the source symbols in a message, there remains a final sequence of source symbols which cannot be encoded. This is illustrated in the following example.


Consider the Tunstall code for source toy presented in the table above. If the original source message is

    …

this would be encoded as

    …

where the separators mark the grouping of the source symbols into the corresponding Tunstall symbol sequences. Note that the final symbol cannot be encoded, since the coder expects it to be followed by at least another symbol. In practice this has to be treated as a special case.

One way of solving this problem is to encode the last few source symbols with any of the codewords corresponding to those sequences having the same first symbols. The length of the source message, in symbols, is then transmitted before the Tunstall codewords. The decoder may then stop when the required number of symbols has been decoded. This solution, however, may cause considerable problems in a noisy channel: the output may be truncated prematurely if there were more insertion than deletion errors in the decoded message, and the decoder would detect an early end of the transmitted message if there were more deletions than insertions in the decoded message. Also, there may always be an error in the symbol count itself, which causes similar symptoms to those described above.

Other ways of dealing with the problem may be devised. However, to avoid the complications attributable to this special case, it is assumed that it is always possible to fully and uniquely encode the source message.

Number of Source Symb ols not an Integral Power of Two

It is always possible to have n an integral power of two if m itself is an integral power of two. Consider a source with m symbols. It should be clear that the number of leaf nodes in the Tunstall tree is initially m, and is increased by m − 1 at every split. Thus the number of leaf nodes, and hence codewords, after j splits is given by

    n = m + j(m − 1)




Thus, if j = m, we get n = m². Now, if m = 2^μ, where μ is an integer,

    n = 2^{2μ}

where 2μ is also an integer.
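The leaf-count relation can be verified mechanically; the sketch below also confirms that for m = 3 (not a power of two) no number of splits ever lands exactly on a power of two:

```python
def leaf_count(m, j):
    # m initial leaves; each split replaces one leaf with m new ones,
    # a net gain of m - 1 leaves per split
    return m + j * (m - 1)

# m = 2**mu: choosing j = m splits gives n = m + m*(m - 1) = m**2 = 2**(2*mu)
assert leaf_count(2, 2) == 4
assert leaf_count(4, 4) == 16

# m = 3: n = 3 + 2j is always odd, so it can never be a power of two
powers = {2 ** k for k in range(1, 32)}
assert all(leaf_count(3, j) not in powers for j in range(1, 1000))
```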

For sources where m is not an integral power of two, it may not be possible to choose j such that n is an integral power of two. In such cases the Tunstall tree-splitting process must be stopped prematurely. This is usually done when the next split would result in too many leaf nodes to allow them to be encoded in a chosen number of bits. Say, for example, that the chosen codeword size is λ bits and that the original source has m symbols. After k splits of the Tunstall tree, n = m + k(m − 1) leaf nodes are generated. A further split would result in n + m − 1 > 2^λ leaf nodes, which is not acceptable. It is clear that, no matter what codeword assignment is used, there will always be 2^λ − n codewords left unassigned. This is of no consequence for a noiseless channel, except for a loss in coding efficiency, as the mapping between codewords and source sequences is still valid. For noisy channels, however, it is always possible to receive one of the unassigned codewords, and something must be done about it. Possible reactions include decoding it as a null string, as the most probable sequence, or as the most probable sequence which has a codeword at a Hamming distance of one from the undefined codeword. Treatment of such cases is not considered, and all sources used here have m = 2^μ, where μ is an integer.

Sources with Memory

The Tunstall algorithm as described assumes that the source is memoryless. If the source has memory, the coding efficiency is generally reduced. In such cases one should either modify the Tunstall algorithm to take the effect of source memory into account, or modify the source in such a way that the effect of memory becomes negligible.

Modifying the Tunstall Algorithm

The only change that should be necessary to the tree-building algorithm is a modification of the probability of occurrence of the leaf nodes. In the standard algorithm, at every stage of the tree-splitting process, the probability of occurrence of any leaf node is calculated by multiplying the probabilities of occurrence of all the source symbols that the leaf node represents. For example, if a leaf node represents, in order, the source symbols s_1 s_2 s_3, then the probability of the leaf node should be calculated as p(s_1)p(s_2|s_1)p(s_3|s_1 s_2). If the source is memoryless, this can be calculated as p(s_1)p(s_2)p(s_3). However, for a source with memory, it will have to be calculated directly from the source. Thus the table of probabilities of the source symbols is not sufficient, higher-order statistics being necessary for the creation of the Tunstall tree. A generalisation of Tunstall coding to sources with memory was analysed by Savari and Gallager.

Modifying the Source Message

An alternative way of dealing with such sources is to disrupt the source memory before the encoding process. This can be done in various ways; the more beneficial would be to use such memory in the source to improve the compression, such as by run-length encoding the source, or by treating certain source phrases as symbols in their own right. Treatment of sources with memory in such a way tends to be complicated, and depends heavily on the type of source being investigated.

Shortcomings of the Tunstall Co ding System

Codes produced with the Tunstall algorithm cannot achieve maximum coding efficiency. Also, the coding efficiency of a Tunstall code is generally lower than what can be achieved with Huffman coding or the more efficient Arithmetic coding. These disadvantages of the Tunstall coding algorithm may explain why Tunstall coding has not received much attention in the literature, despite being proposed almost thirty years ago.

Maximum coding efficiency is achieved only when the length of a codeword is equal to its information content. Since the Tunstall algorithm produces equal-length codewords, the condition for maximum efficiency implies that the probabilities of occurrence of all codewords must be equal.

Consider a source alphabet S = {s1, s2, ..., sm}, where the symbols have probabilities of occurrence p(s1), p(s2), ..., p(sm) respectively. Now, when the last split is performed


on the Tunstall tree, the leaf node with the largest probability of occurrence is split into m leaf nodes. Thus each of these m leaf nodes will be assigned a different codeword. If the leaf node that was split had a probability of occurrence p(s), the leaf nodes produced will have probabilities of occurrence p(s)p(s1), p(s)p(s2), ..., p(s)p(sm). Assuming that the original source did not have equiprobable symbols (there is no point in trying to compress such a source anyway), the codewords mapped to these leaf nodes will not be equiprobable. Thus the Tunstall code itself does not have equiprobable codewords, and hence can never attain maximum efficiency.
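This argument can be illustrated numerically; the three-symbol source probabilities below are hypothetical.

```python
# Hypothetical three-symbol source; the probabilities are illustrative.
p = {'a': 0.6, 'b': 0.3, 'c': 0.1}   # not equiprobable
p_leaf = 0.6                          # leaf with the largest probability

# Splitting that leaf produces one child per source symbol; each child
# inherits the parent's probability scaled by its own symbol probability.
children = {s: p_leaf * ps for s, ps in p.items()}

# The children inherit the ratios of the source symbols, so the
# codewords assigned to them cannot be equiprobable.
assert len(set(children.values())) > 1
```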

Chapter

Error Performance of Tunstall Code

Measuring the Error Performance

Throughout this document, the source message is a finite sequence of symbols generated by a memoryless source. Similarly, the compressed message is also a finite sequence of symbols taken from the code of Tunstall codewords. It is assumed that the source message translates into an integral number of Tunstall codewords, such that when the compressed message is decoded it results in exactly the same message that was encoded, without any extra symbols appended. Fig. shows how the source message is transmitted between the source and sink.

It is assumed that the encoded message is to be transmitted through the Binary Symmetric Channel (Proakis). This channel, by definition, can only introduce bit inversions into the encoded message. The number of errors introduced by the channel can thus be measured by the Hamming distance metric. Substitutional errors in a codeword will cause this codeword to be interpreted as a different, but valid, one during the decoding process. This means that the sequence of source symbols represented by the original codeword will be replaced by the sequence corresponding to the incorrectly interpreted codeword. Furthermore, the sequences corresponding to these codewords may not be of the same length, leading to insertion and deletion, as well as substitution, of source symbols in the decoded message with respect to the original.



Assuming that the Tunstall code is complete.

CHAPTER ERROR PERFORMANCE OF TUNSTALL CODE

Figure: Message transmission using Tunstall coding

Definition: Received message. Since the Tunstall code Tn is complete, and only binary substitutional errors may occur along the channel, the received compressed message Dh is any sequence of h Tunstall codewords d1 d2 ... dh, with di ∈ Tn for 1 ≤ i ≤ h.

Definition: Decoded message. Since the received compressed message still consists of valid Tunstall codewords, the decoded message Bg is any sequence of g source symbols b1 b2 ... bg, with bi ∈ Sm for 1 ≤ i ≤ g.

The Hamming distance metric is based on comparing corresponding elements from the two sequences being compared. It is thus incapable of dealing meaningfully with insertion and deletion errors, as these may cause a loss of synchronisation, and all subsequent symbols would be treated as if they were substituted, resulting in an inflated value for the number of errors.

The general approach to comparing such sequences is to seek the appropriate correspondence between symbols from the two sequences, and optimise over all possible correspondences which satisfy suitable conditions, such as preserving the order of symbols in the sequence. The Levenshtein distance (Kruskal) is such a metric, and is ideally suited to our purpose. By defining the costs of symbol insertion, deletion and substitution to be unity, the Levenshtein distance will represent the minimum number of insertions, deletions and substitutions that are necessary to translate the original message into the incorrectly decoded one.


Definition: The Levenshtein Distance Ld(Af, Bg) between the symbol sequences Af and Bg is the minimum number of symbol insertions, deletions and substitutions necessary to transform Af into Bg.

Definition: The number of Symbol Errors between the source and decoded messages is defined as the Levenshtein distance between the messages, computed with a unity weight for insertions, deletions and substitutions.

Error Span and Error Increase

In fixed-to-variable length encoding schemes, the greatest adverse effect on error performance is due to loss of synchronisation. It is thus understandable that the error performance of such codes is measured by the average number of source symbols that may be incorrectly decoded because of a random bit inversion in the encoded message. This has been termed the Error Span, and has been investigated for various variable length codes (Maxted; Robinson).

While loss of synchronisation in the encoded stream is a major problem in designing good variable length codes, it has no effect on fixed length codes, if it is assumed that only bit inversions can occur along the channel. Also, in variable-to-fixed length encoding schemes, many sequences corresponding to different codewords tend to be similar. This leads to a hypothesis that one may design a code such that random single-bit inversions in a codeword result in that codeword being transformed into another one associated with a sequence very similar to that of the original codeword. One would in this way reduce the effect of random single-bit errors. Thus a new metric must be chosen to quantify the error performance of these codes. For a Tunstall code in particular, the Error Span and Error Increase metrics have been adopted, as defined below.

Definition: The Error Span is defined as the average number of symbol errors in the decoded message for a random single bit error in the compressed message.

Definition: The Error Increase is defined as the average number of symbol errors in the decoded message for every random bit error in the compressed message. This reduces to the ratio of symbol errors to bit errors, and depends on the channel bit error rate.


Figure: Matrix used to calculate the Levenshtein distance

Levenshtein Distance

Standard Algorithm

The basic algorithm for calculating the Levenshtein distance between two messages of unequal length is based on the principle of dynamic programming (Kruskal). The algorithm is guaranteed to find the smallest number of insertions, deletions and substitutions, choosing between them as appropriate, which can reconstruct message Bg from Af, of length g and f symbols respectively. Consider the case where we want to calculate the distance Ld(Af, Bg).

We build an (f+1) × (g+1) matrix, as shown in Fig. In this matrix, moving down one position is equivalent to a deletion. A similar movement to the right is equivalent to an insertion, while a diagonal movement down-right is equivalent to a substitution if the corresponding symbols in Af and Bg are different.

Thus, if we define the costs (weights) of an insertion, deletion and substitution to be wi, wd and ws respectively, then the entry d(i, j) at row i and column j in the matrix is given by

    d(i, j) = min{ d(i-1, j) + wd,
                   d(i-1, j-1) + w(ai, bj),
                   d(i, j-1) + wi }

    w(ai, bj) = 0   if ai = bj
    w(ai, bj) = ws  if ai ≠ bj


where ai is the i-th symbol in message Af,
      bj is the j-th symbol in message Bg,
      d(i, 0) = i·wd,  0 ≤ i ≤ f,
      d(0, j) = j·wi,  0 ≤ j ≤ g

The whole matrix is computed using Eqn. with wi = wd = ws = 1 (unity weights).

The required Levenshtein distance is then given by

    Ld(Af, Bg) = d(f, g)

This algorithm always gives the correct (smallest possible) result, but its complexity is proportional to f·g, making the algorithm unsuitable (too computationally expensive) for comparing very long messages.
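The dynamic-programming recurrence above translates directly into code; the following is a minimal sketch, with the three weights as parameters defaulting to unity.

```python
def levenshtein(a, b, w_ins=1, w_del=1, w_sub=1):
    """Dynamic-programming Levenshtein distance between sequences
    a (length f) and b (length g); O(f*g) time and space."""
    f, g = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (g + 1) for _ in range(f + 1)]
    for i in range(1, f + 1):
        d[i][0] = i * w_del              # delete the first i symbols of a
    for j in range(1, g + 1):
        d[0][j] = j * w_ins              # insert the first j symbols of b
    for i in range(1, f + 1):
        for j in range(1, g + 1):
            sub = 0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,       # deletion
                          d[i][j - 1] + w_ins,       # insertion
                          d[i - 1][j - 1] + sub)     # substitution / match
    return d[f][g]
```

With unity weights this computes exactly the number of Symbol Errors defined above.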

Simplified Algorithm

An algorithm for calculating an approximate value for the Levenshtein distance was devised, suitable for use even with very long messages. This is necessary for computing the Error Increase of a Tunstall code, where the message needs to be relatively long at low bit error rates in order to obtain a good statistical value.

In our case, we only need to calculate the Levenshtein distance between the source message and the decoded message. Thus certain assumptions may be made to simplify the algorithm, and we can also make use of some extra information we have about both messages. Comparing the transmitted and received encoded messages, we note that only substitution of bits can occur between the two messages, due to the nature of the channel. Hence, as discussed in Section , a corrupted codeword is transformed into another valid codeword; it can never result in an undefined word. Thus we only need to compare the sequences of symbols corresponding to the codewords in error in the received message, as compared to the transmitted one.

A first approximation for the Levenshtein distance between the two messages can be found by adding only the Levenshtein distances between the source sequences represented by corresponding codewords which have been corrupted by noise. This approximation can be calculated by

    Ld(Af, Bg) ≈ Σ(i=1..h) Ld(x(ci), x(di))

where x(t) is the sequence of source symbols corresponding to Tunstall codeword t,
      ci is the i-th transmitted codeword,
      di is the corresponding received codeword,
      h is the number of codewords in the received message.

As an illustration, consider the Tunstall code given in Table. If the original source message is

after encoding, the transmitted message would be

This message is then passed through a noisy channel; suppose that errors occur in the third and fourth codewords, for a total of three bit inversions, such that the received message is

where the bits in error are represented in bold. This would be decoded as

Using the standard algorithm, we have to compare with

This results in a Levenshtein distance of two, which can be explained either as one insertion and one deletion


or as two substitutions.

Using the simplified algorithm, we add the Levenshtein distances between the sequences represented by the corresponding codewords. Note that if ci = di, there is no need to compute the Levenshtein distance between the source sequences they represent. Thus:

    i    ci    di    x(ci)    x(di)    Ld

The result obtained by the simplified algorithm is also equal to two: one insertion and one deletion. Note that the result obtained by the simplified algorithm will at least be matched by the standard algorithm; in many cases, particularly with long messages, the standard algorithm gives a smaller distance. We can thus say that

    Σ(i=1..h) Ld(x(ci), x(di)) ≥ Ld(Af, Bg)

Improved Algorithm

The simplified algorithm performs very well at low bit error rates. Problems start to arise, however, when the error rate is relatively high. In such cases the assumptions made are no longer valid. The effect of high error rates is best illustrated by an example: consider again the Tunstall code given in Table. If the original source message is

after encoding, the transmitted message would be

Now, if this message passes through a noisy channel with a high bit error rate p, and the received message is


this would be decoded as

Using the standard algorithm gives a Levenshtein distance of seven symbols between the source and decoded messages. The simplified algorithm considers the streams codeword by codeword, adding the Levenshtein distances between the corresponding strings:

    i    ci    di    x(ci)    x(di)    Ld

This gives a result of nine. A modified algorithm was therefore developed, whereby the strings corresponding to consecutive codewords having at least one bit in error are concatenated, and the Levenshtein distance is then calculated on these concatenated strings. Considering the above example, the modified algorithm considers:

    Encoded Message Segments        Uncoded Message Segments
    Original    Corrupted           Original    Decoded    Ld

This gives the improved result of seven.
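The modified (improved) algorithm can be sketched by concatenating the source strings of each run of consecutive corrupted codewords before taking the distance; `expand` is again an assumed dictionary representation of the code table.

```python
def _lev(a, b):
    # Unit-cost Levenshtein distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def improved_distance(sent, received, expand):
    """Concatenate the strings of each run of consecutive corrupted
    codewords, then sum the distances between the concatenated strings.
    This matters at high bit error rates, where errors cross codeword
    boundaries."""
    total, i, n = 0, 0, len(sent)
    while i < n:
        if sent[i] == received[i]:
            i += 1
            continue
        j = i                        # extend over the run of codewords in error
        while j < n and sent[j] != received[j]:
            j += 1
        orig = ''.join(expand[c] for c in sent[i:j])
        deco = ''.join(expand[c] for c in received[i:j])
        total += _lev(orig, deco)
        i = j
    return total
```

In the toy test below, the per-codeword sum is 2 but the concatenated strings are identical, illustrating how the improved estimate can be strictly smaller than the simplified one.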

Validity of the Simplified Algorithms

To confirm the accuracy and dependability of the simplified algorithms, different random streams were encoded with the same Tunstall code and then subjected to random noise with the same bit error rate. The Levenshtein distance between the original stream and the decoded stream was calculated using the standard algorithm and the two simplified methods. This was repeated at different bit error rates, increasing the size of the stream at lower error rates, and monitoring the variance of the results obtained. The Levenshtein distances obtained by the different algorithms were then used to calculate the Error



Figure: Simulation to test the validity and accuracy of the Levenshtein distance algorithms

Increase, which was plotted against the bit error rate, as in Fig. To better compare the different algorithms, the difference between the approximate results and the actual results was also plotted, shown in Fig.

From the results obtained (Fig.), it is clear that the improved algorithm for calculating an approximate value for the Levenshtein distance is very accurate; the results are practically the same as those obtained using the standard algorithm, even at high bit error rates. It is also clear (see Fig.) that the improved algorithm is always significantly better at calculating the Levenshtein distance than the simplified algorithm.

The efficiency of the simplified algorithms becomes visible when one considers the timings shown in Table, taken from a simulation run on a Sun SPARCstation.

Note how the decrease in speed between the Improved Algorithm and the Simplified Algorithm is greater for large bit error rates. This is to be expected, since at high bit error rates there is a larger probability that successive codewords contain an error. The difference in speed between these two approximate algorithms is, however, dwarfed in comparison with the standard algorithm. The Improved



Figure: Difference in the Error Increase calculated by the approximate algorithms, as compared to the standard algorithm

Algorithm      Bit Error Rate      Length (symbols)      Time (s)
Simplified
Improved
Standard
Simplified
Improved
Standard

Table: Timings for the different Levenshtein distance algorithms




Algorithm was found to be several orders of magnitude faster than the Standard Algorithm for the larger bit error rate, and faster still for the lower bit error rate. In fact, while the time taken for the Standard Algorithm increases quadratically with message length and is independent of the bit error rate, the time taken for the Improved Algorithm increases only linearly, assuming the bit error rate is kept constant. This is particularly important for the large messages necessary when considering low bit error rates.

Error Span

The value for the Error Span of a particular code may be calculated from knowledge of the code table and the probabilities of each source symbol. Taking the same assumptions that were taken to construct the Tunstall tree, the Error Span is calculated as follows:

1. Every codeword is considered in order.

2. For every possible single bit error in the codeword, the Levenshtein distance between the sequences of source symbols represented by the original codeword and by the codeword with the single bit error is calculated.

3. Assuming a BSC model for the noisy channel, it is equally probable for the single bit error to be in the first bit of the codeword as it is for the error to be in any other bit.

4. Thus, for a codeword of width n, the Levenshtein distances calculated above are scaled down by a factor n and then added together. This represents the average number of symbol errors if a single bit error occurs in that particular codeword.

5. This is calculated for every codeword and scaled by the probability of occurrence of that codeword. The results are added together, giving the average number of symbol errors in the decoded message for a single bit error in the encoded message.



Figure: Comparison of Error Span and Error Increase

Mathematically, the Error Span Es(x) for an n-bit Tunstall code x is given by

    Es(x) = (1/n) Σ(i=1..2^n) Σ(j=1..n) p(xi) Ld(xi, x(i ⊕ 2^(j-1)))

where xi is the string mapped to codeword i (represented in binary),
      p(xi) is the probability of occurrence of string xi,
      x is the set of strings making up the code,
      ⊕ denotes the bitwise logical-XOR operation.
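The calculation can be sketched directly from the steps above; representing the complete code as dictionaries keyed by n-bit strings is an assumption for illustration.

```python
def _lev(a, b):
    # Unit-cost Levenshtein distance, computed row by row.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def error_span(code, prob, expand):
    """Error Span of a complete n-bit Tunstall code.

    code   -- list of all codewords as n-bit strings, e.g. ['00', '01', ...]
    prob   -- prob[c]: probability of occurrence of the string mapped to c
    expand -- expand[c]: source-symbol sequence mapped to codeword c
    """
    n = len(code[0])
    span = 0.0
    for c in code:
        # Average the symbol errors over the n possible single-bit inversions.
        acc = 0
        for j in range(n):
            flipped = c[:j] + ('1' if c[j] == '0' else '0') + c[j + 1:]
            acc += _lev(expand[c], expand[flipped])
        span += prob[c] * acc / n
    return span
```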

Validity of Error Span

The Error Span is defined as the average Levenshtein distance for a random single bit error in the encoded message. This means that the ratio of Levenshtein distance to the number of bit errors (the Error Increase) should approach the Error Span as the bit error rate tends to zero. This phenomenon was verified by plotting the Error Increase against the bit error rate, as can be seen in Fig. Source pic was used, and the codewords for the Tunstall code produced were assigned using the sequential algorithm.

Chapter

Algorithms to Minimise Error Span

Introduction

The Error Span of a Tunstall code depends on the assignment of codewords to the sequences of source symbols. More importantly, this assignment is completely arbitrary, in that it will not affect the compression performance of the code. This feature has been exploited by seeking to assign codewords in such a way as to reduce the Error Span, leading to an improvement in the error performance. Various different code assignment algorithms have been investigated; they may be subdivided as follows:

Reference algorithms, which provide a performance to measure against:

    The average of the ensemble of Random assignments.

Simple algorithms, where the emphasis is on speed and ease of implementation:

    Sequential assignment: the Tunstall tree is traversed in a preordered fashion, and successive leaf nodes are assigned numerically successive codewords.

    Gray code assignment: successive leaf nodes are assigned successive codewords from a Gray code.

CHAPTER ALGORITHMS TO MINIMISE ERROR SPAN

Elaborate algorithms, where the minimum possible Error Span is sought at the expense of computation time and algorithm complexity:

    Simulated Annealing, which has been successfully employed in other fields, and is a popular and relatively simple technique (el Gamal et al.).

    Tabu Search, which is a relatively new technique for global minimisation. Various improvements on the standard algorithm, as proposed by Battiti, have been used (Battiti & Tecchiolli).

    A Greedy algorithm, which tries to match maximum Hamming distance between codewords with maximum Levenshtein distance between sequences.

    A Semi-Exhaustive algorithm, which is essentially an enhanced version of the Greedy algorithm. It avoids bias towards the first pair of codewords with the largest Hamming distance found, by considering all other pairs of codewords with an equal Hamming distance.

The efficacy and efficiency of the elaborate algorithms were investigated. In particular, they were compared with the reference assignment, as well as with the simple sequential and Gray code assignments.

The effect of the choice of parameters for the different algorithms was also considered. Unfortunately, it is very difficult to plot the search space, and an exhaustive search is only possible for the very simplest of codes. Thus it is very difficult to say how well a particular algorithm performs, since one does not know whether the algorithm actually managed to reach the global minimum. The best one can currently do is to compare the algorithms with each other.

Typical values for the Error Span obtained with the different algorithms are shown in Figs. and , for sources toy and pic respectively. For source toy (Table A) the Tunstall tree was expanded for -bit codewords, while source pic (Table A) was encoded with -bit Tunstall codewords. Except for the Random assignment, the results shown are the best obtained with the respective algorithm.



The numb er of dimensions is very large b eing equal to the number of codewords



Figure: Error Span values for source toy


Figure: Error Span values for source pic



Figure: Error Span distribution for source pic with randomly assigned -bit codewords

Random Assignment

The Random Assignment construction assigns completely arbitrary codewords to the sequences of symbols generated by the Tunstall algorithm. As a result, the performance of the code, in terms of the Error Span, is the average taken over all possible codes. To this end, a sufficiently large number of random codeword assignments were tried, and the distribution of the Error Span plotted, as shown in Figs. and for sources pic and eng respectively. One of the random assignments, with an Error Span sufficiently close to the mean of the distribution, was then chosen as the reference assignment.

It can be noted that for both these codes the distribution of the Error Span has a very small variance. This lends weight to the assertion that the particular assignment chosen is a valid reference, since most codes with randomly chosen codewords would have a performance similar to the reference on average. An important point to note is that all other assignment algorithms consistently produced codes with a lower Error Span than the chosen reference.



Figure: Error Span distribution for source eng with randomly assigned -bit codewords

Sequential Assignment

The sequential assignment was originally implemented because it is a logical choice for codeword assignment, and one which has a high probability of being chosen if the performance in noise were not being considered. The algorithm is very fast and easy to implement; furthermore, the Error Span of such codes, though not as low as for Simulated Annealing, is still significantly better than the reference. The source symbols are sorted in order of decreasing probability, and the sequences of symbols produced by the Tunstall algorithm are assigned codewords sequentially, considered as binary numbers.

Gray Code Assignment

The Gray code assignment is the simplest attempt at obtaining a lower Error Span, and the speed of the algorithm is comparable with that of the sequential assignment. Here the codewords are assigned sequentially from a Gray code. The Error Span obtained is only marginally better than the sequential assignment, however, and then only for the more complex codes. In practice, the performance of the simple algorithms is roughly halfway between the reference and the best solution obtained by the other algorithms.
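A sketch of the Gray code assignment, using the standard reflected binary Gray code (successive values differ in exactly one bit); the list-of-sequences interface is an assumed representation of the preordered leaf nodes.

```python
def gray_code_assignment(sequences, n):
    """Assign successive leaf nodes successive codewords from the
    reflected binary Gray code, so that adjacent codewords differ in
    exactly one bit."""
    assert len(sequences) <= 2 ** n
    assignment = {}
    for k, seq in enumerate(sequences):
        gray = k ^ (k >> 1)                       # k-th Gray code value
        assignment[seq] = format(gray, '0{}b'.format(n))
    return assignment
```

Replacing `k ^ (k >> 1)` with `k` itself gives the sequential assignment, which makes the two simple schemes easy to compare.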

Simulated Annealing

The simulated annealing algorithm for global minimisation is now a well-proven and accepted technique, and has been used successfully in various applications for minimisation of a combinatorial (rather than a continuous) function (el Gamal et al.). The method is a close analogy with thermodynamics, particularly with the way that slowly cooled crystals achieve a state of minimum energy.

The minimisation algorithm implemented in this case is the Metropolis Algorithm, where a downhill move is always performed, while an uphill move has a probability p = e^(-(Enew - Eold)/T), where T is the temperature, Eold is the energy before the move, and Enew is the energy after the move being considered (Press et al.). In our case, where the function to be minimised is the Error Span, the Energy Function used is the Error Span itself, while temperature is a virtual value which is decreased exponentially. The basic algorithm is outlined in Fig.

The parameters that can be varied include:

    the energy function;
    the perturbation scheme;
    the initial code assignment X0;
    the initial and final temperatures;
    the annealing schedule;
    the number of iterations allowed at every temperature;
    the number of state changes allowed per temperature (if less than the maximum number of iterations).


procedure simulated_annealing
begin
    Initialise state X := X0
    Initialise temperature T := T0
    repeat
        repeat
            Choose X', a perturbation of X
            Let ΔE := Energy(X') - Energy(X)
            if ΔE < 0 or random < e^(-ΔE/T)
                Change state X := X'
            end if
        until several state changes or too many iterations
        Update temperature according to annealing schedule
    until final temperature reached or configuration is stable
end

Figure: Simulated Annealing: basic algorithm

In the simplest implementation, the energy function chosen in our case is the Error Span itself. It is generally preferred to have a perturbation scheme and energy function chosen such that the variation of energy at every perturbation is as smooth as possible. This has been attempted in our case by restricting perturbations to a swap of two random codewords with a Hamming distance of one. The perturbation scheme must be chosen so as not to exclude any part of the search space.
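The perturbation scheme and the Metropolis acceptance criterion might be sketched as follows; the function names and the dictionary representation of an assignment are illustrative.

```python
import math
import random

def perturb(assignment):
    """Swap the codewords of two randomly chosen sequences whose
    codewords are at Hamming distance one, so that the energy (Error
    Span) changes as smoothly as possible between states.  Assumes a
    complete code, where such a pair always exists."""
    seqs = list(assignment)
    while True:
        s1, s2 = random.sample(seqs, 2)
        c1, c2 = assignment[s1], assignment[s2]
        # Codewords are equal-length bit strings, so zip is safe here.
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            new = dict(assignment)
            new[s1], new[s2] = c2, c1
            return new

def accept(delta_e, temperature):
    """Metropolis criterion: always accept a downhill move; accept an
    uphill move with probability e^(-delta_e / T)."""
    return delta_e < 0 or random.random() < math.exp(-delta_e / temperature)
```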

The initial code assignment X0 should not make any difference to the result if the other parameters are chosen correctly. This is because at the initial (sufficiently high) temperature the system's energy will rise considerably. Thus a random assignment should be as good as any other. This has been verified experimentally by allowing the initial configuration to be chosen by the user.

To allow the system to migrate towards a high-energy state in the initial phase, the starting temperature should be chosen such that T0 ≫ ΔEavg. This ensures that uphill changes have a high probability of occurrence at initial temperatures. Note that the choice of a good perturbation scheme is necessary, such that ΔE has a sufficiently restricted variance.

The algorithm stops when the state configuration becomes stable. This can be implemented by stopping the algorithm after, say, five reductions in temperature occur without any new state being accepted. In the general case this is enough, but a minimum temperature condition is usually also applied. This minimum temperature should be chosen such that it is sufficiently lower than the temperatures at which the configuration becomes stable in the normal case.

Since the probability of acceptance of an uphill move decreases exponentially with temperature, the annealing schedule is generally a geometric decrease of temperature: Ti+1 = αTi, where α lies in the range (0, 1). Values of α closer to 1 make the temperature decrease more slowly, and generally result in better annealing, with a penalty to be paid as an increased simulation time.

The maximum number of iterations to be performed at every temperature should be large enough to allow the configuration to reach an energy state typical of that temperature. In practice, the number of iterations should be such that a significant proportion of the possible perturbations are tried. The only adverse effect of choosing too high a value is a proportional increase in computation time. It may be noted that this value is particularly important at low temperatures, where very few state changes are in fact accepted. At high temperatures most proposed changes will be accepted, so the necessary number of iterations is lower at high temperatures. This is achieved by adding another limit to the number of iterations: the maximum number of accepted state changes, which should be lower than the maximum number of iterations, typically by an order of magnitude. Thus at high temperatures the number of iterations performed is limited by the number of accepted state changes, while at lower temperatures it is limited by the maximum number of iterations.



Figure: Typical simulated annealing profile

Comments

A typical annealing profile, showing how the Energy varies as the Temperature is decreased, can be seen in Fig. Due to the large number of iterations at every temperature, a vertical line is plotted showing the maximum and minimum energy attained at each temperature. The source used was pic. In this case, a fixed number of iterations was allowed at every temperature, or a smaller number of accepted state changes, whichever came first; the temperature was decreased geometrically, and the initial configuration was a sequential codeword assignment.

Note how, although the initial state was not random, the starting temperature was sufficiently high to disrupt any structure that was present in the configuration. This is important to ensure that the initial configuration does not affect the quality of the final result. Also note that at high temperatures the energy band occupied is wider than that at lower temperatures. This reflects the convergence of the simulated annealing algorithm, produced by the decreasing probability of accepting a large change in energy.


Simulated Annealing is a relatively simple algorithm to implement, and the Error Span obtained is often better than that obtained with any other algorithm. The speed of the algorithm, however, is several orders of magnitude worse than the simple algorithms. This effectively means that such an algorithm is only useful in cases where the source statistics are fixed, such as an encoder for English text with a fixed statistical model. The algorithm, if started at a sufficiently high temperature, does not exhibit any marked dependence on the starting configuration.

Reactive Tabu Search

Tabu Search is a heuristic method for global minimisation in a combinatorial problem. At the heart of the technique is the use of flexible memory structures to guide other lower-level optimisation algorithms to escape the trap of local optima. It uses randomisation selectively, or not at all, instead controlling the search process with the use of flexible memory and higher-level knowledge of the problem under investigation.

In its most basic form, Strict Tabu (denoted by STABU), all previously visited configurations are disallowed, constraining the search towards new areas. However, Tabu Search is still a comparatively new technique, and alternative methodologies are being investigated. One particular problem with STABU is that it is possible for the search to end up in a position where all neighbours have already been visited, and are thus classified as tabu.

One common implementation of the Tabu Search algorithm makes use of a fixed-size Tabu List (denoted FTABU), containing the last few moves performed, which are disallowed. This stops the algorithm from retracing its steps, and also avoids finding all neighbours tabu in most cases, but cycles longer than the list size are still possible. A further problem of the FTABU algorithm is that the result obtained is dependent on the list size, which has to be determined experimentally.

Various advances on the algorithm are due to Battiti and Tecchiolli, and have been dubbed the Reactive Tabu Search (RTABU) (Battiti and Tecchiolli). This adds explicit checking for repetitions of configurations, and the appropriate size of the list is learned dynamically as the system reacts to the occurrence of cycles. In addition, the scheme also forces an escape mechanism if a number of configurations are visited very often. The escape sequence executes a number of random moves proportional to the moving average of the detected cycle length. Further refinements include an aspiration criterion, whereby a move which lands the system in a better state than the best one found so far is allowed even if it is classified tabu. The basic algorithm for the Reactive Tabu Search is given in Fig. .

In our case the current configuration is the assignment of codewords. An elementary move is a swap of two codewords with a Hamming distance of one; the set of neighbours is defined as all configurations that can be reached by a single elementary move.
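A minimal sketch of the neighbourhood generation implied by this definition, representing the configuration as a list indexed by codeword (a data layout assumed here for illustration):

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")

def neighbours(assignment):
    """All configurations reachable by one elementary move: swapping the
    strings mapped to two codewords at Hamming distance one.
    `assignment` is a list where index = codeword, value = string tag."""
    result = []
    for c1, c2 in combinations(range(len(assignment)), 2):
        if hamming(c1, c2) == 1:
            nxt = list(assignment)
            nxt[c1], nxt[c2] = nxt[c2], nxt[c1]
            result.append(nxt)
    return result
```

For 2-bit codewords there are four codeword pairs at Hamming distance one, so each configuration has exactly four neighbours.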

Comments

Fig.  is a typical graph showing the change in Error Span against the number of iterations performed. The source used is again pic, with the sequential assignment chosen as the initial configuration. The algorithm follows a path of steepest descent from its initial configuration. The list size at this point is set to one, which is sufficient to avoid the algorithm retracing its steps. As soon as a local minimum is reached, the neighbourhood of that minimum is searched. The list size starts increasing, as shown in Fig. , to avoid being trapped at this local minimum. The sudden peak in the Current State curve is due to the algorithm's escape mechanism, which is triggered when a chaotic loop is detected. The subsequent flat section indicates that the vicinity of a local minimum tends to be flat, with only few holes near the border which lead to a better solution. This is a serious disadvantage for the Tabu Search algorithm, making it difficult to escape from a local minimum.

The Reactive Tabu Search is a considerably more complex algorithm than Simulated Annealing. It has a major disadvantage in that, as implemented, the results obtained depend heavily on the starting configuration used. In practice the best results are usually obtained when the system is initialised with a Gray code, and such results are often marginally worse than those obtained with Simulated Annealing. For some cases, however, results which are marginally better than with Simulated Annealing were obtained. The algorithm tends to reach a stable minimum after relatively few iterations; it seems


procedure reactive_tabu_search
begin
    Initialise configuration
    Initialise tabu list structures
    repeat
        Find the decrease in function value for all possible elementary moves
        Check for repetitions
        if a chaotic repetition is detected
            Enter escape mode by executing a random number of elementary
            moves, without checking whether they are tabu or not
        else
            Find a move that is not tabu, or that satisfies the aspiration criterion
            if such a move is found
                Choose that move as the best move
            else
                Find the best of all moves, independently of their tabu status
                Decrease the tabu list size to decrease the number of tabu moves
            end if
            Make the chosen move tabu
            Update current and best-so-far configuration, time and function value
        end if
    until a sufficient number of iterations have been performed
end

Figure : Reactive Tabu Search basic algorithm
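The list-size adaptation at the core of the reactive scheme can be sketched as follows; the growth and shrink factors are illustrative assumptions of this sketch, not the values used by Battiti and Tecchiolli:

```python
def react(list_size, last_seen, step, config_key,
          increase=1.1, decrease=0.9, min_size=1.0):
    """Adapt the tabu list size: grow it when the current configuration
    has been seen before (a cycle was detected), otherwise shrink it
    slowly back towards the minimum.  `last_seen` maps a hashable
    configuration key to the iteration at which it was last visited."""
    if config_key in last_seen:
        list_size = max(min_size, list_size * increase)
    else:
        list_size = max(min_size, list_size * decrease)
    last_seen[config_key] = step
    return list_size
```

In this way the appropriate list size is learned during the run instead of being fixed experimentally, which is the main difference from the FTABU variant.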


[Figure: Error Span (Current State and Best State curves), about 1.50 to 1.80, against iterations, 0 to 2000]

Figure : Evolution of the tabu search showing the escape mechanism in action

[Figure: Tabu list size, 0 to 25, against iterations, 0 to 2000]

Figure : List size dynamics for the reactive tabu search


incapable, however, of escaping from the trap of a local optimum. This may be due to the nature of the search space; Battiti showed how redefining the search space to obtain a smoother target function leads to better results with the Tabu Search paradigm (Battiti and Tecchiolli).

Greedy Algorithm

Greedy algorithms are employed in a large variety of applications, including the LZW data compression algorithm (Nelson). The largest advantage attained by using a Greedy algorithm is speed; this is because the algorithm takes decisions by following the path that seems best when the decision needs to be taken. This normally implies that the solution found is not the best possible, but in many cases it still approaches the optimum.

To minimise the Error Span of a Tunstall code, the best matching of Hamming distance between codewords with Levenshtein distance between mapped strings is sought. This ensures that strings with a large Levenshtein distance between them are protected by being assigned codewords with a large Hamming distance between them. Conversely, this also implies that the more probable errors in the encoded stream (single-bit errors in a codeword) will result in a small error when calculated with the Levenshtein distance. The algorithm to match a large Levenshtein distance between strings with a large Hamming distance between codewords is shown in Fig. .
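The Levenshtein distance itself can be computed with the standard dynamic-programming recurrence; a compact sketch:

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance: the minimum number of insertions, deletions and
    substitutions turning s into t, computed row by row over the
    (len(s)+1) x (len(t)+1) table of the classic DP recurrence."""
    prev = list(range(len(t) + 1))        # distance from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        cur = [i]                         # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,            # delete cs
                           cur[j - 1] + 1,         # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute
        prev = cur
    return prev[-1]
```

The same function applies unchanged when the "characters" are source symbols rather than letters, which is how it is used when comparing decoded strings.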

Illustration

As an illustration, consider the optimisation of the Tunstall code for source toy, shown in Table A, with 2-bit codewords. The Tunstall tree results in the following table, with four strings of source symbols which need to be assigned a codeword:

Sequence  String  Probability
a
b
c
d


procedure greedy_algorithm
begin
    Initialise codeword assignment table as all codewords unassigned
    Build a list of Hamming distances between all combinations of codewords
    Build a list of Levenshtein distances between all combinations of strings
    Sort both lists in order of decreasing distance
    repeat
        Consider the record from the top of the Levenshtein distance list
        if both strings in the Levenshtein distance record are still unassigned
            Starting at the top of the Hamming distance list, find the first
            record with both codewords unassigned
            Assign the codewords to the strings
        else if only one string is still unassigned
            Starting at the top of the Hamming distance list, find the first
            record with one codeword unassigned and the other codeword equal
            to the one assigned to the string in the Levenshtein distance record
            Assign the codeword to the string
        end if
    until all codewords assigned
end

Figure : Greedy Algorithm
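A sketch of this greedy matching in Python, assuming the Levenshtein distances between strings are precomputed (the data layout and function names are illustrative, not those of the original implementation):

```python
from itertools import combinations

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")

def greedy_assign(strings, lev_dist, bits):
    """Greedily match large Levenshtein distances (between strings) with
    large Hamming distances (between codewords).  `lev_dist` maps an
    unordered string pair to its distance; returns {string: codeword}."""
    lev = sorted(((d, s1, s2) for (s1, s2), d in lev_dist.items() if d > 0),
                 reverse=True)
    ham = sorted(((hamming(c1, c2), c1, c2)
                  for c1, c2 in combinations(range(2 ** bits), 2)),
                 reverse=True)
    assigned = {}
    for _, s1, s2 in lev:
        if len(assigned) == len(strings):
            break                              # all strings mapped
        if s1 not in assigned and s2 not in assigned:
            # Both strings free: first record with both codewords free.
            for _, c1, c2 in ham:
                if c1 not in assigned.values() and c2 not in assigned.values():
                    assigned[s1], assigned[s2] = c1, c2
                    break
        elif (s1 in assigned) != (s2 in assigned):
            # One string free: a free codeword paired with the fixed one.
            fixed, free = (s1, s2) if s1 in assigned else (s2, s1)
            for _, c1, c2 in ham:
                if assigned[fixed] in (c1, c2):
                    other = c2 if c1 == assigned[fixed] else c1
                    if other not in assigned.values():
                        assigned[free] = other
                        break
    return assigned
```

The sort order of the two lists is exactly the user-selectable choice discussed later (Greedy High uses descending order, as here; Greedy Low would sort both lists ascending).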


The matrix of Levenshtein distances between strings for this code is given below:

    a  b  c  d
a
b
c
d

This matrix is then converted into a list in order of decreasing Levenshtein distance. Entries with distance zero may be omitted from the list, since they will be of no use when mapping codewords to strings.

Sequence  Sequence  Distance
a         d
d         a
a         c
b         d
c         a
d         b
a         b
b         a
b         c
c         b
c         d
d         c

Four codewords need to be assigned to the four strings, thus a 2-bit codeword will be used. The matrix of Hamming distances between all combinations of codewords is constructed. As for the matrix of Levenshtein distances, a list is constructed in order of decreasing Hamming distance. Again, entries with distance zero are left out of the list.


Codeword  Codeword  Distance

Starting from the top of the Levenshtein distance list, we process each entry as stated in the algorithm:

Both sequences a and d are unassigned; the first entry in the Hamming distance list with both codewords unassigned is found, so sequence a is assigned the first codeword of that entry and sequence d the second.

The second entry is skipped, since sequences d and a are now already assigned.

Next, consider sequences a and c. Sequence a is already assigned but sequence c is still unassigned, so the first entry in the Hamming distance list whose first codeword is the one assigned to a and whose second codeword is unassigned is found; sequence c is assigned that second codeword.

Sequence b is still unassigned while sequence d is already mapped to a codeword; consequently, the first suitable entry in the Hamming distance list is found in the same way, and sequence b is assigned the free codeword of that entry. This completes the codeword assignment and the algorithm is stopped.

Thus we finish with the following codeword assignment:

codeword  string


Implementation Details

The size of the lists required for the Greedy algorithm can be relatively large. It should be clear that the number of entries in each list is equal to the number of elements in the respective matrix, except for the diagonal elements, which are always zero. This effectively means that for a codeword width of β bits there are n(n - 1) entries in each list, where n = 2^β. In the current implementation the Levenshtein distance record holds two integer tags for the two strings and a floating point value holding the distance (a floating point number is used for a reason which will be explained in the next section), so each record needs a fixed number of bytes of storage. For the English-alphabet source, when the Tunstall tree is expanded for a long codeword, the Levenshtein distance list on its own requires several megabytes of memory, and a similar amount is required for the Hamming distance list. It should thus be clear that the Greedy algorithm as it stands is unusable for large codeword sizes.
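The storage estimate can be reproduced directly; the per-record size is left as a parameter here, since it depends on the integer and floating point widths chosen by the implementation:

```python
def greedy_list_entries(bits: int) -> int:
    """Number of records in one distance list: all matrix elements except
    the zero diagonal, i.e. n(n - 1) with n = 2**bits codewords."""
    n = 2 ** bits
    return n * (n - 1)

def greedy_list_bytes(bits: int, record_bytes: int) -> int:
    """Total storage for one list, given the size of one record in bytes."""
    return greedy_list_entries(bits) * record_bytes
```

Because the entry count grows as roughly 4^bits, each extra bit of codeword length quadruples the memory needed, which is why large codeword sizes are impractical.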

Improvements to the Basic Algorithm

It can be seen from Eqn  that the Error Span for a code also depends on the probability of occurrence of codewords. This leads to a small modification in the Greedy algorithm: the matrix of Levenshtein distances is redefined so that entry (i, j) now holds p(x_i) Ld(x_i, x_j) instead of Ld(x_i, x_j) as before. This should make the Greedy algorithm more accurate in trying to minimise the Error Span. In practice the only change necessary in the algorithm is at the initialisation phase, when the Levenshtein distance list is constructed. This includes storing the distance as a floating point number rather than an integer, for obvious reasons.

For completeness, the algorithm is further modified so that the effect of the probability is weighted. This is done by replacing entry (i, j) by a weighted form of

    p(x_i) Ld(x_i, x_j)

Another modification performed allows the user to select whether to sort the two lists in ascending or descending order. This forces the algorithm to start allocating codewords either by matching the largest Levenshtein distance with the largest Hamming distance first (when the lists are sorted in descending order of distance), or by matching the smallest Levenshtein distance with the smallest Hamming distance first. The former case, denoted as Greedy High, works on the idea that the larger errors need to be protected best, while the latter, denoted as Greedy Low, assigns sequences giving the lowest number of symbol errors to codewords at the smallest Hamming distance. Hence the more frequent single-bit errors in the channel will result in fewer symbol errors.

Comments

The Greedy algorithm is considerably faster than both the Simulated Annealing approach and the Tabu Search heuristic. However, the quality of the codes obtained is generally slightly worse than with the simple algorithms. It should be noted that the Greedy algorithm requires an amount of memory proportional to n^2, where n is the number of codewords. This severely limits the usability of the algorithm to the simpler codes; in practice, larger codewords were found to exceed the capacity of the available machines.

Semi-Exhaustive Search

The Semi-Exhaustive Search is essentially an upgraded version of the Greedy algorithm. One of the assumptions taken in the Greedy algorithm, that the first highest Hamming distance found (in the case of ties) is as good as any other, may lead to poor results. This is avoided in the Semi-Exhaustive Search by considering, at every step, all entries in the Hamming distance list with a distance equal to that of the first valid entry found. The solution with the lowest Error Span is then used.

The algorithm lends itself to a recursive implementation, since at every stage one of a number of entries with the same Hamming distance will be selected for allocating codewords, and all cases need to be treated separately.
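The branching step of that recursion, collecting all records tied at the best Hamming distance among those still usable, can be sketched as follows (the helper name and list layout are illustrative):

```python
def tied_records(ham_list, used):
    """All records tied with the best (largest) Hamming distance among
    those whose codewords are both still free; each returned record
    becomes one branch of the recursive search.  `ham_list` is sorted
    in decreasing distance; `used` is the set of assigned codewords."""
    valid = [(d, c1, c2) for d, c1, c2 in ham_list
             if c1 not in used and c2 not in used]
    if not valid:
        return []
    best = valid[0][0]            # list is sorted, so the first is the best
    return [r for r in valid if r[0] == best]
```

The recursion then tries each returned record in turn, assigns its codewords, and recurses on the next Levenshtein entry, finally keeping the leaf with the lowest Error Span.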

Illustration

Consider the same source used to illustrate the Greedy algorithm, with the weighting parameter set as in Eqn  and both lists sorted in descending order.


The Levenshtein distance list now becomes:

Sequence  Sequence  Distance
a         d
a         c
d         a
d         b
a         b
c         a
b         d
d         c
c         b
c         d
b         a
b         c

The Hamming distance list remains as before. The Semi-Exhaustive algorithm starts by considering the first entry in the Levenshtein distance list. Since both strings a and d are unassigned, the first valid entry in the Hamming distance list is found; thus all valid entries in the Hamming distance list with the same distance are considered, i.e. the first four entries. The next entry in the Levenshtein distance list is then considered; now string a is already assigned, so for every path available the first valid entry in the Hamming distance list is found, and all valid entries with that same distance are considered. All other paths are treated in a similar manner, and the process is repeated until all codewords are assigned. This is illustrated in Fig. . Finally, the Error Span for the codeword assignment at every leaf node is calculated, and the solution with the best Error Span is selected.

Note that even for such a simple code, the number of paths searched is relatively large. This makes the algorithm very slow, up to the point of impracticality for large codes. Also note, however, that the number of paths to be considered is significantly less than the set of all possible codeword assignments. In this case, where n = 4, the full set of codeword assignments has n! = 24 possible paths; the Semi-Exhaustive algorithm, however, only considers a fraction of these assignments.


Figure : Codeword allocations considered by the Semi-Exhaustive algorithm

Comments

The Semi-Exhaustive algorithm, by definition, always produces codes which are at least as good as those obtained with the Greedy algorithm. In general, it was noted that the Semi-Exhaustive algorithm lowers the Error Span by a small amount. However, the time taken by the Semi-Exhaustive algorithm increases at least quadratically with the number of codewords; combined with the large memory requirement of the algorithm (slightly more than that of the Greedy algorithm), this makes the Semi-Exhaustive algorithm unsuitable even for moderate codeword lengths.

Comparison with Huffman Coding

Huffman coding (Huffman) is a lossless fixed-to-variable compression algorithm. It is a very popular system, and is often combined with other higher-level compression or transformation techniques such as JPEG, LZ and other schemes (Nelson). The error performance of Huffman coding has already been investigated by Maxted and Robinson; however, the focus was not on the number of symbol errors introduced but rather on the span of codewords during which the decoder lost synchronisation.


Algorithm  Code Rate  Efficiency
Huffman
Tunstall
Tunstall
Tunstall

Table : Code rates achievable on source pic

Algorithm  Code Rate  Efficiency
Huffman
Tunstall
Tunstall
Tunstall

Table : Code rates achievable on source eng



The code rates and coding efficiency achievable with Huffman and Tunstall coding are listed in Tables  and  for the pic and eng sources. The effect of choosing a longer Tunstall codeword is also clearly visible: the coding efficiency invariably increases as the codeword length is increased. It is also obvious, however, that the Tunstall algorithm performs worse than Huffman coding in terms of compression, particularly with small codeword lengths. The increase of codeword length does not have an adverse effect on compression or decompression speed; however, it does increase the memory requirements drastically. Also, any algorithm for minimising the Error Span would become significantly slower, due to the considerably increased search space.

In Huffman coding it is possible for the decoder to lose codeword synchronisation. This happens because not all codewords have the same length, and an error can cause a codeword to be interpreted as another one of a different size (Mizzi). In Tunstall coding, on the other hand, all codewords are of the same length, which implies that for a Binary Symmetric Channel the decoder can never lose codeword synchronisation. The average effect that this loss of synchronisation has on the error performance can be measured using the Error Span. For sources pic and eng, Table  shows the minimum Error Span achieved using Tunstall and Huffman coding. It can be seen that with Tunstall coding a considerably lower Error Span can be achieved, at the expense of compression. Note, however, that the Huffman code used in this case is not optimised for error performance.

Error Span
Source  Tunstall  Huffman
pic
eng

Table : Error Spans achievable with Tunstall and Huffman coding

As defined in Eqn .
Note that Tunstall-X denotes a Tunstall code with codewords of length X bits.

Conclusions

In practice, one would probably use one of the simple algorithms when speed is a necessity. This includes all cases where the source statistics are not fixed, as in a general-purpose encoder, or even more so in an adaptive encoder. Either Simulated Annealing or Tabu Search would be used, with appropriate parameters, for preparing a suitable code when the source statistics are known beforehand and can be assumed to be fixed. One such model may be the encoding of an English source, where the probabilities of the different letters remain approximately constant. In such a case the generation of a code only needs to be done once; subsequently, all source coding would be done with the same code.

Chapter

Performance of Tunstall Codes in a Binary Symmetric Channel

Coding Gain for Tunstall Codes

Graphs of Symbol Error Rate (SER) against the channel's Signal to Noise Ratio (SNR) were plotted for Tunstall codes with codewords assigned by different algorithms. They were compared with curves for uncoded information passed through the same channel, to be able to assess the coding gain for Tunstall coded data when using the various codeword assignment algorithms. The results are shown in Figs.  to  for the various assignment algorithms; the respective Error Span is shown in parentheses.

Results indicate that most of the improvement in noise performance comes from the compression factor of Tunstall codes. In the examples shown, the code rate of pic was higher than that of eng. The advantage gained by using a compressing code is achieved because encoded bits may be transmitted with higher energy than uncoded bits for the same data rate. The optimisation of choosing appropriate codewords to minimise the Error Span only gives a marginal improvement at high SNR values. Its effect at low SNR, however, is very good, though one would normally not operate in such conditions.

As defined in Eqn .

CHAPTER PERFORMANCE OF TUNSTALL IN A BSC

[Figure: SER against Eb/N0 (0 to 10 dB) for Uncoded, Random (2.575), Sequential (1.870), Gray Code (1.862), Simulated Annealing (1.343), Tabu Search (1.555), Greedy High (2.194), Greedy Low (1.972)]

Figure : Source pic with no error correction

[Figure: detail of the same curves over 6 to 10 dB]

Figure : Detail from Fig. 


[Figure: SER against Eb/N0 (0 to 10 dB) for Uncoded, Random (2.251), Sequential (1.537), Gray Code (1.680), Simulated Annealing (1.452), Tabu Search (1.375)]

Figure : Source eng with no error correction

Mathematical Model for Calculating Coding Gain

A mathematical model was developed which gives the channel SNR required to transmit at a given Symbol Error Rate. This model was used in an attempt to work out the asymptotic coding gain between a random codeword assignment and the best assignment found for the particular source; a numerical analysis showed that the coding gain tends to zero as the SNR increases. This is not a promising result, since it implies that one cannot rely on minimising the Error Span of a code in order to transmit at a lower SNR. However, the minimisation of the Error Span still gives the advantage of a lower effect of errors when one is transmitting through a Super Channel.

For a Binary Symmetric Channel (BSC) derived from an AWGN channel with BPSK modulation (Proakis), the crossover probability p is given by

    p = Q( sqrt( 2 R Eb/N0 ) )

where R is the code rate and Eb/N0 is the channel's signal to noise ratio.

This model works at low Symbol Error Rate only.
The Super Channel is defined as the combination of the transmission channel and the channel code.

The number of combinations of x bit errors in a ν-bit codeword is given by

    C(ν, x) = ν! / ( x! (ν - x)! )

Thus the probability of having x bit errors in a codeword is given by

    P(x) = C(ν, x) p^x q^(ν - x)

where ν is the codeword length in bits and q = 1 - p.

As the SNR increases, p tends to 0. This implies that

    P(x) ≈ C(ν, x) p^x

Thus, as p tends to 0, the average number of symbol errors produced for every codeword transmitted is given by

    symbol errors / codeword ≈ σ P(1)
                             ≈ σ ν p
                             = σ ν Q( sqrt( 2 R Eb/N0 ) )

where σ is the Error Span.

The average number of source symbols encoded per codeword is given by

    source symbols / codeword = ν R / β

where 2^β is the number of source symbols.

Hence the Symbol Error Rate may be found by

    E = (symbol errors / codeword) / (source symbols / codeword)
      = σ ν Q( sqrt( 2 R Eb/N0 ) ) / ( ν R / β )
      = ( σ β / R ) Q( sqrt( 2 R Eb/N0 ) )

where E is the Symbol Error Rate (SER).
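This SER model can be evaluated directly, using Q(x) = 0.5 erfc(x / sqrt(2)); in the sketch below, `span` denotes the Error Span, `rate` the code rate, and the source has 2**beta symbols (parameter names are illustrative):

```python
from math import erfc, sqrt

def qfunc(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))

def crossover(ebno: float, rate: float) -> float:
    """BSC crossover probability for rate-R coded BPSK over AWGN:
    p = Q(sqrt(2 R Eb/N0)), with ebno the linear (not dB) Eb/N0."""
    return qfunc(sqrt(2.0 * rate * ebno))

def symbol_error_rate(ebno: float, rate: float, span: float, beta: int) -> float:
    """High-SNR model E = (span * beta / rate) * p."""
    return span * beta / rate * crossover(ebno, rate)
```

The model is linear in the Error Span, which is why two assignments of the same tree differ only through the factor applied to the same Q-function term.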


Thus, for a given code, we can find the SNR required to transmit at a given SER:

    Eb/N0 = ( 1 / (2R) ) [ Q^(-1)( E R / (σ β) ) ]^2

Note that if the same Tunstall tree is assigned with two different sets of codewords, both R and β remain unchanged; the only effect is a change in σ. Thus the coding gain in dB, at a given SER, between two assignments having Error Span values σ1 and σ2, where σ1 > σ2, can be found by

    (Eb/N0)_1 [dB] - (Eb/N0)_2 [dB] = 10 log10( (Eb/N0)_1 / (Eb/N0)_2 )
                                    = 20 log10( Q^(-1)( E R / (σ1 β) ) / Q^(-1)( E R / (σ2 β) ) )

To assess the validity of the mathematical model, the channel SNR required for a given SER was computed using Eqn  for source pic, with the codewords assigned using the Simulated Annealing algorithm. The results obtained were then compared with simulation results for the same Tunstall code, as shown in Fig. . It can be seen that the model is valid even for relatively high SER. Note that Eqn  cannot be solved analytically, due to the difficulty in working with Q^(-1)(x), and thus had to be solved numerically.
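The numerical inversion can be sketched as follows. The dissertation used the secant method; plain bisection is used in this sketch instead, an implementation choice made here for robustness, which is valid because the model SER is monotone in Eb/N0:

```python
from math import erfc, sqrt

def qfunc(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def required_ebno_db(target_ser, rate, span, beta,
                     lo=0.0, hi=30.0, iters=200):
    """Invert E = (span * beta / rate) * Q(sqrt(2 R Eb/N0)) for Eb/N0
    in dB by bisection over a bracketing interval [lo, hi] dB."""
    def model_ser(db):
        ebno = 10.0 ** (db / 10.0)
        return span * beta / rate * qfunc(sqrt(2.0 * rate * ebno))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if model_ser(mid) > target_ser:
            lo = mid            # SER still too high: more SNR needed
        else:
            hi = mid
    return 0.5 * (lo + hi)
```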

The coding gain between Tunstall codes using the reference assignment and the best assignment produced with the Simulated Annealing algorithm was worked out, using Eqn , at various values of SER. The resulting values of coding gain were plotted as shown in Fig. . Similarly, the coding gain between the Tunstall code with the best Error Span and an uncompressed code was plotted, as shown in Fig. . This was achieved by using the same formula for SER, setting R = 1 and σ = 1 for the uncompressed code.

Comparison with Huffman Coding

To be able to compare the performance of the Huffman and Tunstall algorithms in noise, the same sources used in Section  were compressed using Huffman coding and passed through a Binary Symmetric Channel. Figs.  and  show the results obtained for the Huffman algorithm, as compared to a Tunstall algorithm with the lowest Error Span

The secant method was used to find the roots.


[Figure: required Eb/N0 (dB) against SER, mathematical model and simulation curves]

Figure : Comparison between mathematical model and simulation results

[Figure: coding gain (dB), about 0 to 0.5, against SER from 10^-30 to 10^0]

Figure : Coding gain between Tunstall codes with different codeword assignments


[Figure: coding gain (dB), about 1.125 to 1.150, against SER from 10^-30 to 10^0]

Figure : Coding gain between a Tunstall code and an uncompressed code

achieved and the shortest codeword length. It can be noted that the Huffman algorithm tends to give better results at lower symbol error rates. This performance improvement over Tunstall coding is probably mostly due to the higher compression provided by Huffman coding.

To remove the effect that the code rate has on the perceived error performance, the symbol error rate was plotted against the channel bit error rate, as shown in Figs.  and . When plotting graphs of SER against SNR, it was assumed that the transmitted data rate remained the same when different coding schemes were used. This implies that in a coding scheme which has a higher code rate, the same data rate would require a lower channel bit rate; this translates into a longer time for transmitting a single bit, and thus a better signal-to-noise ratio. In this case, on the other hand, it is assumed that the channel bit rate remains the same. This is usually the case where the channel modulation scheme is set by a standard, and includes most application areas such as Local Area Networks and Telephony.


[Figure: SER against Eb/N0 for Uncoded, Tunstall-8 and Huffman]

Figure : Comparison with Huffman coding for source pic with no error correction

[Figure: SER against Eb/N0 for Uncoded, Tunstall-10 and Huffman]

Figure : Comparison with Huffman coding for source eng with no error correction


[Figure: SER against crossover probability (10^-5 to 10^-2) for Uncoded, Random (2.575), Simulated Annealing (1.343) and Huffman]

Figure : Effect of BER on SER for source pic with no error correction

[Figure: SER against crossover probability (10^-5 to 10^-2) for Uncoded, Random (2.251), Tabu Search (1.375) and Huffman]

Figure : Effect of BER on SER for source eng with no error correction


In cases where the channel bit rate remains the same, Tunstall coding tends to perform better than Huffman coding. In particular, when an appropriate codeword assignment algorithm is used so as to minimise the Error Span, the Tunstall code's error performance is only marginally worse than the uncompressed case. Consider a Tunstall code with Error Span σ and code rate R, where the codeword length is ν bits and the original source has 2^β symbols, where β is an integer. In general, if a single bit error occurs in a compressed message of l bits, then on average there will be σ symbol errors in the decoded message. Now every codeword represents an average of νR/β source symbols, thus the decoded message length is lR/β symbols. If a message of length l bits were to be transmitted uncompressed, a single bit error would result in a single symbol error; however, the message length in symbols is now shorter, being l/β symbols. The symbol error rate for the Tunstall coded message is thus σβ/(lR), while in the uncompressed case it is equal to β/l.

This implies that when a Tunstall code is used, the SER increases asymptotically (for high SNR) by a factor of σ/R when compared with an uncompressed system. This leads to the hypothesis that if there exists a Tunstall code where the Error Span is less than the code rate, then by using that Tunstall code, while transmitting at the same channel bit rate, the symbol error rate is reduced. It must be noted, however, that such a Tunstall code has not been encountered yet; the smallest σ/R ratio achieved was with source pic, where an 8-bit Tunstall code had its Error Span reduced, using Simulated Annealing, to a value of 1.343. In fact, the theoretical increase in SER is in this case so small that the curves for the Tunstall code and for the uncoded case are practically lying one on top of the other. This can be seen in Fig. ; note that a very long message was necessary in order to achieve a high accuracy in the measurement of the SER.
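The σ/R factor follows from simple arithmetic on a single bit error; a sketch, with illustrative parameter names:

```python
def ser_after_one_bit_error(l_bits, span, rate, beta):
    """Symbol error rates caused by a single channel bit error, for a
    Tunstall-coded and an uncoded message occupying the same l channel
    bits (source alphabet of 2**beta symbols).  Their ratio is span/rate,
    the asymptotic SER increase of the Tunstall-coded system."""
    tunstall = span / (l_bits * rate / beta)   # span errors in l*R/beta symbols
    uncoded = 1.0 / (l_bits / beta)            # one error in l/beta symbols
    return tunstall, uncoded
```

With span only slightly larger than rate, the two quantities are nearly equal, which is why the measured curves lie practically on top of each other.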

Chapter

Use of Error Correction

Introduction

In most modern communication systems the channel is usually equipped with a channel encoder on one end and a corresponding decoder on the other. This form of channel coding usually makes use of Error Correcting Codes, in order to be able to transmit at a lower SNR while maintaining the same Symbol Error Rate. Typically, the channel coding is independent of the data being transmitted and is matched to the channel's characteristics. Any source coding being performed, such as Tunstall coding in this case, is a separate unit, and the type of coding depends heavily on the data source.

Independent Channel Coding

During the course of this project, a BCH channel coding system (Lin and Costello) was employed in order to analyse its effect on different Tunstall codes, particularly on different codeword assignments for the same source. This was also compared with the uncoded case and the case where only channel coding was performed (no compression). The BCH coding system was adopted because it is a very popular and relatively simple system, and was readily available (Vella). Various codes were used, with different block lengths and error correcting capabilities, to be able to assess the effect of the BCH code's parameters. Fig.  shows Tunstall codes obtained with various codeword

CHAPTER USE OF ERROR CORRECTION

[Figure: SER against Eb/N0 for Uncoded, Uncompressed, Random (2.575), Sequential (1.870), Gray Code (1.862), Simulated Annealing (1.343), Tabu Search (1.555), Greedy High (2.194), Greedy Low (1.972)]

Figure : Source pic protected with BCH single-error-correcting code

assignment algorithms, protected using the single-error-correcting BCH code, for source pic. Fig.  shows the same source protected with a more powerful double-error-correcting BCH code.

In these figures, the uncoded curve represents the performance of a simple fixed-length code for the source, while the uncompressed curve represents the performance of this simple fixed-length code protected with the BCH code.

One can observe that when BCH codes are used, there is almost no difference in the performance of different codeword assignments for the same Tunstall tree; BCH codes have the property of significantly reducing the bit error rate for relatively high SNR on the channel.

The curves for source eng, shown in Fig. , which has a lower code rate than pic, indicate that in this case the improvement over the uncompressed case is only marginal. This further confirms the hypothesis that any coding gain is only due to the compression.

[Figure: SER against Eb/N0 for Uncoded, Uncompressed, Random (2.575), Sequential (1.870), Gray Code (1.862), Simulated Annealing (1.343), Tabu Search (1.555), Greedy High (2.194), Greedy Low (1.972)]

Figure : Source pic protected with BCH double-error-correcting code

[Figure: Symbol Error Rate against Signal to Noise Ratio, Eb/No (dB). Curves: Uncoded, Uncompressed, Random (2.251), Sequential (1.537), Gray Code (1.680), Simulated Annealing (1.452), Tabu Search (1.375).]
Figure: Source eng protected with BCH single-error-correcting code

[Figure: Symbol Error Rate against Signal to Noise Ratio, Eb/No (dB). Curves: Uncoded, Uncompressed, Tunstall-8, Huffman.]
Figure: Comparison with Huffman coding for source pic protected with BCH single-error-correcting code

Comparison with Huffman Coding

It has already been discussed in Section why Huffman coded sources perform better than Tunstall coded sources at high channel SNR. The same phenomenon seems to happen even when the message is protected using a generic Forward Error Correction (FEC) code. Figs. and show how the best two Tunstall codes compare with a Huffman coded version of sources pic and eng when both are protected with the same BCH code.

It is evident that Huffman coding again performs better than Tunstall coding; the reason is most probably the significantly higher coding efficiency of the Huffman coding system. The curves obtained agree with the previous hypotheses: the effect of a BCH code protecting the system is merely to operate the source decoder at a significantly reduced bit error rate. It has already been seen that Huffman coding gives better results at high channel SNR, presumably because the effect of code rate dominates over the effect of Error Span. Thus it is fairly natural to expect a Huffman coded system to give

[Figure: Symbol Error Rate against Signal to Noise Ratio, Eb/No (dB). Curves: Uncoded, Uncompressed, Tunstall-10, Huffman.]
Figure: Comparison with Huffman coding for source eng protected with BCH single-error-correcting code

better results than a Tunstall coded system when a BCH code is used to protect the source coded message from noise.

The Symbol Error Rate was plotted against the channel crossover probability in Fig. for Huffman and Tunstall codes. The curves obtained show how the Tunstall code with a low Error Span is closer to the curve for an uncompressed code. In this case the uncompressed and simulated annealing curves almost coincide. The same reasoning used in Section can be used to explain this result. It is worth noting that any improvement over the uncoded system, as seen in Fig., is due to the BCH code alone. However, the deterioration caused by the source coder is minimal, both for the Tunstall and Huffman schemes.

[Figure: Symbol Error Rate against Crossover Probability (p). Curves: Uncompressed, Random (2.575), Simulated Annealing (1.343), Huffman, Uncoded.]
Figure: Effect of BER on SER for source pic protected with BCH single-error-correcting code

Optimised Channel Coding

The Tunstall coding technique exhibits certain properties which are very different from more common fixed-to-variable length source coding methods. This suggests the use of a channel coding technique optimised for a source already encoded with an appropriate Tunstall code. In particular, the possibility of using some form of Unequal Error Protection (UEP) was considered.

Unequal Protection of Bits Within a Codeword

Initially, the possibility of using UEP where certain bits of the codeword are more protected than others was considered (Masnick & Wolf). For this reason, the contribution of each bit position to the Error Span was investigated for various Tunstall codes.

[Figure: Contribution (Ratio of expected) against Bit Position in Codeword. Curves: Random, Sequential, Gray Code, Greedy [High], Greedy [Low], Tabu Search, Simulated Annealing.]
Figure: Contribution of different bits in the codeword to the Error Span (pic)

The contribution of bit i within a codeword is defined by

    λ_i = [ Σ_{k=1}^{n} p(x_k) · Ld( x_k, d(c_k ⊕ 2^i) ) ] / ( Es(x) / L )

where c_k is codeword k, represented in binary,
x_k = d(c_k) is the string mapped to codeword k,
p(x_k) is the probability of occurrence of string x_k,
d(·) denotes the decoding function,
⊕ denotes the bitwise logical-XOR operation,
n is the number of codewords, L is the codeword length in bits, and Es(x) is the Error Span of the code.

Note that λ_i = 1 if all the bits in the codeword are equally important.
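As a concrete illustration, the computation above can be sketched in C for a hypothetical toy codebook. This is not the project's code: the codebook, the probabilities and, in particular, the distance function are all assumptions, with the Levenshtein-based term Ld(x_k, d(c_k ⊕ 2^i)) replaced by a simple Hamming stand-in. With such a symmetric stand-in every bit comes out equally important, so every λ_i is 1.

```c
#include <assert.h>

#define NCODE 4   /* hypothetical codebook size          */
#define NBITS 2   /* hypothetical codeword length (bits) */

/* Stand-in for the Levenshtein term: number of differing bits
 * between the original and the corrupted codeword.            */
static double dist(int a, int b) {
    int x = a ^ b, d = 0;
    while (x) { d += x & 1; x >>= 1; }
    return (double)d;
}

/* lambda[i] = sum_k p[k]*dist(k, k XOR 2^i), normalised by Es/NBITS
 * so that lambda[i] == 1 when every bit is equally important.      */
static void bit_contributions(const double *p, double *lambda) {
    double es = 0.0;
    int i, k;
    for (i = 0; i < NBITS; i++) {
        lambda[i] = 0.0;
        for (k = 0; k < NCODE; k++)
            lambda[i] += p[k] * dist(k, k ^ (1 << i));
        es += lambda[i];          /* Es accumulates over all bits */
    }
    for (i = 0; i < NBITS; i++)
        lambda[i] /= es / NBITS;  /* ratio of expected */
}
```

Replacing dist() with the Levenshtein distance between the decoded strings recovers the quantity plotted in the figures.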

The contributions of all bits were calculated for Tunstall codes using various codeword assignment algorithms. Typical results are shown in Fig., in which source pic was used with 8-bit codewords, and in Fig., where source eng was used with 10-bit codewords.

While certain codeword assignment techniques, notably the Sequential and Gray Code assignments, exhibited a consistent difference in importance between bits, other assignment methods resulted in a code where the bits of a codeword are of practically equal

[Figure: Contribution (Ratio of expected) against Bit Position in Codeword. Curves: Random, Sequential, Gray Code, Tabu Search, Simulated Annealing.]
Figure: Contribution of different bits in the codeword to the Error Span (eng)

importance. In particular, assignment methods based on the use of random sequences tend to produce codes with equally important bits. These include not only the reference Random assignment, but also such techniques as Simulated Annealing, which, when correctly tuned, produces very good codes. The Tabu Search method deserves a special mention here since, although it uses a deterministic approach, its results depend heavily on the starting conditions (the initial code assignment): if started with a Random assignment, the resulting code tends to have equally important bits, while when started with a Gray Code, the difference in importance between bits is notable. Note that for the Tabu Search curve of Fig. the search was started from a Gray Code assignment.

In codes where some bits are significantly more important than others, those bits having a larger contribution to the Error Span can be protected more than the rest with the use of appropriate UEP codes (Masnick & Wolf).

[Figure: Contribution (Ratio of expected) against Codeword (Sorted for Decreasing Contribution), 0 to 256. Curves: Sequential, Gray Code, Greedy [High], Greedy [Low], Tabu Search, Simulated Annealing.]
Figure: Contribution of different codewords to the Error Span (pic)

Unequal Protection of Codewords

Another possibility considered is the use of UEP codes where certain codewords, rather than bits, are protected more than others (Bernard & Sharma). Accordingly, the contribution of each codeword to the Error Span was calculated for various Tunstall

codes. The contribution of codeword i is defined by

    μ_i = [ Σ_{j=0}^{L-1} p(x_i) · Ld( x_i, d(c_i ⊕ 2^j) ) ] / ( Es(x) / n )

where c_i is codeword i, represented in binary,
x_i = d(c_i) is the string mapped to codeword i,
p(x_i) is the probability of occurrence of string x_i,
⊕ denotes the bitwise logical-XOR operation,
n is the number of codewords, L is the codeword length in bits, and Es(x) is the Error Span of the code.
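The per-codeword contribution can be sketched analogously to the per-bit one. Again the codebook and probabilities are hypothetical and a Hamming stand-in replaces the Levenshtein term, so each codeword's share is simply proportional to its probability; sorting the indices for decreasing contribution mirrors the presentation used in the figures.

```c
#include <assert.h>
#include <stdlib.h>

#define NC 4   /* hypothetical codebook size          */
#define NB 2   /* hypothetical codeword length (bits) */

static double mu[NC];   /* per-codeword contribution */

/* Hamming stand-in for the Levenshtein term Ld(x_i, d(c_i XOR 2^j)) */
static double dist(int a, int b) {
    int x = a ^ b, d = 0;
    while (x) { d += x & 1; x >>= 1; }
    return (double)d;
}

static int by_decreasing(const void *a, const void *b) {
    double diff = mu[*(const int *)b] - mu[*(const int *)a];
    return (diff > 0) - (diff < 0);
}

/* mu[i] = sum_j p[i]*dist(i, i XOR 2^j); order[] receives the codeword
 * indices sorted for decreasing contribution, as in the plots.        */
static void codeword_contributions(const double *p, int *order) {
    int i, j;
    for (i = 0; i < NC; i++) {
        mu[i] = 0.0;
        for (j = 0; j < NB; j++)
            mu[i] += p[i] * dist(i, i ^ (1 << j));
        order[i] = i;
    }
    qsort(order, NC, sizeof(int), by_decreasing);
}
```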

Typical results are shown in Fig. for source pic with 8-bit codewords, and in Fig. for source eng with 10-bit codewords, using various codeword assignment algorithms. As can be seen, the results in this case are more promising, since all the codes considered manifest a remarkable difference between the importance of codewords with

[Figure: Contribution (Ratio of expected) against Codeword (Sorted for Decreasing Contribution), 0 to 1024. Curves: Sequential, Gray Code, Tabu Search, Simulated Annealing.]
Figure: Contribution of different codewords to the Error Span (eng)

respect to error performance. The codewords having a larger contribution to the Error Span can be protected more than the other codewords with the use of UEP codes (Bernard & Sharma).

The results of the Random assignment have not been plotted in this case: it is meaningless to work out an average of the contribution to the Error Span of codeword i over different random assignments, because there seems to be no correlation between a codeword's value and its contribution for any of the assignment algorithms considered. Note that in Fig. it is always the high-order bits that contribute more to the Error Span with the Sequential algorithm, while in Fig. the codewords are sorted in order of decreasing contribution before plotting.

Chapter

Conclusions

Tunstall Codec

The two major restrictions to the general applicability of the Tunstall codec used throughout the project are that the Tunstall code must be complete and that the source should be memoryless. A further minor restriction is that the source message should be fully encodable.

Difficulties of Having an Incomplete Tunstall Code

To obtain a complete Tunstall code, it was shown in Section that the source should have symbols where is an integer. A complete Tunstall code has the advantage that any codeword received is valid, simplifying the decoding process.

When an incomplete Tunstall code is used, on the other hand, there is always the possibility that an error along the channel will result in an undefined codeword. Possible ways of dealing with such cases include:

1. The undefined codeword j is mapped to the symbol sequence associated with codeword i, which is the most probable codeword with the smallest Hamming distance from j. In this case only n codewords are used from a codebook of 2^L codewords, where L is the codeword length. This results in a fast assignment, and may be used in conjunction with the Sequential or Gray Code assignments.

2. A number of codewords are assigned to the same symbol sequence. These codewords and the corresponding symbol sequences must be chosen so as to minimise the Error Span, and hence the algorithm is much more complicated.

3. The sequence mapped to the undefined codeword may be any sequence of source symbols, and not necessarily one in the Tunstall tree. This represents a further level of complexity, and is possibly the best, though probably the slowest, way of dealing with the issue of undefined codewords.
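The first option above can be sketched as follows; the validity mask and probability table are hypothetical inputs, not taken from the project's codec. The undefined codeword is mapped to the valid codeword at the smallest Hamming distance, with ties broken in favour of the more probable codeword.

```c
#include <assert.h>

/* Number of set bits in x (codewords here are at most 8 bits wide). */
static int popcount8(unsigned x) {
    int d = 0;
    while (x) { d += x & 1; x >>= 1; }
    return d;
}

/* Map an undefined codeword cw to the most probable valid codeword
 * at the smallest Hamming distance. valid[] flags the codewords in
 * the (incomplete) codebook; p[] gives their probabilities.         */
static int nearest_valid(unsigned cw, const int *valid,
                         const double *p, int ncode) {
    int k, best = -1, bestd = 9;   /* 9 exceeds any 8-bit distance */
    for (k = 0; k < ncode; k++) {
        int d;
        if (!valid[k])
            continue;
        d = popcount8(cw ^ (unsigned)k);
        if (d < bestd || (d == bestd && p[k] > p[best])) {
            best = k;
            bestd = d;
        }
    }
    return best;
}
```

Because the lookup is deterministic and cheap, it combines naturally with the Sequential or Gray Code assignments mentioned above.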

Sources with Memory

The Tunstall codec used within the project assumes a memoryless source, and would not operate at maximum coding efficiency on a source with memory. This may seem to be a major deficiency of the Tunstall algorithm when compared with fixed-to-variable length compression schemes; Huffman coding, for example, is not adversely affected by a source with memory. However, one of the major advantages of variable-to-fixed length encoding schemes is that such schemes can make use of the source's memory to increase the compression rate. In fact, it is this particular feature that has made the Lempel-Ziv algorithm and its derivatives so popular in real-life applications (Nelson). The issue of sources with memory in Tunstall coding has been analysed by Savari and Gallager.

Sources which Cannot be Fully Encoded

It has been shown in Section that in some source messages, after encoding as many source symbols as possible, a number of symbols remain which do not map to any codeword. One way of solving the problem was suggested, in which the message length is transmitted with the encoded data. This solution is suitable for systems employing packet transmission, but can have disastrous consequences if an error occurs in the packet length.


Adaptive Compression

Many popular modern compression systems are capable of adapting their parameters to suit the changing statistics of the source message. Such a feature would possibly increase the compression rate, at the expense of increased coder and decoder complexity. In this project, adaptive compression was not considered, since the source statistics are fixed throughout a message.

Minimising Error Span

The algorithms which obtain the best Error Span for a Tunstall code tend to be very resource intensive. The fast code assignment algorithms, while performing significantly better than the reference random assignment, are much less effective than the elaborate algorithms in reducing the Error Span. In practice, the simple algorithms are the only ones that can be used in adaptive compression, and in any other case where the source statistics are not fixed. In certain cases, however, an approximation to the actual source statistics is known. For example, in certain standards for digital video transmission, the statistics for the Discrete Cosine Transform (DCT) coefficients are fixed, and will be the same in all digital video clips in that standard. In such cases, if Tunstall coding is to be employed, the complexity of the codeword assignment algorithm is of no consequence; elaborate algorithms can safely be used to achieve the best error performance possible.

Comparison with Huffman Coding

When transmitted over a Binary Symmetric Channel, Huffman coding has a better error performance than Tunstall coding at high channel SNR. This is due to the higher compression rate achievable by Huffman coding at a given complexity. In some cases, however, the bit rate of the channel cannot be varied; these include transmission over standard telecommunication networks, where the bit rate used is fixed by the standard. In such cases Tunstall coding has been shown to have a better performance than Huffman coding. In practice, with a low Error Span the Tunstall code is only marginally worse


than the uncompressed case. It was hypothesised that if there were a Tunstall code with an Error Span less than the code rate, the SER for such a code would be less than that for an uncompressed code at a given channel crossover probability. However, no such code has yet been found.

Use of Error Control Coding

With the use of conventional Forward Error Correction (FEC) codes to protect the source coded message, the Tunstall code behaves worse than Huffman coding. This is only because Huffman coding has a better performance at low channel bit error rates, and the BCH code used effectively lowers the bit error rate perceived by the source decoder. For the same reason, the difference in performance between different codeword assignments for the same Tunstall tree is much lower than when no channel coding was used. To improve the performance of Tunstall codes, the use of an FEC system matched to the properties of a Tunstall code should be considered.

Optimised Error Protection for Tunstall Codes

In order to determine whether it would be profitable to make use of Unequal Error Protection (UEP) schemes, the relevant properties of Tunstall codes were analysed in Section. In particular, the relative contribution to the Error Span of different bits in a codeword was analysed. This has shown that in most of the better codeword assignments, different bits in a codeword are almost equally important; thus they would not benefit from the use of UEP. On the other hand, when the importance of codewords within the codebook was analysed, it was seen that in all codeword assignments a few of the codewords were responsible for most of the Error Span. This implies that this form of UEP can be fruitfully used for most Tunstall codes. Error Correcting Codes which have the capability of correcting different errors in different codewords were discussed by Bernard and Sharma, and should be suitable for the protection of Tunstall codes.

Appendix A

Source Statistics

Various different sources are used, having a different number of symbols or a different probability profile. For each source it is assumed that the symbols comprising the message are completely independent. In simulations this is achieved by using a random generator to supply the source message, and by averaging the results obtained for different messages with the same probability distribution. The probabilities for each symbol of the different sources are given in Tables A to A.

Besides the two simple test sources given initially, two other sources were tried, whose statistics are derived from real applications. The raster of a colour hand-drawn image resulted in the statistics shown in Table A. The actual image has rather a lot of redundancy, not only because some colours appear very often, but also because of the spatial correlation typical of images; the Tunstall code naturally only makes use of the former redundancy. Another, more important, source considered was a 28-symbol English source. The statistics, as shown in Table A, are derived from the full text of Sir Walter Scott's Ivanhoe, which had first been converted to lowercase.

Table A: Source statistics for source toy (columns i, p(s_i); values not recovered)

Table A: Source statistics for source toy (columns i, p(s_i); values not recovered)

Table A: Source statistics for colour image pic (columns i, p(s_i); values not recovered)

Table A: Source statistics for the 28-symbol English source eng (columns i, symbol, p(s_i); the symbols are the letters a to z, the space character, and the paragraph mark; probability values not recovered)

Appendix B

Program Documentation

The programs used throughout the project have been built in ANSI C, with the use of standard ANSI and BSD libraries. The software has been designed with the following in mind:

- Modularity: all functions should be compact and self-contained. Global variables should only be used where strictly necessary, and even in that case access to the variable should only be made within the module to which it belongs. Any access from other modules should be hidden behind functions provided by the module where the variable belongs.

- Reusability: as a consequence of the modular design, program objects (functions and procedures) should be suitable for inclusion into a library, such that an end-user program can easily be built by concatenating a set of objects from the library.

- Portability: the whole suite should be easily portable to all UNIX environments, with only minor changes in the configuration header to allow for compiler peculiarities, and even to most other systems having an ANSI C compiler. Only some of the more esoteric end-user programs are allowed to be slightly less portable, so that certain useful features of the operating system can be utilised. A perfect example, which has been used in practice, is program forking, whereby the performance of one of the slower programs was increased by allowing the use of multiple processors.


B End-User Programs

All programs accept input from standard input (stdin) and produce their output on standard output (stdout). Optionally, a file may be specified for input and another file for output. In the case of input from a file, when end-of-file (EOF) is reached the program automatically starts expecting input from stdin, which may be the console, or else redirected to a file or pipe. This allows the initial parameters to be supplied from a file and further parameters to be queried from the user. The most common example is the case where the source statistics are supplied from a file (picsource, say) and all subsequent parameters taken from another file (seq) using redirection:

picsource seq mkcode

All debugging output is sent to standard error (stderr); it is generally useful for checking that the program is functioning correctly. It is useful to keep the stderr output as a log file, since all parameters passed to the program are also recorded there.

To assist the user, if an incorrect number of parameters is passed, all programs will output a brief message detailing what command-line parameters the program expects. This message is also available if the program is called with the single parameter -h.

The programs available to the end-user can be split into three categories, which will be examined separately:

- Library Functions
- Simulation Controllers
- Utility Functions

B Library Functions

A substantial number of programs provide relatively low-level and modular access to the process of transmitting and receiving Tunstall coded messages. These include functions such as building a Tunstall tree, assigning codewords, encoding, decoding, comparing messages, calculating Levenshtein distance, creating random source messages, adding random noise to messages, and displaying messages. As such, most of these programs are simply wrapped-up versions of the equivalent library function. Required command-line parameters are given in parentheses.

mkcode accepts the source statistics (the number of source symbols, followed by the probability of each symbol) and the parameters necessary to build a Tunstall tree for that source and to assign the codewords. These are the codeword length required and the codeword assignment algorithm number, followed by the list of parameters specific to that codeword assignment algorithm. The algorithms available are:

- n, where n : random assignment with seed n
- sequential algorithm
- gray code algorithm
- simulated annealing algorithm
- tabu search algorithm
- semi-exhaustive algorithm optimised for high bit error rates
- semi-exhaustive algorithm optimised for low bit error rates
- greedy algorithm optimised for high bit error rates
- greedy algorithm optimised for low bit error rates

The program then outputs the source statistics, followed by the codeword length and the whole codebook. This is all that is necessary to recreate the same Tunstall code, and the output file is in a format suitable for redirecting into other programs which require this information. At the same time, the output is human-readable, in that the codebook includes both the decimal and binary representations of the codeword, as well as the string of source symbols (in binary) that it represents. The program also prints the relative importance of bits within a codeword and, optionally, the relative importance of codewords within the codebook.

mksource (f, seed) creates a random source of length f symbols. The random generator is initialised with seed, which should be a small number. The source statistics are read from stdin; these include the number of symbols in the source, followed by the probability of each symbol.

encode (codefile) uses the Tunstall code specified in codefile and encodes stdin to stdout. The Tunstall code file is usually the output of mkcode, while the input stream is usually generated by mksource.

decode (codefile) uses the Tunstall code specified in codefile and decodes stdin to stdout. The Tunstall code file is usually the output of mkcode, while the input stream is usually generated by encode and possibly corrupted by addnoise.

addnoise (p, seed) corrupts the source stdin; the channel model is a BSC with bit crossover probability p. The random generator is initialised with seed, which should be a small number.
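The channel model assumed by addnoise can be sketched as follows. This is a schematic reimplementation, not the project's source; the use of rand() is purely illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Binary Symmetric Channel: flip every bit of buf independently with
 * crossover probability pr, using a generator seeded with seed.      */
static void bsc_corrupt(unsigned char *buf, int nbytes, double pr,
                        unsigned seed) {
    int i, b;
    srand(seed);
    for (i = 0; i < nbytes; i++)
        for (b = 0; b < 8; b++)
            if ((double)rand() / RAND_MAX < pr)
                buf[i] ^= (unsigned char)(1 << b);
}
```

Seeding explicitly, as the real programs do, makes a corrupted message reproducible across simulation runs.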

cmpdec (source, decoded) calculates the Levenshtein distance between the messages source and decoded. The costs for insertion, deletion and substitution are read from stdin. An error is produced if the source and decoded messages do not have the same number of symbols.
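The distance itself is the classic dynamic-programming recurrence. Below is a sketch on C strings with configurable costs; the real program operates on symbol messages and reads the costs from stdin, and the fixed MAXLEN bound is an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 64   /* sketch limit on message length */

/* Levenshtein distance between strings s and t, with insertion,
 * deletion and substitution costs ci, cd and cs respectively.   */
static int leven(const char *s, const char *t, int ci, int cd, int cs) {
    int d[MAXLEN + 1][MAXLEN + 1];
    int m = (int)strlen(s), n = (int)strlen(t);
    int i, j;
    for (i = 0; i <= m; i++) d[i][0] = i * cd;   /* delete all of s */
    for (j = 0; j <= n; j++) d[0][j] = j * ci;   /* insert all of t */
    for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++) {
            int sub = d[i-1][j-1] + (s[i-1] == t[j-1] ? 0 : cs);
            int del = d[i-1][j] + cd;
            int ins = d[i][j-1] + ci;
            int best = sub < del ? sub : del;
            d[i][j] = best < ins ? best : ins;
        }
    return d[m][n];
}
```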

cmpenc (transmitted, received) calculates an approximation to the Levenshtein distance between the source and decoded messages, where transmitted is the Tunstall coded equivalent of the source message and received is the Tunstall coded equivalent of the decoded message. Again, the costs for insertion, deletion and substitution are read from stdin.

cmpenc (transmitted, received) calculates an improved approximation to the Levenshtein distance between the source and decoded messages. All parameters are treated in the same way as for cmpenc.

B Simulation Controllers

While the library functions can be used to generate all the necessary results, the manual, repetitive use of such programs becomes tedious. Simulation controllers are automated programs which repetitively call the necessary library functions (directly, not through a separate executable) and perform a number of mathematical operations on the results obtained. Finally, the results are output in a useful format for further processing or, more usually, for visualisation. The simulation controllers allow several complex analyses to be performed, and provide an easy way to present the results using graphs (with an external program).

analyse will accept through stdin a list of codefiles to process, with sets separated by a blank line. For every codefile, the relative importance of bits within a codeword and that of codewords within the codebook is calculated. If a set contains more than one codefile, the calculated values are averaged for the whole set. The calculated values are then output to stdout, or to the specified files (to allow for easier separation of the importance of codewords from the importance of bits), as an array. The output values are the ratio of the importance of a given codeword or bit to the expected importance if all bits/codewords were equally important.

randist reads the source statistics from stdin (the number of source symbols and the probabilities of each symbol), followed by the codeword length in bits and the number of random assignments to try. The Error Span for every random assignment tried is then output to stdout. These values may be used to plot the distribution of Error Span for randomly assigned codewords.

simbch is used to plot a curve of SER against SNR (or against p) for any Tunstall code, using any BCH protection code. The parameters are read from stdin, and include the code statistics (the number of source symbols, the probability of each source symbol, the codeword length and the codebook) and a count which indicates how many simulations will be performed before an average is taken. Other parameters which control the curve produced include a boolean to indicate whether or not the variance of the results is printed, a lower limit for the SER required, a message length multiplier (usually 1, but it can be used to force the simulation to run on longer files), a limit to the file size in kbytes, and the costs of insertion, deletion and substitution. Finally, the power of the BCH code, m, and the number of correctable errors, t, are entered; these decide the BCH code to be used. The maximum file length is used to regulate the length of the simulation files: when the maximum is exceeded, the simulation run is shortened, but more simulations are performed to ensure adequate results. The BCH code used, (N, K, t), is determined by

    N = 2^m - 1
    K = N - m*t

The program requires the presence of a file gfmdat in the current directory, where m is the BCH code's power.
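The parameter relations can be sketched directly. Note that N - m*t is in general only a lower bound on K for a t-error-correcting binary BCH code, though it holds with equality for many of the commonly used codes.

```c
#include <assert.h>

/* (N, K, t) parameters of the BCH code chosen by simbch from the
 * code power m and correctable errors t: N = 2^m - 1, K = N - m*t. */
static void bch_params(int m, int t, int *N, int *K) {
    *N = (1 << m) - 1;
    *K = *N - m * t;
}
```

For example, m = 4 and t = 1 give the (15, 11) single-error-correcting BCH code.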

simbchc is the same as simbch, except that the BCH encoder, BSC channel and Berlekamp-Massey decoder are implemented in a separate program, bmdec. This had to be done because of some incompatibility between the BCH and Tunstall libraries which did not show up during compilation; simbch cannot be used as it is, but simbchc is a direct substitute, assuming that bmdec is also in the current directory or path.

simbchv is identical to simbchc, except that the BCH decoder uses the Viterbi algorithm, and the respective executable is vitdec.

simgain reads the code statistics for two codes from stdin. The two codes must differ only in the codeword assignment (the source statistics and codeword length must be the same for both); this ensures that for these codes only the Error Span is different. Using the mathematical model discussed in Section, the coding gain is worked out numerically at different SER values and output to stdout.

simleven is used to work out the Error Increase at different channel crossover probabilities p, using the standard Levenshtein distance algorithm as well as the simplified and improved algorithms. The resulting table, output to stdout, can be used to compare the accuracy of the approximate algorithms. The program reads the code statistics from stdin, followed by a count which denotes how many simulations will be performed at every value of p. This is followed by booleans which tell whether the variance of the results and the difference between results should be plotted, and by a value denoting the smallest bit error rate required on the channel. A message length multiplier allows the user to increase the simulation length, and is followed by the costs of a substitution, insertion and deletion. An optional command-line parameter gives the name of a state-saving file, such that the algorithm will continue from a previously stopped execution. This feature had to be included in the program because the simulations take very long (in the order of days on a multi-processor SPARCstation), and any machine crash would otherwise necessitate restarting from scratch.

simlevf accepts the same parameters as simleven (except for the state-saving parameter), but in this case the Levenshtein distance is computed using only the improved approximation. This allows low bit error rates to be simulated in a decent time, such that the Error Increase can be compared with the Error Span.

simser performs the same operation as simbch, except for the channel encoder: all symbols are transmitted directly, without any BCH error-correcting code, and thus the parameters pertaining to the BCH code's characteristics are not required.

B Utility Functions

During the course of the project, certain file operations were required which are not provided by the operating system. Such programs do not usually make use of the library routines, but only perform operations on files which are otherwise unattainable.

mkstats (mbits, infile) will dissect the file infile into the largest number of bit-packed symbols with mbits bits each, and will output the frequency of each symbol to stdout in a format which can be read by cvtstats. This program was used to calculate the statistics for sources pic and eng.
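The dissection step can be sketched as a bit-stream reader. This is an illustrative reimplementation, with an assumed MSB-first packing order, not the project's actual mkstats.

```c
#include <assert.h>

/* Split a byte buffer into consecutive mbits-bit symbols (MSB first)
 * and count the frequency of each; trailing bits that do not fill a
 * whole symbol are discarded. freq[] must hold 2^mbits counters.     */
static void count_symbols(const unsigned char *buf, int nbytes,
                          int mbits, unsigned *freq) {
    unsigned acc = 0;
    int nacc = 0, i, b;
    for (i = 0; i < nbytes; i++)
        for (b = 7; b >= 0; b--) {
            acc = (acc << 1) | ((buf[i] >> b) & 1u);
            if (++nacc == mbits) {
                freq[acc]++;
                acc = 0;
                nacc = 0;
            }
        }
}
```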

cvtstats reads from stdin the number of source symbols, followed by a label and count for each symbol. This may be entered manually, or else may be the output of mkstats. The program outputs the source statistics on stdout in a format suitable for use by all the other programs which require source statistics.

mkenglish reads the source statistics from stdin and, if the source has 256 symbols, will output the statistics for the 28-symbol English alphabet on stdout. This program assumes that the source was ASCII encoded.

display accepts a bit-packed message from stdin and displays it on stdout as a train of symbols in binary representation, separated by a period. The message has the same format as used in all other programs which handle messages, and includes the number of symbols in the alphabet, followed by the number of symbols in the message, as binary numbers in the header. This is followed by the message body in bit-packed format, rounded to a word boundary.

simbchcvt and simlevcvt are used to convert data files from old simulations, necessitated by a change in certain definitions. They are thus unnecessary for any new simulations, but have been kept for reference.

Bibliography

Battiti, Roberto & Tecchioli, Giampietro. "The Reactive Tabu Search". ORSA Journal on Computing, vol., no., pp.

Bernard, Margaret Ann & Sharma, Bhu Dev. "Linear Codes with Non-Uniform Error Correcting Capability". To be published.

Bland, J. A. & Baylis, D. J. "A Tabu Search Approach to the Minimum Distance of Error-Correcting Codes". International Journal of Electronics, vol., no., pp.

El Gamal, Abbas A., Hemachandra, Lane A., Shperling, Itzhak & Wei, Victor K. "Using Simulated Annealing to Design Good Codes". IEEE Transactions on Information Theory, vol., pp., Jan.

Farrell, P. G. "Notes on Source Coding". University of Manchester, Feb.

Huffman, D. A. "A Method for the Construction of Minimum-Redundancy Codes". Proceedings of the Institute of Radio Engineers, vol., pp., Sept.

Kernighan, Brian W. & Ritchie, Dennis M. The C Programming Language. Prentice Hall International, second edn.

Kruskal, Joseph B. "An Overview of Sequence Comparison: Time Warps, String Edits, and Macromolecules". SIAM Review, vol., pp., Apr.

Lin, Shu & Costello, Jr., Daniel J. Error Control Coding: Fundamentals and Applications. Prentice Hall International.

Masnick, Burt & Wolf, Jack. "On Linear Unequal Error Protection Codes". IEEE Transactions on Information Theory, vol., pp., Oct.

Maxted, James C. & Robinson, John P. "Error Recovery for Variable Length Codes". IEEE Transactions on Information Theory, vol., pp., Nov.

Mizzi, John D. Self-Synchronisation Properties of Variable Length Codes. B.Eng.(Hons.) final year project, University of Malta, Department of Communications and Computer Engineering, Faculty of Engineering.

Muxiang, Zhang & Fulong, Ma. "Simulated Annealing Approach to the Minimum Distance of Error-Correcting Codes". International Journal of Electronics, vol., no., pp.

Nelson, Mark. The Data Compression Book. M&T Books.

Pancha, P. & El Zarki, M. "MPEG Coding for Variable Bit Rate Video Transmission". IEEE Communications Magazine, vol., pp., May.

Press, William H., Teukolsky, Saul A., Vetterling, William T. & Flannery, Brian P. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, second edn.

Proakis, John G. Digital Communications. McGraw Hill International, third edn.

Savari, Serap A. & Gallager, Robert G. "Variable-to-Fixed Length Codes for Sources with Memory". Proceedings of the IEEE International Symposium on Information Theory, p.

Tanenbaum, Andrew S. Computer Networks. Prentice Hall International, third edn.

Teuhola, Jukka & Raita, Timo. "Arithmetic Coding into Fixed-Length Codewords". IEEE Transactions on Information Theory, vol., pp., Jan.

Tunstall, B. P. Synthesis of Noiseless Compression Codes. PhD thesis, Georgia Institute of Technology.

Vella, Stephen J. Implementation and Analysis of BCH Decoders for Reliable Data Transmission. B.Eng.(Hons.) final year project, University of Malta, Department of Communications and Computer Engineering, Faculty of Engineering.

Wallace, Gregory K. "The JPEG Still Picture Compression Standard". Communications of the ACM, vol., pp., Apr.