
Universidade Estadual de Campinas

Instituto de Matemática, Estatística e Computação Científica

Jerry Anderson Pinheiro

FORMA CANÔNICA PARA CÓDIGOS POSET E ESQUEMAS DE CODIFICAÇÃO-DECODIFICAÇÃO PARA PERDA ESPERADA

CANONICAL FORM FOR POSET CODES AND CODING-DECODING SCHEMES FOR EXPECTED LOSS

Campinas
2016

Jerry Anderson Pinheiro

Canonical Form for Poset Codes and Coding-Decoding Schemes for Expected Loss

Forma Canônica para Códigos Poset e Esquemas de Codificação-Decodificação para Perda Esperada

Tese apresentada ao Instituto de Matemática, Estatística e Computação Científica da Universidade Estadual de Campinas como parte dos requisitos exigidos para a obtenção do título de Doutor em Matemática.

Thesis presented to the Institute of Mathematics, Statistics and Scientific Computing of the University of Campinas in partial fulfillment of the requirements for the degree of Doctor in Mathematics.

Advisor: Marcelo Firer

This copy corresponds to the final version of the thesis defended by the student Jerry Anderson Pinheiro and supervised by Prof. Dr. Marcelo Firer.

Campinas
2016

Funding agency(ies) and process number(s): CNPq, 141586/2014-1; CAPES

Ficha catalográfica
Universidade Estadual de Campinas
Biblioteca do Instituto de Matemática, Estatística e Computação Científica
Ana Regina Machado - CRB 8/5467

Pinheiro, Jerry Anderson, 1985-
P655c Canonical form for poset codes and coding-decoding schemes for expected loss / Jerry Anderson Pinheiro. – Campinas, SP: [s.n.], 2016.

Orientador: Marcelo Firer.
Tese (doutorado) – Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica.

1. Métricas sobre ordens parciais. 2. Códigos corretores de erros (Teoria da informação). I. Firer, Marcelo, 1961-. II. Universidade Estadual de Campinas. Instituto de Matemática, Estatística e Computação Científica. III. Título.

Informações para Biblioteca Digital

Título em outro idioma: Forma canônica para códigos poset e esquemas de codificação-decodificação para perda esperada
Palavras-chave em inglês: Poset metrics; Correcting codes (Information theory)
Área de concentração: Matemática
Titulação: Doutor em Matemática
Banca examinadora: Marcelo Firer [Orientador]; Reginaldo Palazzo Junior; Marcelo Muniz Silva Alves; Marcelo da Silva Pinho; Emerson Luiz do Monte Carmelo
Data de defesa: 25-04-2016
Programa de Pós-Graduação: Matemática

Doctoral thesis defended on April 25, 2016 and approved by the Examining Committee composed of Profs. Drs.

Prof. Dr. MARCELO FIRER

Prof. Dr. REGINALDO PALAZZO JUNIOR

Prof. Dr. MARCELO MUNIZ SILVA ALVES

Prof. Dr. MARCELO DA SILVA PINHO

Prof. Dr. EMERSON LUIZ DO MONTE CARMELO

The defense minutes, with the respective signatures of the committee members, can be found in the student's academic records.

Acknowledgments

During these little more than four years of work, countless people and institutions helped me in some way along this journey. For some of them a page of acknowledgments would be too little; nevertheless, I took the risk, made a synthesis and omitted a few, though I have not forgotten them. I thank my parents, Zilton and Adiles, for their unconditional support. Despite the distance that feeds the longing, you are the main ones responsible for this achievement. I thank professors Marcelo Firer and Marcus Greferath: the first, not only for the doctoral supervision at Unicamp but also for the trust placed in me and for his teachings; and the second, for the supervision during the year I spent at University College Dublin. For a project to get started, there is always someone who provides motivation and sows an idea; one of the main people responsible for my venturing into graduate studies at Unicamp, and to whom I leave my thanks here, was professor Luciano Panek. One cannot spend so much time in a place without cultivating friends; I cultivated many, some of whom I list here, while many more will remain in memory. I thank my graduate colleagues Cleber, Felix and João (Jhon); my laboratory colleagues Campello, Ana, Bruno, Elen, Akemi, Julianna (Ju), Cintya, Giselle, Eleonésio, Alessandro, Adriana and Cláudio; my work “brothers”, “sons” of the same advisor: Marcos (Marquinhos), Roberto, Christiane (Chris), Rafael and Luciano, the last one a partner in pints of Guinness in the pubs of Dublin; and the friends I made in Ireland during my sandwich doctorate, Jens, Oliver, Cornelia, Ana and Carolina (Carol), the last two partners on our trip around Ireland. I thank CNPq for the financial support.

Resumo

No contexto de códigos corretores de erros, métricas são utilizadas para definir decodificadores de máxima proximidade, uma alternativa aos decodificadores de máxima verossimilhança. A família de métricas poset tem sido extensivamente estudada no contexto de teoria de códigos. Considerando a estrutura do grupo de isometrias lineares, é obtida uma forma canônica para matrizes geradoras de códigos lineares. Esta forma canônica permite obter expressões e limitantes analíticos para alguns invariantes clássicos da teoria: raio de empacotamento e complexidade de síndrome. Ainda, substituindo a probabilidade de erro pela perda esperada definida pelo desvio médio quadrático (entre a informação original e a informação decodificada), definimos uma proposta de codificação com ordem lexicográfica que, em algumas situações é ótima e em outras, as simulações feitas sugerem um desempenho ao menos subótimo. Finalmente, relacionamos a medida de perda esperada com proteção desigual de erros, fornecendo uma construção de códigos com dois níveis de proteção desigual de erros e com perda esperada menor que a obtida pelo produto de dois códigos ótimos, que separam as informações que são protegidas de modo diferenciado.

Palavras-chave: Métricas Sobre Ordens Parciais, Códigos Corretores de Erros (Teoria da Informação)

Abstract

In the context of error-correcting codes, metrics are used to define minimum distance decoders, an alternative to maximum likelihood decoders. The family of poset metrics has been extensively studied in the context of coding theory. Considering the structure of the group of linear isometries, we obtain a canonical form for generator matrices of linear codes. The canonical form allows us to obtain analytic expressions and bounds for classical invariants of the theory: the packing radius and the syndrome complexity. Replacing the error probability with the expected loss defined by the mean square deviation (between the original information and the decoded information), we propose an encoding scheme which is optimal in some situations and for which, in others, simulations suggest an at least sub-optimal performance. Finally, we relate the expected loss measure to unequal error protection, providing a construction of codes with two levels of unequal error protection and expected loss smaller than the one obtained by the product of two optimal codes which separate the information that is protected at different levels.

Keywords: Poset Metrics, Error Correcting Codes (Information Theory).

Contents

Introduction 10

1 Metrics in Coding Theory 14
1.1 Decoders over Discrete Channels 15
1.2 Decoding Schemes 21
1.2.1 Syndrome Decoding 28
1.3 Matching Metrics and Channels 31

2 Metrics Induced by Partially Ordered Sets 34
2.1 Linear Codes 34
2.1.1 Code Equivalence 38
2.2 Partially Ordered Sets 40
2.3 Poset Metrics 44
2.3.1 Group of Linear Isometries for Poset Metrics 45
2.3.2 Packing Radius of Poset Codes 47

3 Canonical Form 52
3.1 Canonical Form for Hierarchical Metrics 53
3.2 Partitions and Decompositions 56
3.2.1 Hierarchical Bounds 66
3.2.2 Packing Radius Bounds 70
3.2.3 Construction of Maximal P-Decompositions 71
3.2.4 Complexity of Syndrome Decoding Algorithm 79

4 Expected Loss 84
4.1 Expected Loss 85
4.2 Bayes Encoders 90
4.3 A Particular Case 93
4.4 Expected Loss and Unequal Error Protection 98

Future Perspectives 104
Extended Poset Metrics 104
Better Hierarchical Bounds 105

Bibliography 107

Index 111

Introduction

Metrics are mathematical structures of interest in coding theory. Several works are devoted to the study of metrics in the context of coding theory, and the best known and most investigated metric in this context is the Hamming metric. It was suggested by R. W. Hamming in [25] when describing a geometric model of a code. Later, in ([44], 1957) and ([28], 1958), the Lee metric was defined, and it became an interesting alternative when non-binary alphabets are used. To the best of our knowledge, the first work considering metrics in a general approach is a short communication of S. W. Golomb [22] in 1969. That work described a family of additive metrics (metrics defined over an alphabet and extended additively to a set of words with a fixed length) which are still being investigated nowadays, as we can see in [41].

Recently, interest in different and larger families of metrics in coding theory has increased, as can be seen, for example, in [6], [2], [20] and [10]. This is partially motivated by the fact that metrics provide a decoding scheme using minimum distance, which in some cases (when metrics and channels are matched) is an alternative to Maximum a Posteriori (MAP) decoders. Also, minimum distance decoders may be used to add manageability to the decoding process, which can be achieved by the use of a syndrome decoding algorithm, the most general and efficient decoding algorithm presented in this dissertation.

One of those large families of metrics (with interest in coding theory) is the family of poset metrics, introduced by Brualdi et al. in ([6], 1991) as a generalization of metrics obtained by Niederreiter in [35] and [36]. A poset metric on an $n$-dimensional vector space is determined by the choice of a partial order on the set $\{1, 2, \dots, n\}$. Describing the classical parameters of coding theory for such metrics is, in general, a difficult problem. For example, in [12] it was proved that determining the packing radius of a one-dimensional code is an NP-hard problem. However, as we can see in [30], the family of hierarchical poset metrics, determined by the sub-family of hierarchical posets, is a natural generalization of the Hamming metric (which belongs to this sub-family), and classical coding invariants for those metrics are “easy” (as easy as in the Hamming case) to obtain by using the canonical-systematic form for linear codes, determined in [15].

Canonical forms are obtained by using the group of linear isometries and determine a standard and relatively clean representation of codes, see [15] and [1]. Besides the hierarchical case, the only known attempt to generalize it was made in [1], where a standard form for a particular case (Niederreiter-Rosenbloom-Tsfasman (NRT) posets, or orders consisting of multiple disjoint chains of the same order) is presented. In that work, one can see that the standard form is not unique (in any possible sense). As a matter of fact, as we will see later, uniqueness of such a decomposition is a characteristic of hierarchical posets. Here, considering general posets, we construct a standard form, decomposing a code as a direct sum of smaller codes, and this form is canonical with respect to the lengths and dimensions of the smaller codes. The choice of a particular representation for codes (the canonical form) is motivated by the fact that it can be used to easily determine some code parameters.
For the general case, there are no closed and general expressions for coding invariants, but using the canonical decomposition and comparing to hierarchical posets, we obtain bounds for two important invariants: the packing radius of a code and the complexity of syndrome decoding.

Assuming that minimum (poset) distance decoding is a relevant decoding criterion, despite the fact that we do not explore it, we are actually assuming some underlying situation (possibly given by a channel model). While the specific situation of interest in the coding-decoding process was not made explicit when studying the canonical form of poset metrics, the model of the channel and a model for evaluating the errors are the core of the last part of this work, where we propose some alternatives to unequal error protection.

In classical coding theory, decoders are constructed in order to minimize both the error and refusal probabilities. In many communication scenarios, it is more efficient to better protect only a crucial part of the information. In order to achieve this goal, the framework of unequal error protection (UEP) was proposed in [32]. To evaluate the performance of a coding system with unequal error protection, Masnick and Wolf introduced a measure called Average Error Cost (AEC). In this context, each information bit is assigned a value corresponding to the protection level of the bit (information bits with equal assigned value have the same importance). Then, considering these levels of protection, the value assigned to each bit determines a cost for an error in that specific bit. The values on each bit position define a way to measure the cost of an error: the sum of the errors in each bit, each of those weighted by the value of the bit position. The AEC is then defined as the average of this quantity, the average being taken over all transmitted and received codewords.

In [17], a broader framework for the AEC was presented, considering the expected loss function (ELF) of a coding-decoding scheme. Here, to each pair of codewords a value is assigned, representing the value of the error that occurs when one information is exchanged for the other. In this sense, the AEC may be seen as a particular case of this measure, occurring in the instance of an ELF that is invariant by translations.

As noted by Masnick and Wolf in [32], the main difference between classical coding theory and the theory with unequal error protection is that the encoder is a relevant part of systems with UEP, while in the classical theory, for a given code, the error probability depends only on the code, not on the encoder. In this context, we explore some possibilities for encoding and decoding separately. First of all, we propose a lexicographic encoder that in some simple situations is proved to be optimal, with some experimental evidence of very good performance in more general situations. We also explore a possible choice of a code and encoding process that performs a two-level UEP, showing also some experimental measurements of its performance. Some relevant conjectures concerning those proposals are left open.

This work is organized as follows: In Chapter 1 we describe the most common decoding schemes: maximum likelihood, maximum a posteriori probability and minimum distance decoders. We start with a very general definition of a model for a decoder as a stochastic map.
With the goal of minimizing both the error and refusal probabilities simultaneously, we prove that we can restrict our attention to deterministic decoders, which is one of the most common definitions of decoders found in the literature. In addition to pointing out the relevance of metrics in coding theory, this chapter also makes a connection between the concept of error probability and expected loss, which is introduced in the last chapter.

In Chapter 2, the basics of coding theory are introduced: linear codes, code equivalence, packing radius and the group of linear isometries. We also introduce the basic concepts, definitions and properties concerning posets and poset metrics.

The original contributions are concentrated in the last two chapters. Chapter 3 is devoted to the canonical form for a generator matrix of poset codes. The decomposition of a code according to the poset metric and its canonical form are defined and constructed. Using the maximal decomposition, we obtain bounds for the packing radius and the complexity of syndrome decoding.

Chapter 4 starts by exploring the framework of expected loss in the same generality used in the first chapter to introduce the error and refusal probabilities. We show that, considering the expected loss, it is possible to exchange a probabilistic decoder for a deterministic one and, on the encoding side, the problem of minimizing the ELF may be translated into a problem of minimizing the trace of some matrices. The ELF of an average encoder is determined as the ELF concerning an equally valued system of information (the most usual instance in coding theory), and comparison to this average encoder can be used as a performance measure of an encoding scheme. The last two sections are devoted to particular cases, one regarding encoders and the other regarding decoders. In these sections, some conjectures are presented and a practical example is described.

Chapter 1

Metrics in Coding Theory

This chapter is a brief introduction to decoders and the role of metrics in coding theory. As a complementary reading we suggest the recent survey of Gabidulin [20]; even though we introduce the concepts of information theory in a slightly different way, the survey can still be used as supplementary reading. The stochastic map notation will be used in order to define channels and decoders. The stochastic map approach simplifies the notation for channels and allows us to justify the usual definition of a decoder. Thus, before we discuss information and coding theory, we shall define the two basic mathematical concepts that underlie this work:

Definition 1. (Stochastic Map) Let $\mathcal{P}_\mathcal{Y}$ be the set of all probability distributions over a finite set $\mathcal{Y}$. A stochastic map $\mathcal{P} : \mathcal{X} \to \mathcal{P}_\mathcal{Y}$ is a map assigning to each element of a finite set $\mathcal{X}$ a probability distribution over $\mathcal{Y}$.

Given $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, the expression $y \sim \mathcal{P}(x)$ means that the element $y$ was sampled from the probability distribution $\mathcal{P}(x)$. We will write $\mathcal{P}(y|x) = \Pr(y = y' \mid y' \sim \mathcal{P}(x))$ to express the conditional probability for the occurrence of the event $y$ given that $x$ has occurred.
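As an illustration of Definition 1, the sketch below (a toy representation of our own; none of these names come from the thesis) stores a stochastic map over finite sets as a table of probability distributions and samples from it:

```python
# A minimal sketch of Definition 1 (names and data are ours): a stochastic
# map P: X -> P_Y stored as a table assigning to each x in X a probability
# distribution over Y.
import random

def make_stochastic_map(table):
    """table[x] is a dict {y: probability}; every row must sum to 1."""
    for x, dist in table.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, f"row {x} is not a distribution"
    return table

def prob(P, y, x):
    """P(y|x): the probability of the event y given that x has occurred."""
    return P[x].get(y, 0.0)

def sample(P, x):
    """Draw y ~ P(x), i.e., sample from the distribution P(x)."""
    ys, ps = zip(*P[x].items())
    return random.choices(ys, weights=ps)[0]

P = make_stochastic_map({0: {'a': 0.9, 'b': 0.1}, 1: {'a': 0.2, 'b': 0.8}})
print(prob(P, 'a', 0), sample(P, 1))  # 0.9 and a draw from P(1)
```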

Definition 2. (Metric)A metric 푑 over 풳 is a function 푑 : 풳 × 풳 → R satisfying the following conditions:

(a) 푑(푥, 푦) ≥ 0 and 푑(푥, 푦) = 0 if, and only if, 푥 = 푦;

(b) 푑(푥, 푦) = 푑(푦, 푥);

(c) 푑(푥, 푦) ≤ 푑(푥, 푧) + 푑(푧, 푦) for all 푥, 푦, 푧 ∈ 풳.

1.1 Decoders over Discrete Channels

The first geometric model of a code was suggested by R. W. Hamming in ([25], 1950). This model gave rise to the well-known Hamming metric, the most investigated metric in coding theory. Soon after Hamming's contribution, motivated by the cyclic structure of non-binary alphabets, the Lee model was introduced in ([44], 1957) and ([28], 1958). The Hamming metric is important for two reasons: it is matched to one of the most studied channels in information theory, the Binary Symmetric Channel (BSC); and it is simple enough to allow the design of “good” decoding algorithms for specific types of codes. New geometric models of codes have been studied in coding theory and, consequently, new families of metrics fitting these models have been proposed, see [40], [6], [2] and [10]. In order to establish the precise relation between metrics and coding theory, we first give a brief description of a communication system.

The Shannon model of a point-to-point communication system, as shown in Figure 1.1, breaks the process of communication down into a handful of components. It is a minimalist abstraction of reality; in the “real world”, most communication systems are much more complex. Each component of the model has its importance in the transmission process, which may vary according to the system application.

Figure 1.1: Communication system. A message $x = x_1 \cdots x_k$ is encoded into a codeword $c = c_1 \cdots c_n$; the channel adds an error $e = e_1 \cdots e_n$ coming from noise; the word $y = c + e$ is received and the decoder produces an estimate $\widehat{x}$ of the message.

Basically, this communication system works as follows. The Message Source generates messages of length $k$ in order to send them to the Receiver. This generator is modeled by a random variable, which in most cases is assumed to be uniformly distributed. The Encoder adds redundancy to each message following mathematical rules, increasing the block of data from length $k$ to length $n$. This redundancy provides structure in the ambient space that allows the Decoder to detect and correct some of the errors that eventually occur when a noisy channel is used. Each component described here will be formally defined, since they are necessary for the comprehension of the main objects of this work: the decoders.

Definition 1.1.1. Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets. A discrete channel is a stochastic map $W : \mathcal{X} \to \mathcal{P}_\mathcal{Y}$, where $\mathcal{X}$ is called the input alphabet and $\mathcal{Y}$ the output alphabet of the channel.

We would like to stress that for a given channel $W$, the expression $W(y|x)$ denotes the probability of receiving $y$ given that $x$ was sent. The transition probabilities (or conditional probabilities) of a channel are in general obtained from experimental data. If no information is known, the worst-case scenario, or an approximation to it, is assumed. Given an alphabet $\mathcal{X}$, denote by $\mathcal{X}^n$ the set of all words of length $n$ over this alphabet. A block channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ is a discrete channel $W_n : \mathcal{X}^n \to \mathcal{P}_{\mathcal{Y}^n}$. Note that block channels deal with words of a fixed length, while the channels of Definition 1.1.1 deal with alphabets. Given a block channel $W_n$, if there is a channel $W$ such that for every $y = (y_1, \dots, y_n) \in \mathcal{Y}^n$ and $x = (x_1, \dots, x_n) \in \mathcal{X}^n$, with $y_i \in \mathcal{Y}$ and $x_i \in \mathcal{X}$, we have that

$$W_n(y|x) = \prod_{i=1}^{n} W(y_i|x_i),$$

then the discrete block channel $W_n$ will be denoted by $W^n$ and called memoryless. We remark that $W^n$ is obtained by extending the channel $W$ to arrays; due to this, it is often called the $n$-th extension of $W$. Only discrete channels are considered in this work, therefore the word discrete will be omitted. For simplicity, if $x = (x_1, \dots, x_n) \in \mathcal{X}^n$, sometimes the parentheses or the commas of $x$ will be suppressed. Also, if no confusion may arise, the index $n$ will be omitted in the block channel notation.

Example 1.1.2. (Binary Symmetric Channel) A Binary Symmetric Channel $W : \mathbb{F}_2 \to \mathcal{P}_{\mathbb{F}_2}$ is a channel with input and output alphabets $\mathbb{F}_2$ (the finite field with 2 elements) and conditional probabilities $W(1|1) = 1 - p = W(0|0)$ and $W(1|0) = p = W(0|1)$, where $0 \leq p \leq 1/2$. This channel and its $n$-th extension are the most studied channels in information theory. They are called symmetric because $W(x|y) = W(y|x)$ for all $x, y \in \mathbb{F}_2$. It is usual to represent this channel by the diagram in Figure 1.2.

Figure 1.2: Binary Symmetric Channel ($0 \to 0$ and $1 \to 1$ with probability $1-p$; $0 \to 1$ and $1 \to 0$ with probability $p$).

Example 1.1.3. (Binary Erasure Channel) A Binary Erasure Channel $W : \mathbb{F}_2 \to \mathcal{P}_{\mathbb{F}_2 \cup \{?\}}$ is a channel with conditional probabilities $W(1|1) = 1 - p = W(0|0)$ and $W(?|0) = p = W(?|1)$, where $0 \leq p \leq 1/2$. The symbol $?$ means that the bit sent was erased. This channel is frequently used in information theory because it is one of the simplest channels to analyze: whenever an error occurs, it tells you about the existence and position of the error in the array. It is represented by the diagram in Figure 1.3.

Figure 1.3: Binary Erasure Channel ($0 \to 0$ and $1 \to 1$ with probability $1-p$; $0 \to ?$ and $1 \to ?$ with probability $p$).

Example 1.1.4. ($q$-ary Symmetric Channels) A memoryless symmetric channel with input and output alphabet $\mathcal{X}$, with $q = |\mathcal{X}|$ and crossover probability $p$, where $0 \leq p \leq 1/2$, is a channel defined by the following conditional probabilities: $W(y|y) = 1 - p$ and $W(y|x) = p/(q - 1)$ if $x \neq y$. A particular case of these channels is the binary symmetric channel defined in Example 1.1.2.
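The three examples above fit a common pattern; the sketch below (our names, assumed parameters) implements the $q$-ary symmetric channel and its memoryless $n$-th extension, whose transition probability is the product of the per-symbol probabilities:

```python
# A hedged sketch of Examples 1.1.2-1.1.4: the q-ary symmetric channel and
# its memoryless n-th extension W^n.
def qsc(y, x, q, p):
    """W(y|x) for the q-ary symmetric channel with crossover probability p."""
    return 1 - p if y == x else p / (q - 1)

def qsc_block(y, x, q, p):
    """W^n(y|x) = prod_i W(y_i|x_i) for words y and x of the same length."""
    w = 1.0
    for yi, xi in zip(y, x):
        w *= qsc(yi, xi, q, p)
    return w

# Binary symmetric channel (q = 2, p = 0.1): receiving 101 given 111 was sent.
print(qsc_block((1, 0, 1), (1, 1, 1), q=2, p=0.1))  # (1-p)^2 * p = 0.081
```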

Definition 1.1.5. Consider the set 풳 푛 of all words with length 푛 over the alphabet 풳 . A code 풞 is any subset of 풳 푛. The elements of the code 풞 are called codewords.

An $(n, M, q)$-code is a code $\mathcal{C} \subset \mathcal{X}^n$ where $|\mathcal{C}| = M$ and $|\mathcal{X}| = q$. From now on, for the sake of simplicity, we will assume that the input alphabet $\mathcal{X}$ is contained in the output alphabet $\mathcal{Y}$. This restriction ensures that $\mathcal{C} \subset \mathcal{X}^n \subset \mathcal{Y}^n$ for every code $\mathcal{C}$ over $\mathcal{X}^n$, and it facilitates the definition of the next structures (decoders and encoders). An information set is any set with cardinality $q^k$, where $q$ and $k$ are positive integers. We remark that the set of all messages can be identified with the set of all words of length $k$ over a finite alphabet $\mathcal{X}$ with $q$ elements, which is denoted by $\mathcal{X}^k$. Therefore, the notation $(n, M, q)$ will be replaced by $(n, k)_q$, since $M = q^k$.

Definition 1.1.6. Given an information set $\mathcal{X}^k$ and an integer $n \geq k$, an encoder is an injective map $f : \mathcal{X}^k \to \mathcal{X}^n$; its image is an $(n, k)_q$-code $\mathcal{C}$.

Definition 1.1.7. Let $\mathcal{C} \subset \mathcal{X}^n$ be an $(n, k)_q$-code. A decoder of $\mathcal{C}$ is a decision criterion for each $y \in \mathcal{Y}^n$, modeled by a stochastic map $D : \mathcal{Y}^n \to \mathcal{P}_{\mathcal{C}\cup\{\infty\}}$ such that $D(\infty|y) \in \{0, 1\}$ for every $y \in \mathcal{Y}^n$ and $D(c|c) = 1$ for every $c \in \mathcal{C}$. The set of all decoders of $\mathcal{C}$ will be denoted by $\mathcal{D}_\mathcal{Y}(\mathcal{C})$.

We remark that if $D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$, each $y \in \mathcal{Y}^n$ determines a (probabilistic) decision criterion given by the probability distribution $D(y)$. Given $x \in \mathcal{C} \cup \{\infty\}$ and $y \in \mathcal{Y}^n$, the value $D(x|y)$ denotes the probability that the decoder outputs $x$ given that the word $y$ was received. If a codeword $c \in \mathcal{C}$ is received, the assumption $D(c|c) = 1$ ensures that the decoder will assume the codeword $c$ was sent. The symbol $\infty$ denotes that an error has occurred and the decoder could not solve the decision problem, refusing the received word. This means that the information must be sent again (or forgotten). Since $D(\infty|y) \in \{0, 1\}$, the decision whether a received word $y \in \mathcal{Y}^n$ is refused or not is deterministic. Given a decoder $D$ and an encoder $f$ for the code $\mathcal{C}$, a full decoder is a stochastic map $D' : \mathcal{Y}^n \to \mathcal{P}_{\mathcal{X}^k\cup\{\infty\}}$ defined by $D'(x|y) = D(f(x)|y)$ and $D'(\infty|y) = D(\infty|y)$. Note that $f^{-1} : \mathcal{C} \to \mathcal{X}^k$ is a bijection, thus the stochastic map $D'$ is well defined. The difference between full decoders and decoders is that a decoder outputs an error $\infty$ or an element $c \in \mathcal{C}$, while a full decoder outputs an error or an information word $f^{-1}(c) \in \mathcal{X}^k$.

Definition 1.1.8. A $(\mathcal{C}, f, D)$ encoding-decoding scheme for the channel $W : \mathcal{X}^n \to \mathcal{P}_{\mathcal{Y}^n}$ consists of:

1 - an information set $\mathcal{X}^k$ with cardinality $q^k$, where $q = |\mathcal{X}|$ and $k \leq n$;

2 - an encoder $f : \mathcal{X}^k \to \mathcal{X}^n$, where $f(\mathcal{X}^k) = \mathcal{C}$ is an $(n, k)_q$-code;

3 - a decoder (stochastic map) $D : \mathcal{Y}^n \to \mathcal{P}_{\mathcal{C}\cup\{\infty\}}$.

The main purpose of coding theory is to ensure reliable transmission of information through a noisy channel, reliability being determined by an appropriate measure.

Among the most important such measures are the error probability and the refusal probability of a code, which express the expected amounts of errors and refusals.

Given a decoder $D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$ and a channel $W$, the refusal probability for a codeword $c \in \mathcal{C}$ is the probability that the decoder refuses a vector given that $c$ was sent, i.e.,

$$P_{ref}^{D}(c) = \sum_{y \in \mathcal{Y}^n} W(y|c)\, D(\infty|y).$$

We remark that the probabilities $W(y|c)$ and $D(\infty|y)$ are determined by the channel and the decoder, respectively. The refusal probability of the code $\mathcal{C}$ is the mean

$$P_{ref}^{D}(\mathcal{C}) = \sum_{c \in \mathcal{C}} P_{ref}^{D}(c)\,P(c),$$

where $P(c)$ is the probability that the codeword $c \in \mathcal{C}$ is sent through the channel. The error probability can be defined similarly to the refusal probability. The error probability of a codeword $c \in \mathcal{C}$ is defined by

$$P_e^{D}(c) := \sum_{y \in \mathcal{Y}^n} W(y|c)\,\big(1 - D(\infty|y) - D(c|y)\big),$$

where $1 - D(\infty|y) - D(c|y)$ is the probability that the decoder outputs a codeword different from the sent one $c$. The error probability of the code is defined by

$$P_e^{D}(\mathcal{C}) := \sum_{c \in \mathcal{C}} P_e^{D}(c)\,P(c). \quad (1.1)$$

It is clear that the error and refusal probabilities do not depend on the encoder. This is no longer true when considering error value functions and unequal error protection, as we can see in [16]. Assuming those probabilities as measures of reliability, we define an optimal decoder:

Definition 1.1.9. For a given code $\mathcal{C}$, an optimal decoder $D^*$ is a decoder satisfying

$$P_{ref}^{D^*}(\mathcal{C}) + P_e^{D^*}(\mathcal{C}) = \min_{D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})} \left( P_{ref}^{D}(\mathcal{C}) + P_e^{D}(\mathcal{C}) \right).$$

Proposition 1.1.10. Given a decoder $D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$, there exists a decoder $\widetilde{D} \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$ such that

$$P_e^{\widetilde{D}}(\mathcal{C}) \leq P_{ref}^{D}(\mathcal{C}) + P_e^{D}(\mathcal{C})$$

and $P_{ref}^{\widetilde{D}}(\mathcal{C}) = 0$.

Proof. Given a decoder $D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$, suppose there exists $y_0 \in \mathcal{Y}^n$ such that $D(\infty|y_0) = 1$. Define a new decoder $D^* \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$ satisfying $D^*(y) = D(y)$ for all $y \neq y_0$, but with $D^*(y_0)$ a new distribution which does not refuse $y_0$, so $D^*(\infty|y_0) = 0$. Our goal here is to show that

$$P_{ref}^{D^*}(\mathcal{C}) + P_e^{D^*}(\mathcal{C}) \leq P_{ref}^{D}(\mathcal{C}) + P_e^{D}(\mathcal{C}). \quad (1.2)$$

Initially, note that since $D^*(\infty|y_0) = 0$ and $D^*(\infty|y) = D(\infty|y)$ for all $y \neq y_0$,

$$P_{ref}^{D^*}(\mathcal{C}) = \sum_{c \in \mathcal{C}} \sum_{y \in \mathcal{Y}^n \setminus \{y_0\}} W(y|c)\, D(\infty|y)\, P(c).$$

Therefore,

$$P_{ref}^{D}(\mathcal{C}) = P_{ref}^{D^*}(\mathcal{C}) + \sum_{c \in \mathcal{C}} W(y_0|c)\, P(c) \quad (1.3)$$

because $D(\infty|y_0) = 1$. On the other hand, since $D(c|y_0) = 0$ for all $c \in \mathcal{C}$, by the definition of $D^*$,

$$P_e^{D}(\mathcal{C}) = \sum_{c \in \mathcal{C}} \sum_{y \in \mathcal{Y}^n \setminus \{y_0\}} W(y|c)\,\big(1 - D^*(\infty|y) - D^*(c|y)\big)\, P(c),$$

thus

$$P_e^{D^*}(\mathcal{C}) = P_e^{D}(\mathcal{C}) + \sum_{c \in \mathcal{C}} W(y_0|c)\,\big(1 - D^*(c|y_0)\big)\, P(c). \quad (1.4)$$

It is straightforward that

$$\sum_{c \in \mathcal{C}} W(y_0|c)\,\big(1 - D^*(c|y_0)\big)\, P(c) \leq \sum_{c \in \mathcal{C}} W(y_0|c)\, P(c),$$

therefore,

$$P_{ref}^{D^*}(\mathcal{C}) + P_e^{D}(\mathcal{C}) + \sum_{c \in \mathcal{C}} W(y_0|c)\,\big(1 - D^*(c|y_0)\big)\, P(c) \leq P_{ref}^{D^*}(\mathcal{C}) + P_e^{D}(\mathcal{C}) + \sum_{c \in \mathcal{C}} W(y_0|c)\, P(c).$$

Together with Identities (1.3) and (1.4), we obtain Inequality (1.2). The decoder $\widetilde{D}$ is constructed by following this procedure until we run out of refused elements.

Corollary 1.1.11. Given a code $\mathcal{C}$, there exists an optimal decoder $D^* \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$ satisfying $P_{ref}^{D^*}(\mathcal{C}) = 0$.

Since we are not concerned with specific applications, and at this level of generality we are assuming only the error and refusal measures, Corollary 1.1.11 ensures that we lose no generality by assuming the refusal probability to be zero. Therefore, we will restrict ourselves to the goal of minimizing the error probability. Thus, from now on, decoders will be modeled as stochastic maps $D : \mathcal{Y}^n \to \mathcal{P}_\mathcal{C}$ (instead of $D : \mathcal{Y}^n \to \mathcal{P}_{\mathcal{C}\cup\{\infty\}}$).

1.2 Decoding Schemes

As seen in the previous section, we may consider only complete decoders: decoders with no refusal option. In this situation, optimal decoders are the ones minimizing the error probability. We can basically divide the problem of searching for good decoders according to two criteria: the usefulness and the manageability of the decoder. Due to the generality of this work, we will not deal with the manageability criterion directly; the manageability of metric decoders will be justified later by the possibility of using syndrome decoding, a general procedure for decoding linear codes. Therefore, from our point of view, good decoders are not necessarily practical, since they can be hard to deal with. We will now define a series of abstract decoders and add some mathematical structures on $\mathcal{X}^n$, structures that are, in most cases, sufficient conditions to add manageability to the decoder.

Given a channel $W$, by Bayes' rule, $W(y|c)P(c) = W(c|y)P(y)$, where $P(c)$ denotes the probability that the codeword $c \in \mathcal{C}$ is sent and $P(y)$ is the probability that $y \in \mathcal{Y}^n$ is received. We remark that we are assuming complete decoders, so if $D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$, the error probability can alternatively be written in the following way:

$$P_e^{D}(\mathcal{C}) = \sum_{c\in\mathcal{C}} \sum_{y\in\mathcal{Y}^n} W(y|c)\,(1 - D(c|y))\,P(c) = \sum_{c\in\mathcal{C}} \sum_{y\in\mathcal{Y}^n} W(c|y)\,(1 - D(c|y))\,P(y) = 1 - \sum_{c\in\mathcal{C}} \sum_{y\in\mathcal{Y}^n} W(c|y)\,P(y)\,D(c|y). \quad (1.5)$$

Definition 1.2.1. Given a channel $W$, a Maximum a Posteriori Probability (MAP) decoder is a decoder $D : \mathcal{Y}^n \to \mathcal{P}_\mathcal{C}$ such that $D(y)$ is a conditional distribution over $\mathcal{C}$ satisfying

$$W(c' \sim D(y) \mid y) = \max\{W(c|y) : c \in \mathcal{C}\}$$

for every $y \in \mathcal{Y}^n$.

As one should expect, decoders that minimize error probability are those that maximize the conditional probabilities 푊 (푐|푦), as explained below.

Lemma 1.2.2. Let $a_1 \geq \cdots \geq a_n$ be a sequence of non-negative real numbers such that $\sum_i a_i = 1$, and suppose $a_1 = \ldots = a_k > a_{k+1}$ for some $1 \leq k \leq n-1$. Consider the maximization problem

$$\sum_{i=1}^{n} a_i b_i' = \max_{\{b_1, \dots, b_n\}} \sum_{i=1}^{n} a_i b_i,$$

where the maximum is taken over the sets $\{b_1, \dots, b_n\}$ of non-negative real numbers satisfying $\sum_i b_i = 1$. Then every sequence $b_1', \dots, b_n'$ of non-negative real numbers satisfying $\sum_{i=1}^{k} b_i' = 1$ (hence $b_i' = 0$ for $i > k$) is a solution of this problem.

Proof. Note that

$$\sum_{i=1}^{n} a_i b_i = a_1\left(b_1 + \frac{a_2}{a_1} b_2 + \cdots + \frac{a_n}{a_1} b_n\right).$$

Since $a_1 \geq a_i$ for every $i$, it follows that $b_1 + \frac{a_2}{a_1}b_2 + \cdots + \frac{a_n}{a_1}b_n \leq 1$, therefore,

$$a_1\left(b_1 + \frac{a_2}{a_1} b_2 + \cdots + \frac{a_n}{a_1} b_n\right) \leq a_1.$$

Then $b_1 = 1$ and $b_i = 0$ for all $i > 1$ is a solution of the maximization problem. Because $a_1 = \ldots = a_k$, every sequence of non-negative real numbers $b_1, \dots, b_k$ satisfying $\sum_{i=1}^{k} b_i = 1$ is also a solution.

Theorem 1.2.3. If 퐷 is a MAP decoder, then 퐷 is optimal.

Proof. By Equation (1.5), for any code $\mathcal{C}$,

$$P_e^{D}(\mathcal{C}) = 1 - \sum_{y\in\mathcal{Y}^n}\left(\sum_{c\in\mathcal{C}} W(c|y)\,D(c|y)\right) P(y).$$

Note that minimizing the error probability is equivalent to maximizing the expression

$$\sum_{c\in\mathcal{C}} W(c|y)\,D(c|y)$$

for each choice of $y \in \mathcal{Y}^n$. Note that if $c \sim D(y)$ and $c' \sim D(y)$, by the definition of a MAP decoder, $W(c|y) = W(c'|y)$. Also, if $c$ cannot be sampled from $D(y)$, it means that $D(c|y) = 0$. Thus, the conditional probabilities $D(c|y)$ satisfy the conditions of Lemma 1.2.2 and are solutions of that maximization problem. Therefore, MAP decoders minimize the error probability.

A decoder $D \in \mathcal{D}_\mathcal{Y}(\mathcal{C})$ is said to be deterministic¹ if for every $y \in \mathcal{Y}^n$ there is an element $c \in \mathcal{C}$ such that $D(c|y) = 1$. Therefore, a deterministic decoder is a surjective map $D : \mathcal{Y}^n \to \mathcal{C}$. These decoders will be denoted by $g$. The next proposition ensures that we lose no generality if we consider only deterministic decoders. The proof follows directly from the definition of a MAP decoder and from Lemma 1.2.2.

¹The word “deterministic” is used since, for those decoders, a given received array always produces the same output.

Proposition 1.2.4. Given a channel 푊 , there is a deterministic decoder 푔 : 풴푛 → 풞 which is a MAP decoder.

Due to Proposition 1.2.4, from now on we will consider only deterministic decoders. We can reformulate the definition of MAP decoder as follows:

Definition 1.2.5. (MAP Decoders - Revisited)A Maximum a Posteriori Probability (MAP) decoder is a decoder 푔 : 풴푛 → 풞 satisfying the condition

푊 (푔(푦)|푦) = max{푊 (푐|푦): 푐 ∈ 풞}.

Consequently, the error probability of a code 풞 can be rewritten as

$$P_e^{g}(\mathcal{C}) = \sum_{c\in\mathcal{C}} \sum_{y \notin g^{-1}(c)} W(y|c)\,P(c).$$

Instead of defining a decoder according to the a posteriori probabilities $W(c|y)$, we can define it by using the a priori probabilities $W(y|c)$: the probability of receiving $y$ given that $c$ was sent.

Definition 1.2.6. A Maximum Likelihood (ML) decoder is a decoder $g : \mathcal{Y}^n \to \mathcal{C}$ satisfying the condition

$$W(y|g(y)) = \max\{W(y|c) : c \in \mathcal{C}\}.$$

The proof of the next well-known proposition follows directly from Bayes' rule.

Proposition 1.2.7. If the distribution of the code $P(c)$ (the probability of sending $c$) is uniform, then a decoder is an ML decoder if and only if it is a MAP decoder.
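To make the previous definitions concrete, the sketch below (helper names are ours, with an assumed binary symmetric channel) builds a deterministic decoder by brute force; with $P(c)$ uniform, maximizing the likelihood $W(y|c)$ gives, by Proposition 1.2.7, a decoder that is simultaneously ML and MAP:

```python
# A minimal illustrative sketch (assumed parameters: p = 0.1, a toy code).
from itertools import product

def bsc_block(y, c, p=0.1):
    """W^n(y|c) for the binary symmetric channel of Example 1.1.2."""
    flips = sum(yi != ci for yi, ci in zip(y, c))
    return (1 - p) ** (len(y) - flips) * p ** flips

def ml_decoder(code, W, alphabet, n):
    """Deterministic g: for each y, a codeword maximizing W(y|c) (ties arbitrary)."""
    return {y: max(code, key=lambda c: W(y, c)) for y in product(alphabet, repeat=n)}

g = ml_decoder([(0, 0, 0), (1, 1, 1)], bsc_block, (0, 1), 3)
print(g[(1, 0, 1)])  # -> (1, 1, 1), the majority-vote codeword
```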

From now on we will assume that $P(c)$ is uniformly distributed. It is clear that when dealing with decoders, we are playing with some sort of measure. One of the mathematical measures that can be used in order to decrease the computational complexity of decoding algorithms is a metric. In order to use metrics, we will assume that the input and output alphabets of a channel are the same, i.e., $\mathcal{X} = \mathcal{Y}$.

Definition 1.2.8. Given a metric 푑 over 풳 푛, a Minimum Distance (MD) decoder is a decoder 푔 : 풳 푛 → 풞 satisfying the condition

푑(푦, 푔(푦)) = min{푑(푦, 푐): 푐 ∈ 풞}.

If 푔 is an MD decoder according to 푑, we will say that 푔 is a 푑-MD decoder.

The Hamming metric was constructed in [25] when exploring the geometric representation (Hamming cube) of a code and it is the most studied metric in coding theory.

Example 1.2.9. (Hamming Distance) The function $d_H : \mathcal{X}^n \times \mathcal{X}^n \to \mathbb{R}_+$ defined by

$$d_H(x, y) = |\{i : x_i \neq y_i\}|,$$

where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$, is called the Hamming distance.
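The sketch below (helper names are ours) transcribes this example directly: the Hamming distance and a brute-force MD decoder in the sense of Definition 1.2.8:

```python
# Hamming distance and a minimum distance (MD) decoder, for illustration.
def d_H(x, y):
    """d_H(x, y) = |{i : x_i != y_i}|."""
    return sum(xi != yi for xi, yi in zip(x, y))

def md_decoder(y, code, d=d_H):
    """Return a codeword closest to y in the metric d (ties broken arbitrarily)."""
    return min(code, key=lambda c: d(y, c))

code = [(0, 0, 0, 0), (1, 1, 1, 1)]
print(md_decoder((1, 1, 0, 1), code))  # -> (1, 1, 1, 1)
```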

Proposition 1.2.10. For any memoryless symmetric channel with crossover probability 푝 ≤ 1/2, the MD decoder determined by the Hamming metric is also an ML decoder.

Proof. Let $\mathcal{X}$ be the alphabet of the channel. Given $y = (y_1, \dots, y_n) \in \mathcal{X}^n$ and $c = (c_1, \dots, c_n) \in \mathcal{C}$, then

$$W(y|c) = W(y_1|c_1) \cdots W(y_n|c_n) = (1-p)^{|\{i : y_i = c_i\}|} \left(\frac{p}{q-1}\right)^{|\{i : y_i \neq c_i\}|} = (1-p)^{n - d_H(y,c)} \left(\frac{p}{q-1}\right)^{d_H(y,c)} = (1-p)^n \left(\frac{p}{(1-p)(q-1)}\right)^{d_H(y,c)}.$$

Since $p/[(1-p)(q-1)] < 1$,

$$W(y|c) \geq W(y|c') \quad \text{if, and only if,} \quad d_H(y,c) \leq d_H(y,c')$$

for every $c, c' \in \mathcal{C}$.

We remark that we are considering 푃 (푐) to be uniformly distributed. Thus, from Propositions 1.2.7 and 1.2.10 we have the following result.

Theorem 1.2.11. In a 푞-ary symmetric channel with crossover probability 푝 ≤ 1/2, MAP decoders, ML decoders and MD decoders according to the Hamming metric are optimal decoders.

We now add some mathematical structure to $\mathcal{X}$ and $\mathcal{X}^n$ in order to develop tools that can be helpful in handling the problems that may arise when dealing with large finite sets. The most common such structures are those of finite fields and vector spaces. We consider the alphabet $\mathcal{X}$ to be a finite field $\mathbb{F}_q$, where $q = p^r$ ($p$ prime), so that $\mathcal{X}^n$ is an $n$-dimensional vector space, namely $\mathbb{F}_q^n$.

Definition 1.2.12. A metric $d : \mathbb{F}_q^n \times \mathbb{F}_q^n \to \mathbb{R}$ is said to be invariant by translations if

$$d(x + z, y + z) = d(x, y)$$

for every $x, y, z \in \mathbb{F}_q^n$.

Metrics can be defined by using norm and weight functions.

Definition 1.2.13. A function $w : \mathbb{F}_q^n \to \mathbb{R}$ is a weight if it satisfies the following axioms:

• $w(x) \geq 0$, for every $x$;

• $w(x) = 0$ if, and only if, $x = 0$;

• $w(x) = w(-x)$, for every $x$.

It is clear that if we define the function $d$ by $d(x, y) = w(x - y)$, then $d$ is a semimetric (it has all the properties of a metric except the triangle inequality). Moreover, given a semimetric $d$ that is invariant by translations, the function $w(x) = d(x, 0)$ is a weight. The family of weight functions that we are interested in is the one preserving the support. A particular but important family (the poset metrics) may be obtained by generalizing the Hamming metric. This family will be introduced in the next chapter and is the main object of this work. The support of an element $x \in \mathbb{F}_q^n$ is the set

푠푢푝푝(푥) = {푖 : 푥푖 ≠ 0}.

Definition 1.2.14. A weight function $w : \mathbb{F}_q^n \to \mathbb{R}$ is said to preserve the support if

$$supp(x) \subset supp(y) \Rightarrow w(x) \leq w(y).$$

Example 1.2.15. (Combinatorial Metrics, [21]) Let $[n] := \{1, \dots, n\}$. Consider $T = \{T_0 = \emptyset, T_1, \dots, T_s\}$ to be a family of subsets covering $[n]$, i.e., $\cup_{i=0}^{s} T_i = [n]$. The $T$-weight is defined by

• $w_T(x) = 0 \iff supp(x) = \emptyset$;

• $w_T(x) = 1 \iff supp(x) \subset T_i$ for some $i$;

• $w_T(x) = k \iff supp(x)$ is contained in a union of exactly $k$ subsets from $T$, but is not contained in a union of $k - 1$ or fewer subsets from $T$.

The combinatorial metric $d_T$ is obtained by taking

$$d_T(x, y) = w_T(x - y).$$

It is well known [20] that $d_T$ is a metric and that it preserves support. Several particular cases of these metrics are well studied in the context of coding theory; see [20] for more details.
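As a sketch of how the $T$-weight can be evaluated (by exhaustive search over unions, so exponential in $s$ and for illustration only; the names and the covering family are ours):

```python
# Brute-force T-weight of Example 1.2.15: w_T(x) is the least number of
# members of T whose union covers supp(x).
from itertools import combinations

def w_T(x, T):
    """T is a list of sets covering {0, ..., n-1} (0-indexed); x is a tuple."""
    supp = {i for i, xi in enumerate(x) if xi != 0}
    if not supp:
        return 0
    for k in range(1, len(T) + 1):
        if any(supp <= set().union(*sub) for sub in combinations(T, k)):
            return k
    raise ValueError("T does not cover supp(x)")

# Assumed covering family of [4] (0-indexed here).
T = [{0, 1}, {1, 2}, {3}]
print(w_T((1, 0, 1, 0), T))  # supp = {0, 2}: covered by {0,1} and {1,2} -> 2
```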

Since $d_T$ is also a semimetric, it follows that $w_T$ is a weight. Moreover, it is a norm in the following sense:

Definition 1.2.16. A function $N : \mathbb{F}_q^n \to \mathbb{R}$ is a norm if it is a weight and satisfies

$$N(x + y) \leq N(x) + N(y) \quad \text{for all } x, y \in \mathbb{F}_q^n.$$

Concerning metrics and semimetrics invariant by translations, we have the following:

(1) If $d$ is a metric invariant by translations, then the function $N_d(x) = d(x, 0)$ is a norm. Moreover, if $N$ is a norm, the function $d_N$ defined by $d_N(x, y) = N(x - y)$ is a metric invariant by translations.

(2) If $d$ is a semimetric invariant by translations, then the function $w_d(x) = d(x, 0)$ is a weight. Moreover, if $w$ is a weight, the function $d_w$ defined by $d_w(x, y) = w(x - y)$ is a semimetric invariant by translations.

Example 1.2.17. Not every weight preserving support is a norm. Indeed, define the weight $w$ over $\mathbb{F}_2^2$ by setting $w(00) = 0$, $w(01) = w(10) = 1$ and $w(11) = 3$. This weight preserves the support but does not satisfy the triangle inequality required of norms, because $3 = w(11) > w(10) + w(01) = 2$. Then the function $d(x, y) := w(x - y)$ is a semimetric (it satisfies all the properties of metrics except the triangle inequality).

The example below provides a norm which does not preserve support. These extensions of the Lee metric are induced by the $\ell_p$ metric on $\mathbb{Z}^n$, as we can see in [7].

Example 1.2.18. The Lee weight over $\mathbb{Z}_l$ is defined by

$$w_L(x) = \min\{x \ (\mathrm{mod}\ l),\ -x \ (\mathrm{mod}\ l)\}.$$

The $p$-extension of this norm is the $p$-Lee norm over $\mathbb{Z}_l^n$: if $x = (x_1, \dots, x_n) \in \mathbb{Z}_l^n$, then

$$w_L^p(x) = \left(\sum_{i=1}^{n} w_L(x_i)^p\right)^{1/p}.$$

Take $l > 3$. If $supp(x) = \{1\} = supp(y)$ with $x_1 = 1$ and $y_1 = 2$, then $w_L^p(x) = 1$ and $w_L^p(y) = 2$. If the $p$-Lee weight preserved support, we would have $w_L^p(x) = w_L^p(y)$, since $supp(x) = supp(y)$.
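The sketch below implements the Lee and $p$-Lee weights with assumed parameters $l = 5$ and $p = 2$, reproducing the failure of support preservation noted above:

```python
# Lee and p-Lee weights of Example 1.2.18, with assumed l and p.
def lee_weight(x, l):
    """w_L(x) = min(x mod l, -x mod l)."""
    return min(x % l, (-x) % l)

def p_lee_weight(x, l, p):
    """w_L^p(x) = (sum_i w_L(x_i)^p)^(1/p) over Z_l^n."""
    return sum(lee_weight(xi, l) ** p for xi in x) ** (1 / p)

l, p = 5, 2
x, y = (1, 0, 0), (2, 0, 0)  # supp(x) = supp(y) = {1}
print(p_lee_weight(x, l, p), p_lee_weight(y, l, p))  # 1.0 and 2.0
```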

It is clear that if we define the function $d$ by $d(x, y) = N(x - y)$, then $d$ is a metric. Norm and weight functions are also related through their induced metric and semimetric: if the semimetric satisfies the triangle inequality, then the weight function is a norm. We will see that, regarding the matching of metrics and channels, the triangle inequality can easily be obtained by using semimetrics. The next two propositions are well-known results that we will not prove here.

Proposition 1.2.19. If a metric 푑 is invariant by translations, then 푑 is induced by a norm.

There is another family of weight functions of interest in coding theory, the invariant weights, which are weight functions satisfying $w(tx) = w(x)$ for every $t \in \mathbb{F}_q \setminus \{0\}$ and $x \in \mathbb{F}_q^n$. In [23], a characterization of the invariant weights satisfying the MacWilliams extension property was given. The invariant weights and the weights preserving support are related by the next proposition, whose proof follows directly from the fact that $supp(tx) = supp(x)$ for all $t \neq 0$.

Proposition 1.2.20. A weight preserving support is an invariant weight.

Even though the metrics we consider are induced by norms, we will use the weight notation, since it is commonly used in coding theory. In the next section we present the syndrome decoding algorithm, an algorithm that works only with metrics (or semimetrics) that are invariant by translations.

1.2.1 Syndrome Decoding

We are interested in MD decoding according to a metric (or semimetric). Assuming the metric is invariant by translations, we can use the well-known syndrome decoding algorithm as an alternative to MD decoding. This is the most powerful and general decoder presented in this thesis, justifying the use of metrics in coding theory. Our main goal is to prove that, in the invariant-by-translations case, syndrome decoding is a minimum distance decoder.

Let $\mathcal{C}$ be a linear code over $\mathbb{F}_q$ ($\mathcal{C}$ is a subspace of $\mathbb{F}_q^n$). Then $\mathcal{C}$ is the kernel of some linear transformation, therefore there is an $(n-k) \times n$ matrix $H$, the parity check matrix, satisfying

$$\mathcal{C} = \{x \in \mathbb{F}_q^n : Hx^T = 0\}.$$

The vector $Hx^T \in \mathbb{F}_q^{n-k}$ is called the syndrome of $x$ and will be denoted by $Syn(x)$. Two elements belong to the same coset of $\mathcal{C}$ if, and only if, they have the same syndrome, i.e.,

$$x + \mathcal{C} = y + \mathcal{C} \iff Syn(x) = Syn(y).$$

Syndrome Decoding Algorithm.

Precomputation: for each coset $x + \mathcal{C}$, choose a coset leader $\bar{x}$ such that $w(\bar{x}) = \min\{w(z) : z \in x + \mathcal{C}\}$.

Input: $y \in \mathbb{F}_q^n$.

1. Find the coset leader $\bar{x}$ such that $Syn(\bar{x}) = Syn(y)$.

Output: $y - \bar{x}$.
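A runnable sketch of the algorithm over $\mathbb{F}_2$ with the Hamming weight (one admissible invariant-by-translations choice); the helper names and the toy code are ours:

```python
# Syndrome decoding over F_2: precompute one minimum-weight leader per coset,
# then decode by subtracting the leader of the received word's coset.
from itertools import product

def syndrome(H, x):
    """Syn(x) = H x^T over F_2, as a tuple."""
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

def coset_leaders(H, n, weight=sum):
    """Precomputation: for each syndrome, a minimum-weight coset representative."""
    leaders = {}
    for x in sorted(product((0, 1), repeat=n), key=weight):
        leaders.setdefault(syndrome(H, x), x)
    return leaders

def syndrome_decode(y, H, leaders):
    e = leaders[syndrome(H, y)]  # coset leader with Syn(e) = Syn(y)
    return tuple((yi - ei) % 2 for yi, ei in zip(y, e))  # output y - e

# Parity check matrix of the repetition code {000, 111}.
H = [(1, 1, 0), (0, 1, 1)]
leaders = coset_leaders(H, 3)
print(syndrome_decode((1, 0, 1), H, leaders))  # a codeword at Hamming distance 1
```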

Given that $y$ was received, all the possible errors have the same syndrome as $y$, hence the error to be found is an element of a particular coset: the one having the syndrome of $y$. In order to conclude that syndrome decoding is a minimum distance decoding, we need to prove that a coset leader of minimal weight is the appropriate choice. The next example shows that the hypothesis that $d$ is invariant by translations is essential.

Example 1.2.21. Let $d : \mathbb{F}_2^2 \times \mathbb{F}_2^2 \to \mathbb{R}_+$ be the metric defined by the distance table below:

d(x,y)  00  01  10  11
00       0   2   2   3
01       2   0   2   4
10       2   2   0   1
11       3   4   1   0

It is clear that $d$ is indeed a metric over $\mathbb{F}_2^2$; furthermore, $d$ is not invariant by translations, since $d(01, 10) = 2$ while $d(01 + 01, 10 + 01) = d(00, 11) = 3$.

Let $\mathcal{C} = \{00, 01\}$ be a 1-dimensional linear code and suppose we want to decode $y = 11$. If the decoder used is a $d$-MD decoder, then the closest codeword to $y$ is $00$, since $3 = d(00, 11) < d(01, 11) = 4$. Suppose now that we use syndrome decoding according to $d$. The parity check matrix of $\mathcal{C}$ is given by $H = [1\ 0]$. Note that $Syn(10) = Syn(11) = 1$ and that $Syn(00) = Syn(01) = 0$. Since $Syn(y) = 1$, the possible errors are $10$ and $11$. Note that $2 = d(10, 00) < d(11, 00) = 3$. Due to this, syndrome decoding assumes $10$ to be the error, so it outputs $y - 10 = 01$; but, as we saw, the MD decoder outputs $00$, therefore syndrome decoding is not an MD decoder here.

Theorem 1.2.22. If 푑 is invariant by translations, then syndrome decoding is an MD decoder.

Proof. Suppose $y$ is the vector to be decoded and that $c'$ was obtained by decoding $y$ using the syndrome algorithm; then $y = e + c'$ with $Syn(e) = Syn(y)$. This representation is not unique, since $y = e_2 + (c' - c_1)$ for every $c_1 \in \mathcal{C}$, where $e_2 = e + c_1$. Assume that $e$ is a coset leader. Because $y - c' = e$ and $y - (c' - c_1) = e_2$,

$$d(y, c') = d(y - c', 0) = d(e, 0) \leq d(e_2, 0) = d(y - (c' - c_1), 0) = d(y, c' - c_1),$$

therefore,

$$d(y, c') = \min_{c \in \mathcal{C}} d(y, c).$$

Since the syndrome decoding algorithm allows the coset leaders to be chosen in advance as a precomputation, during the decoding process we only need to find the right coset, so the search space is reduced from $q^k$ elements (the number of codewords) to $q^{n-k}$ elements (the number of cosets). We remark that good codes are expected to have high rates, so $n - k$ tends to be smaller than $k$. Therefore, syndrome decoding provides, in many cases, an improvement in the performance of an MD decoder. It is not true in general that every MD decoder is obtained by the syndrome decoding algorithm, but any metric (or semimetric) determines some MD decoder that admits a syndrome algorithm.

Example 1.2.23. Let $\mathcal{C}$ be the unidimensional code in $\mathbb{F}_2^2$ generated by $11$, so $\mathcal{C} = \{00, 11\}$. The parity check matrix of $\mathcal{C}$ is $H = [1\ 1]$. The vectors $00$ and $11$ have syndrome equal to $0$ and the others have syndrome equal to $1$. Let $g : \mathbb{F}_2^2 \to \mathcal{C}$ be a $d_H$-MD decoder defined by $g(00) = 00$, $g(11) = 11$, $g(10) = 11$ and $g(01) = 11$. There are only two syndrome decoders: one is given by electing $10$ as a coset leader and the other by electing $01$. These decoders are as follows:

$$g_1'(00) = 00, \quad g_1'(11) = 11, \quad g_1'(10) = 00, \quad g_1'(01) = 11,$$

and

$$g_2'(00) = 00, \quad g_2'(11) = 11, \quad g_2'(10) = 11, \quad g_2'(01) = 00.$$

Neither $g_1'$ nor $g_2'$ coincides with $g$.

1.3 Matching Metrics and Channels

There are many ways to define a matching between channels and metrics, as we can see in [20], but their purpose is always the same: to characterize metrics whose MD decoders are as close as possible to (optimal) MAP decoders. As seen in Proposition 1.2.10, an MD decoder determined by the Hamming metric $d_H$ and an ML decoder determined by a $q$-ary symmetric channel $W$ are matched in the sense that

푑퐻 (푥, 푦) ≤ 푑퐻 (푥, 푧) ⇐⇒ 푊 (푦|푥) ≥ 푊 (푧|푥).

Unfortunately, cases like this do not always occur, and even when it is possible to match channels and metrics, the metric constructed may be so complex that it is useless for practical purposes. One of the first works relating metrics and channels is actually a course note by Massey [33] apud [28]. Later, Séguin [42] obtained necessary and sufficient conditions for a discrete memoryless channel to admit an additive metric (a metric determined over the alphabet and extended additively to vectors) matched to it. The relations between metrics and channels (including the matching problem) were set aside for many years, until renewed interest arose due to new applications, as we can see in [43], [40] and [20]. Since the 1990s, many different families of metrics have been studied in the context of coding theory, despite the fact that their role, in connection to a channel (or a general communication scheme), is not properly understood. A particular family of such metrics will be explored later. The definition of matching we adopt is the one given by Séguin in [42].

Definition 1.3.1. Given a discrete channel $W : \mathbb{F}_q^n \to \mathcal{P}_{\mathbb{F}_q^n}$ and a metric $d : \mathbb{F}_q^n \times \mathbb{F}_q^n \to \mathbb{R}$, we say that $W$ and $d$ are matched if, for every code $\mathcal{C} \subset \mathbb{F}_q^n$ and every vector $x \in \mathbb{F}_q^n$,

$$\arg\max_{y \in \mathcal{C}} W(x|y) = \arg\min_{y \in \mathcal{C}} d(x, y),$$

or equivalently,

$$d(x, y) < d(x, z) \quad \text{if and only if} \quad W(x|y) > W(x|z)$$

for every $x, y, z \in \mathbb{F}_q^n$.

To say that $W$ and $d$ are matched means that there is a decoder that is simultaneously an MD and an ML decoder, i.e., the probabilistic and metric criteria coincide. In [19], Firer and Walker proved that the Z channel², a particular case of an asymmetric channel, also has a metric matched to it; however, this is a completely theoretical metric and a priori does not provide a better (less complex) model of decoding. Firer and Walker conjectured the possibility of finding metrics matched to every asymmetric channel, but for this general case the matching problem remains unsolved. The converse is always true, as we can see in the next proposition.

Proposition 1.3.2. Given a metric $d$, there is a discrete channel matched to $d$.

Proof. Given a metric $d$, define a channel $W : \mathbb{F}_q^n \to \mathcal{P}_{\mathbb{F}_q^n}$ by setting, for every $x, y \in \mathbb{F}_q^n$,

$$W(x|y) := \frac{1/d(x, y)}{M + \sum_{y \neq x} 1/d(x, y)} \quad \text{if } x \neq y, \qquad W(x|x) := \frac{M}{M + \sum_{y \neq x} 1/d(x, y)},$$

where $M$ is any constant satisfying $M > 1/d(x, y)$ for every $x, y \in \mathbb{F}_q^n$ with $x \neq y$. It is straightforward to conclude that $d$ and $W$ are matched.
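A sketch of this construction (assumed toy data: the Hamming metric on $\mathbb{F}_2^2$; names are ours), checking the matching condition on one triple:

```python
# Build transition probabilities W(x|y) matched to a metric d, following the
# proof of Proposition 1.3.2, with M slightly larger than max 1/d(x, y).
def matched_channel(points, d):
    M = 1.0 + max(1.0 / d(x, y) for x in points for y in points if x != y)
    def W(x, y):
        Z = M + sum(1.0 / d(x, z) for z in points if z != x)
        return (M / Z) if x == y else (1.0 / d(x, y)) / Z
    return W

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
d = lambda x, y: sum(a != b for a, b in zip(x, y))  # Hamming metric
W = matched_channel(pts, d)
x, y, z = (0, 0), (0, 1), (1, 1)
print(d(x, y) < d(x, z), W(x, y) > W(x, z))  # True True
```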

The matching relation is not bijective, since it is possible to construct channels which do not admit metrics matched to them. Indeed, if $W$ is a channel such that $W(x|y) < W(x|z)$, $W(z|x) < W(z|y)$ and $W(y|z) < W(y|x)$ for some $x, y, z \in \mathbb{F}_q^n$, then a metric $d$ matched to $W$, due to the symmetry of $d$, would have to satisfy the following relations:

$$d(x, y) > d(x, z) = d(z, x) > d(z, y) = d(y, z) > d(y, x) = d(x, y),$$

which is a contradiction. We stress that if $x$, $y$ and $z$ are distinct elements, $W(x)$, $W(y)$ and $W(z)$ are probability distributions which are defined independently.

²The asymmetric metric is commonly used for MD decoding over the Z channel. Despite the fact that it is not matched to the Z channel, it is used to perform decoding due to its simplicity; details in [9].

For channels that do not admit a metric matched to them, Gabidulin [20] presented a series of definitions obtained by weakening the definition of matched metrics and channels. One of them is an asymptotic definition which characterizes minimum distance decoders that are asymptotically as good as maximum likelihood decoders. Since our goal concerning the matching of metrics and channels is only to explain the relevance of metrics in this field, we will not go deeper into this subject. To prove the existence of metrics matched to some channels, semimetrics can be used, see [10].

Proposition 1.3.3. [19] If a channel $W$ and a semimetric $d'$ are matched, then there is a metric $d$ such that $d$ and $W$ are matched.

Chapter 2

Metrics Induced by Partially Ordered Sets

In this chapter, we present the main object of this dissertation: the family of poset metrics. Because these metrics are defined by partially ordered sets, some basic properties of partially ordered sets will be explored in order to fix notation. The first section is devoted to linear codes and their properties, which were superficially sketched in the first chapter when describing the syndrome decoding algorithm. In what follows, all metrics will be considered to be defined by a norm and hence to be invariant by translations. As supplementary reading we suggest the books [27] and [34] and the papers [6], [39] and [12].

2.1 Linear Codes

Linear codes over $\mathbb{F}_q$ (the finite field with $q$ elements) are the most common and most studied type of code in the literature.

Definition 2.1.1. An $[n, k]_q$ linear code $\mathcal{C}$ over $\mathbb{F}_q$ is a $k$-dimensional linear subspace $\mathcal{C} \subset \mathbb{F}_q^n$. The elements of $\mathcal{C}$ will be called codewords.

From here on, we will consider only linear codes, except when otherwise stated. For simplicity of notation, from now on we will assume that metrics take only natural values, i.e., $d(x, y) \in \mathbb{N}$. Since the space $\mathbb{F}_q^n$ will always be endowed with a metric $d$, codes may be called $d$-codes in order to avoid confusion.

Definition 2.1.2. Given a metric $d$ over $\mathbb{F}_q^n$, the minimum distance of an $[n, k]_q$ code $\mathcal{C}$ is

$$\delta = \min\{d(x, y) : x, y \in \mathcal{C} \text{ and } x \neq y\}.$$

We say that $\mathcal{C}$ is an $[n, k, \delta]_q$ code.

Since $d$ is invariant by translations, by Proposition 1.2.19 the minimum distance may alternatively be written as

$$\delta = \min\{w(x) : x \in \mathcal{C} \setminus \{0\}\},$$

where $w(x) := d(x, 0)$ is a norm function and $0$ is the null vector.

An $[n, k, \delta]_q$ linear code $\mathcal{C}$ can be represented as the image of an injective linear map $T : \mathbb{F}_q^k \to \mathbb{F}_q^n$ or as the kernel of a surjective map $S : \mathbb{F}_q^n \to \mathbb{F}_q^{n-k}$. Considering the canonical bases, we will name $G^T$ and $H$ the matrices of $T$ and $S$, respectively. Then $G$ and $H$ are known, respectively, as the generator matrix and the parity check matrix of $\mathcal{C}$. In other words:

Definition 2.1.3. A generator matrix $G$ of an $[n, k, \delta]$ code $\mathcal{C}$ is a $k \times n$ matrix whose $k$ rows form a basis of $\mathcal{C}$; then

$$\mathcal{C} = \{aG : a \in \mathbb{F}_q^k\}.$$

Definition 2.1.4. A parity check matrix $H$ of an $[n, k, \delta]$ code $\mathcal{C}$ is an $(n-k) \times n$ matrix satisfying

$$c \in \mathcal{C} \iff Hc^T = 0^T,$$

where $0$ represents the null vector and $x^T$ is the transpose of the vector $x$.

Example 2.1.5. (Hamming Code) Given $k$ and $n = (q^k - 1)/(q - 1)$, the $[n, n-k, 3]$ Hamming code over $\mathbb{F}_q$ is the code defined by a parity check matrix whose columns are pairwise linearly independent (over $\mathbb{F}_q$), i.e., the set of columns is a maximal set of pairwise linearly independent vectors in $\mathbb{F}_q^k$.
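A sketch of this construction for $q = 2$ and $k = 3$ (hence $n = 7$, the binary $[7, 4, 3]$ Hamming code); the column ordering below is an arbitrary choice of ours:

```python
# Columns of H: a maximal set of pairwise linearly independent vectors of
# F_2^3, i.e., all nonzero binary vectors of length 3 (2^3 - 1 = 7 columns).
from itertools import product

k = 3
columns = [v for v in product((0, 1), repeat=k) if any(v)]
H = [tuple(col[i] for col in columns) for i in range(k)]  # k x n parity check matrix
for row in H:
    print(row)
```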

The kernel of an $(n-k) \times n$ parity check matrix is a $k$-dimensional subspace, therefore the rank of $H$ is $n-k$ and all its rows are linearly independent, so $H$ can also be seen as a generator matrix of a code.

Definition 2.1.6. If $\mathcal{C}$ is an $[n, k, \delta]$ code with parity check matrix $H$, the $[n, n-k, \delta_2]$ code generated by $H$ is denoted by $\mathcal{C}^\perp$ and called the dual code of $\mathcal{C}$. If $\mathcal{C} = \mathcal{C}^\perp$, $\mathcal{C}$ is said to be a self-dual code.

The rate of an $[n, k, \delta]$ linear code is the quotient $k/n$, and it represents the amount of information contained in each symbol (coordinate). Considering the Hamming metric (Example 1.2.9), the fundamental problem of coding theory, as suggested by Hall in [24], is:

Fundamental Problem: find practical codes with reasonably large rate and minimum distance.

For a general metric over $\mathbb{F}_q$, the fundamental problem needs to be reformulated by replacing “minimum distance” with “packing radius”, the true object to be maximized, as we will see. Given a metric $d$, the (closed) $d$-ball centered at $x$ with radius $r$ is the set of all elements at distance at most $r$ from $x$,

$$B_d(x, r) := \{y \in \mathbb{F}_q^n : d(x, y) \leq r\}.$$

Definition 2.1.7. Given a metric $d$, the packing radius of a linear code $\mathcal{C}$ is the maximal integer $\mathcal{R}_d(\mathcal{C})$ satisfying

$$B_d(c_1, \mathcal{R}_d(\mathcal{C})) \cap B_d(c_2, \mathcal{R}_d(\mathcal{C})) = \emptyset$$

for every $c_1, c_2 \in \mathcal{C}$ with $c_1 \neq c_2$. For simplicity, we may suppress the explicit dependence on $d$ in the notation $\mathcal{R}_d(\mathcal{C})$.

The following is a well-known standard result in coding theory.

Proposition 2.1.8. Let $d_H$ be the Hamming metric. The packing radius of a linear code $\mathcal{C}$ is given by

$$\mathcal{R}_{d_H}(\mathcal{C}) = \left\lfloor \frac{\delta - 1}{2} \right\rfloor,$$

where $\lfloor a \rfloor$ denotes the integer part of the real number $a$ and $\delta$ is the minimum distance of $\mathcal{C}$.
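A brute-force check of this proposition on a toy binary code (the generator matrix and helper names below are assumptions of ours):

```python
# Minimum Hamming distance of a small binary linear code and the packing
# radius floor((delta - 1) / 2) of Proposition 2.1.8.
from itertools import product

def codewords(G):
    """All F_2-linear combinations aG of the rows of G."""
    k = len(G)
    return {tuple(sum(a[i] * G[i][j] for i in range(k)) % 2 for j in range(len(G[0])))
            for a in product((0, 1), repeat=k)}

G = [(1, 1, 0, 0, 0, 0), (0, 0, 1, 1, 0, 0), (0, 0, 0, 0, 1, 1)]
C = codewords(G)
delta = min(sum(c) for c in C if any(c))  # for linear codes, min distance = min weight
print(delta, (delta - 1) // 2)            # 2 0: packing radius 0
```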

The importance of the packing radius in coding theory is due to the fact that it determines the error-correcting capability of the code. Indeed, suppose that a codeword $c$ is sent over a noisy channel and $y$ is received. If $d(c, y) \leq \mathcal{R}_d(\mathcal{C})$, then the received vector $y$ will still be closer to $c$ than to any other codeword, therefore a minimum distance decoder will output $c$. On the other hand, if $d(c, y) > \mathcal{R}_d(\mathcal{C})$, it is not guaranteed that a minimum distance decoder will output $c$. For the Hamming metric, the packing radius of a code is determined by its minimum distance (Proposition 2.1.8), therefore the minimum distance also determines the error-correcting capability of the code, justifying the description of the fundamental problem of coding given before. As we will see later, there are metrics for which the minimum distance of a code does not determine its packing radius. In order to include this kind of metric, the fundamental problem is restated as follows:

General Fundamental Problem: find practical codes with reasonably large rate and packing radius.

The error-detection capability of an $[n, k, \delta]$ code is $\delta - 1$, since for every received element $y$ with $d(c, y) \leq \delta - 1$, by the definition of minimum distance, $y$ is never a codeword different from $c$. On the other hand, if $d(c, y) > \delta - 1$, it may happen that $y \in \mathcal{C}$ and the error is undetectable. Even when the minimum distance of a code does not determine its packing radius, it is relevant in coding theory, since it always determines the error-detection capability of the code and it provides bounds for the packing radius, as we will see in the next proposition.

Proposition 2.1.9. Let 풞 be an [푛, 푘, 훿] 푑-code over F푞, then

⌊(훿 − 1)/2⌋ ≤ ℛ푑(풞) ≤ 훿 − 1.

Proof. The minimum distance definition ensures that ℛ푑(풞) ≤ 훿 − 1. Denoting 푡 = ⌊(훿 − 1)/2⌋, we just need to prove that 퐵푑(푢, 푡) ∩ 퐵푑(푣, 푡) = ∅ for all 푢, 푣 ∈ 풞 with 푢 ≠ 푣. Suppose 푥 ∈ 퐵푑(푢, 푡) ∩ 퐵푑(푣, 푡). By the triangle inequality,

푑(푢, 푣) ≤ 푑(푢, 푥) + 푑(푥, 푣) ≤ 2푡 ≤ 훿 − 1,

a contradiction since 푑(푢, 푣) ≥ 훿.

As we will see in the poset metrics section, it is possible to construct metrics and codes such that their packing radius reaches the extremal values of the bounds obtained in Proposition 2.1.9. Furthermore, in some cases, those bounds may be attained by codes with the same minimum distance.

The packing radius ℛ푑(푐) of a non-null codeword 푐 ∈ 풞 is defined as the packing radius of the (not necessarily linear) code {0, 푐}. The packing radius of a code 풞 can be alternatively defined as the minimal packing radius of its non-null codewords, i.e.,

ℛ푑(풞) = min_{푐∈풞*} ℛ푑(푐),  (2.1)

where 풞* = 풞 ∖ {0}. A codeword with minimum packing radius is called a packing vector. We stress that the problem of finding the packing radius of a single codeword for general metrics is NP-hard, see [12].

2.1.1 Code Equivalence

In order to investigate the fundamental problem of coding, codes can be gathered in classes such that elements in the same class are “geometrically equivalent”. In particular, we are interested in classes of codes having the same error-correcting capability. A simple way to construct these classes is by using linear isometries. As an example, in the binary Hamming case, two codes belong to the same class if one is a permutation of the other, see [26].

Definition 2.1.10. If F푞^푛 is a metric space endowed with the metric 푑, a linear isometry (or 푑-linear isometry) 푇 is a linear transformation 푇 : F푞^푛 → F푞^푛 preserving distance, i.e., for every 푥, 푦 ∈ F푞^푛, 푑(푇(푥), 푇(푦)) = 푑(푥, 푦).

Since we are assuming 푑 is invariant by translations, 푑(푥, 푦) = 푤(푥 − 푦), a linear transformation 푇 is an isometry if, and only if, 푤(푇(푥)) = 푤(푥) for every 푥 ∈ F푞^푛. We denote by 퐺퐿푑(F푞^푛) the group of all linear isometries of F푞^푛.

Example 2.1.11. [31] The linear isometry group of F푞^푛, when F푞^푛 is endowed with the Hamming metric 푑퐻, is the group of all monomial maps, i.e.,

퐺퐿_{푑퐻}(F푞^푛) ≃ (F푞*)^푛 ⋊ 풮푛,

where:

• (F푞*)^푛 is isomorphic to the group of all 푛 × 푛 invertible diagonal matrices. It acts on F푞^푛 by multiplying each coordinate by a non-zero constant.

• 풮푛 is isomorphic to the group of all permutation matrices and corresponds to the permutations of coordinates in the space.

• the product is semi-direct.

Definition 2.1.12. Two linear codes 풞1 and 풞2 are said to be equivalent if there is a linear isometry 푇 : F푞^푛 → F푞^푛 such that 푇(풞1) = 풞2. We denote this by 풞1 ∼푑 풞2.

With this definition, considering Example 2.1.11, we obtain, for the Hamming metric case, the usual definition of code equivalence. Equivalence of codes defines an equivalence relation which gathers linear codes into classes of codes having the same geometrical properties, in particular having the same error-correction capability; however, it is not true that codes with the same weight distribution (same number of codewords having weight 푖, for every 푖) belong to the same class.

Example 2.1.13. [27] Let 풞1 and 풞2 be binary codes with generator matrices

      [ 1 1 0 0 0 0 ]            [ 1 1 0 0 0 0 ]
퐺1 =  [ 0 0 1 1 0 0 ]  and 퐺2 =  [ 1 0 1 0 0 0 ] ,
      [ 0 0 0 0 1 1 ]            [ 1 1 1 1 1 1 ]

respectively. Suppose F푞^푛 is endowed with the Hamming metric. If 퐴푖(풞푗) is the number of codewords in 풞푗 with weight 푖, we have that 퐴0(풞푗) = 퐴6(풞푗) = 1 and 퐴2(풞푗) = 퐴4(풞푗) = 3 for 푗 ∈ {1, 2}. Thus, the codes 풞1 and 풞2 have the same weight distribution and, consequently, the same minimum distance 훿 = 2 and the same packing radius ℛ_{푑퐻}(풞1) = ℛ_{푑퐻}(풞2) = 0. Since the basis elements of 풞1 are also elements of 풞1⊥ and these two codes have the same dimension, 풞1 is self-dual. But 풞2 is not self-dual, since the vector 000011 is a codeword of 풞2⊥ but not of 풞2. If 풞1 and 풞2 were equivalent, there would exist a permutation matrix 푃 satisfying 풞1푃 = 풞2, but this would imply that 풞1⊥푃 = 풞2⊥ and, since 풞1 = 풞1⊥, 풞2 would be self-dual. Therefore, 풞1 and 풞2 are not equivalent.
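The weight distributions claimed above can be checked mechanically; the following sketch (an illustration, not part of the original text) enumerates all F2-linear combinations of the rows of each generator matrix:

from itertools import product
from collections import Counter

def codewords(G):
    # All F_2-linear combinations of the rows of G
    words = set()
    for coeffs in product([0, 1], repeat=len(G)):
        w = tuple(sum(c * g for c, g in zip(coeffs, col)) % 2
                  for col in zip(*G))
        words.add(w)
    return words

G1 = [(1,1,0,0,0,0), (0,0,1,1,0,0), (0,0,0,0,1,1)]
G2 = [(1,1,0,0,0,0), (1,0,1,0,0,0), (1,1,1,1,1,1)]
for G in (G1, G2):
    dist = Counter(sum(w) for w in codewords(G))
    print(sorted(dist.items()))  # both: [(0, 1), (2, 3), (4, 3), (6, 1)]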

Given a linear code 풞, we denote by 퐺퐿푑(풞) its orbit under 퐺퐿푑(F푞^푛). Since 퐺퐿푑(F푞^푛) is a group, its orbits are equivalence classes, hence 풞 ∼푑 풞′ if, and only if, 퐺퐿푑(풞) = 퐺퐿푑(풞′). The representatives of each class may be chosen as codes having a generator matrix with a particular form, which is, in general, as simple as possible in order to provide some information about the code.

Proposition 2.1.14. Every [푛, 푘, 훿] linear 푑퐻-code 풞 is equivalent to a linear 푑퐻-code with the same parameters having a generator matrix of the form 퐺 = [퐼푘 | 퐴], where 퐼푘 is the identity matrix of order 푘 and 퐴 is a 푘 × (푛 − 푘) matrix.

This form is possible since permutations and non-null scalar multiplications of the columns of the generator matrix are performed by linear isometries of the Hamming metric. Due to Proposition 2.1.14, it is common to find in the literature the assumption that every code has a generator matrix in the standard form 퐺 = [퐼푘 | 퐴]. As a consequence of the standard form, we get an easy way to obtain the parity check matrix of a 푑퐻-code.

Theorem 2.1.15. If 퐺 = [퐼푘 | 퐴] is a generator matrix for an [푛, 푘, 훿] code 풞, then 퐻 = [−퐴^푇 | 퐼_{푛−푘}] is a parity check matrix for 풞.

2.2 Partially Ordered Sets

Partial orders will be the main mathematical structures used in this work to define poset metrics. The definitions in this section serve mainly to fix notation for the development of the next sections. As complementary reading, the book [34] and the papers [5] and [14] are indicated. Let 푋 and 푌 be finite non-empty sets. A binary relation over 푋 and 푌 is any subset 푅 of the product 푋 × 푌. If (푥, 푦) ∈ 푅, we write 푥푅푦. If 푋 = 푌, we say that 푅 is a binary relation over 푋.

Definition 2.2.1. A partial order relation in a set 푋 is a binary relation, usually denoted by 6, satisfying, for every 푥, 푦, 푧 ∈ 푋, the following conditions:

(a) 푥 6 푥 (reflexivity);

(b) If 푥 6 푦 and 푦 6 푥, then 푥 = 푦 (anti-symmetry);

(c) If 푥 6 푦 and 푦 6 푧, then 푥 6 푧 (transitivity).

If 6 is a partial order relation over 푋, the pair 푃 = (푋, 6) is called a poset.

Occasionally, the binary relation 6 of the poset 푃 = (푋, 6) is denoted by 6푃. By an abuse of notation, the poset 푃 will be identified with 푋. Therefore, elements of 푋 will be considered to be also elements of 푃 and the order relation over 푋 will also be regarded as an order relation over 푃. If any two elements of a partial order are comparable, the order is called total and the poset is called a chain. An anti-chain is a poset in which distinct elements are never comparable. We say that 푦 covers 푥 if 푥 6 푦, 푥 ≠ 푦, and there is no element 푧 ∉ {푥, 푦} such that 푥 6 푧 and 푧 6 푦. If 푦 covers 푥, the pair (푥, 푦) is called a covering pair. In order to describe a poset geometrically, the Hasse diagram is used.

Definition 2.2.2. The Hasse diagram of a poset 푃 = (푋, 6) is the directed graph in which the vertex set is 푋 and whose arcs are the covering pairs (푥, 푦) in the poset.

We usually draw the Hasse diagram of a poset in the plane in such a way that, if 푦 covers 푥, then the point representing 푦 is higher than the point representing 푥. No arrows are required in the drawing, since the directions of the arcs are implicit in the vertical placement.

Example 2.2.3. Let 푋 = {1, 2, 3, 4, 5, 6, 7} and consider the following partial order re- lation: 1 6 2, 1 6 3, 4 6 3, 3 6 6 and 5 6 6. The Hasse diagram of 푃 = (푋, 6) is given by

6

2 3

7 1 4 5

Definition 2.2.4. An ideal in a poset 푃 is a nonempty subset 퐼 ⊂ 푋 such that, for 푖 ∈ 퐼 and 푗 ∈ 푋, if 푗 6푃 푖 then 푗 ∈ 퐼.

Given 퐴 ⊂ 푋, we denote by ⟨퐴⟩푃 the smallest ideal of 푃 containing 퐴. If 퐴 = {푖}, we denote by ⟨푖⟩푃 the ideal ⟨{푖}⟩푃. The set 푀푎푥푃(퐴) is the set of all maximal elements of 퐴 in 푃; equivalently,

푀푎푥푃(퐴) = {푖 ∈ 퐴 : 푖 ≰푃 푗 for all 푗 ∈ 퐴 ∖ {푖}}.

The rank of an element 푗 ∈ 푋, denoted by ℎ푃(푗), is the maximal cardinality of a chain contained in ⟨푗⟩푃,

ℎ푃(푗) = max{|퐶| : 퐶 ⊂ ⟨푗⟩푃 and 퐶 is a chain}.

The height ℎ(푃) of 푃 is the maximal rank among the elements of 푋. The 푖-level of 푃 is the set of all elements with rank 푖, Γ푃^푖 := {푗 ∈ 푋 : ℎ푃(푗) = 푖}. The level distribution of a poset 푃 is the vector (Γ푃^1, . . . , Γ푃^{ℎ(푃)}). This distribution defines a partition of 푋 in the sense that 푋 = ⋃ᵢ Γ푃^푖. Also, since the levels are disjoint, if |푋| = 푛 and |Γ푃^푖| = 푛푖, then 푛 = 푛1 + ··· + 푛_{ℎ(푃)}. The level enumerator of a poset is the array (|Γ푃^1|, . . . , |Γ푃^{ℎ(푃)}|).

Example 2.2.5. The level distribution of the poset defined in Example 2.2.3 is the vector with length 3 whose coordinates are given by

Γ푃^1 = {1, 4, 5, 7}, Γ푃^2 = {2, 3} and Γ푃^3 = {6}.

Furthermore, ⟨6⟩ = {6, 3, 1, 4, 5} and ⟨{2, 7}⟩ = {2, 7, 1}.
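Ideals, ranks and levels are simple to compute when a poset is stored as a list of order relations. The sketch below (an illustration, with hypothetical helper names `ideal` and `rank`, not part of the original text) reproduces the values of Examples 2.2.3 and 2.2.5:

def ideal(relations, A):
    # Smallest ideal containing A: close A downward under the given
    # relations, which are pairs (i, j) meaning i <= j.
    below = {}
    for i, j in relations:
        below.setdefault(j, set()).add(i)
    I, stack = set(A), list(A)
    while stack:
        j = stack.pop()
        for i in below.get(j, ()):
            if i not in I:
                I.add(i)
                stack.append(i)
    return I

def rank(relations, j):
    # h_P(j): maximal cardinality of a chain contained in <j>_P,
    # computed recursively over the elements directly below j.
    below = [i for i, k in relations if k == j]
    return 1 + max((rank(relations, i) for i in below), default=0)

# Poset of Example 2.2.3: X = {1,...,7}, 1 <= 2, 1 <= 3, 4 <= 3, 3 <= 6, 5 <= 6.
rel = [(1, 2), (1, 3), (4, 3), (3, 6), (5, 6)]
print(ideal(rel, {6}))      # {1, 3, 4, 5, 6}
print(ideal(rel, {2, 7}))   # {1, 2, 7}
levels = {}
for j in range(1, 8):
    levels.setdefault(rank(rel, j), set()).add(j)
print(levels)               # {1: {1, 4, 5, 7}, 2: {2, 3}, 3: {6}}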

In coding theory, an important family of posets is the family of hierarchical posets; as we shall see, the metrics determined by these posets can be considered a “true” generalization of the Hamming metric, see [30] for more details concerning this relation.

Definition 2.2.6. Given a poset 푃, if 푥 6푃 푦 for every 푥 ∈ Γ푃^푖 and 푦 ∈ Γ푃^푗 with 푖 < 푗, then 푃 is called a hierarchical poset.

Example 2.2.7. The poset 푃 constructed in Example 2.2.3 is not hierarchical, since 7 and 2 belong to different levels and are not comparable to each other. By suitably adding relations, we can obtain a hierarchical poset. The resulting poset has the following Hasse diagram:

6

2 3

7 1 4 5

Posets can be characterized by their Hasse diagrams. This characterization is obtained by using the equivalence relation defined by isomorphism of posets. This relation ensures that posets with the same unlabeled Hasse diagram are essentially the same poset.

Definition 2.2.8. Let 푃 = (푋, 6푃 ) and 푄 = (푌, 6푄) be two partially ordered sets. A map 푓 : 푋 → 푌 is called order-preserving if 푥 6푃 푦 implies 푓(푥) 6푄 푓(푦).

Definition 2.2.9. Let 푃 = (푋, 6푃 ) and 푄 = (푌, 6푄) be two posets such that 푌 ⊂ 푋. The set 푄 is a subposet of 푃 if the identity map 퐼 : 푌 → 푋 is an order-preserving map.

The example below shows that a bijective order-preserving map does not always have an inverse that also preserves order.

Example 2.2.10. Let 푃 be an anti-chain over [푛] and 푄 any other poset over [푛]. Then, any bijective map from 푃 to 푄 is an order-preserving map with an inverse that does not preserve order.

Definition 2.2.11. A one-to-one order-preserving map 푓 from a poset (푋, 6푃) onto a poset (푌, 6푄) is called an isomorphism if the inverse 푓^{−1} is also an order-preserving mapping. An isomorphism from a poset to itself is called an automorphism.

Whenever two posets are order isomorphic, they can be considered to be essentially the same in the sense that one of the orders can be obtained from the other just by renaming the elements.

Proposition 2.2.12. [34] Two hierarchical posets are isomorphic if, and only if, they have the same level enumerator.

Due to Proposition 2.2.12, if 푃 is hierarchical, it will be denoted by (푛 : 푛1, . . . , 푛푠), where 푛푗 = |Γ푃^푗| and 푠 = ℎ(푃). The set of all automorphisms of a poset 푃 is a group, which will be denoted by 퐴푢푡(푃). Let 풫푛* be the set of all posets over [푛]. The set 풫푛* has a natural partial order: given two posets 푃, 푄 ∈ 풫푛*, we say that 푃 is finer (or smaller) than 푄 (and write 푃 ≤ 푄) if 푖 6푃 푗 implies 푖 6푄 푗. Equivalently, 푃 ≤ 푄 if, and only if, the identity map is an order-preserving map from 푃 to 푄. With this relation, the set 풫푛* is itself a partially ordered set. The trivial order (anti-chain: 푖 6 푗 ⟺ 푖 = 푗) is the unique minimal element in 풫푛*, and the linear order (chain: 1 6 2 6 ··· 6 푛), as well as its 푛! relabelings, are the maximal elements in 풫푛*.

A poset 푃 ∈ 풫푛* is said to be naturally labeled if for every 푖 ∈ Γ푃^{푟1} and 푗 ∈ Γ푃^{푟2} with 푟1 < 푟2 we have 푖 < 푗 (where < is the natural order over N). Therefore, letting 풫푛 denote the set of all naturally labeled posets,

풫푛 = {푃 ∈ 풫푛* : 푃 is naturally labeled},

the set 풫푛 has a unique maximal element, the linear order defined by 1 6 2 6 ··· 6 푛.

1 8 9                    7 8 9
4 2 6                    4 5 6
3 5 7                    1 2 3
non-natural labelling    natural labelling

Since, given a poset 푃 ∈ 풫푛*, there is always a poset 푄 ∈ 풫푛 order isomorphic to 푃, from now on we assume all posets to be naturally labeled.

2.3 Poset Metrics

Classical coding theory may be considered as the study of F푞^푛 endowed with the Hamming metric. To generalize the classical problems of coding theory, Niederreiter made the initial progress in [37], [35] and [36] by introducing non-Hamming metrics on F푞^푛. Generalizing the metrics introduced by Niederreiter, Brualdi et al. introduced in [6] a new large family of metrics, the so-called poset metrics. In the following, consider 푃 to be a poset over [푛].

Definition 2.3.1. The 푃-weight of a vector 푥 ∈ F푞^푛 is defined as the cardinality of the smallest ideal of 푃 containing 푠푢푝푝(푥), i.e.,

푤푃 (푥) = |⟨푠푢푝푝(푥)⟩푃 |.

It is clear that 푤푃(푥) ≥ 0 for every 푥 ∈ F푞^푛 and 푤푃(푥) = 0 if and only if 푥 = 0. Also, the relations 푠푢푝푝(푥 + 푦) ⊂ 푠푢푝푝(푥) ∪ 푠푢푝푝(푦) and ⟨퐴 ∪ 퐵⟩푃 = ⟨퐴⟩푃 ∪ ⟨퐵⟩푃 imply that 푤푃(푥 + 푦) ≤ 푤푃(푥) + 푤푃(푦). Therefore, 푤푃 is a norm function over F푞^푛.

The support of a set 푋 ⊂ F푞^푛 is the union of the supports of the elements of 푋, 푠푢푝푝(푋) = {푖 : 푖 ∈ 푠푢푝푝(푥) for some 푥 ∈ 푋}.

Definition 2.3.2. The 푃-distance on F푞^푛 is the (translation-invariant) metric induced by 푤푃,

푑푃 (푥, 푦) = 푤푃 (푥 − 푦).
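Definitions 2.3.1 and 2.3.2 translate directly into code. The sketch below (an illustration, not part of the original text) reuses the hypothetical `ideal` helper from the sketch after Example 2.2.5; coordinates are indexed from 1:

def p_weight(relations, x):
    # w_P(x) = |<supp(x)>_P| (Definition 2.3.1)
    supp = {i + 1 for i, xi in enumerate(x) if xi != 0}
    return len(ideal(relations, supp)) if supp else 0

def p_dist(relations, x, y, q=2):
    # d_P(x, y) = w_P(x - y); over F_2 subtraction is coordinatewise XOR
    return p_weight(relations, [(a - b) % q for a, b in zip(x, y)])

rel = [(1, 2), (1, 3), (4, 3), (3, 6), (5, 6)]   # poset of Example 2.2.3
print(p_weight(rel, [0, 0, 0, 0, 0, 1, 0]))       # 5, since <{6}> = {1,3,4,5,6}
print(p_dist(rel, [0, 1, 0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0, 0, 1]))         # 3, since <{2,7}> = {1,2,7}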

Example 2.3.3 (Niederreiter-Rosenbloom-Tsfasman metric - NRT). Given two integers 푛 and 푟 such that 푟 divides 푛, an (푛, 푟)-NRT metric is a poset metric induced by a poset formed by 푛/푟 disjoint chains, each chain having length 푟. Suppose 푛 = 12 and 푟 = 3.

Consider the NRT weight 푤_{푁푅푇} defined by the following Hasse diagram:

9 10 11 12

5 6 7 8

1 2 3 4

In particular, the (푛, 푛)-NRT and the (푛, 1)-NRT metrics are the chain and the anti-chain (or the Hamming) metrics.
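As a quick illustration of the NRT case (again a sketch, reusing the hypothetical `p_weight` helper above), the (12, 3)-NRT poset consists of the four chains 1 6 5 6 9, 2 6 6 6 10, 3 6 7 6 11 and 4 6 8 6 12:

# Covering pairs of the (12, 3)-NRT poset: i is covered by i + 4.
nrt = [(c, c + 4) for c in range(1, 9)]
x = [0] * 12
x[9] = 1                   # coordinate 10, the top of the chain 2 < 6 < 10
print(p_weight(nrt, x))    # 3: the whole chain enters the ideal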

2.3.1 Group of Linear Isometries for Poset Metrics

As seen in Section 2.1.1, two linear codes are equivalent if, and only if, there is a linear isometry mapping one onto the other. For a poset metric, the group of linear isometries was first determined for the Rosenbloom–Tsfasman space in [8] and for the crown space in [29]. The description of this group for a general poset metric was presented in [39]. We will describe it in some detail, since both the result and the approach used in the proof will be used in Chapter 3. The proof presented here is slightly different but follows the same idea as the proof given in [39]. Let 푀푛(F푞) be the set of all 푛 × 푛 matrices over F푞 and

퐺푃 := {퐴 = (푎푖푗) ∈ 푀푛(F푞) : 푎푖푗 = 0 if 푖 ≰푃 푗 and 푎푖푖 ≠ 0}.  (2.2)

Consider the usual basis 훽 = {푒1, . . . , 푒푛}, where 푠푢푝푝(푒푖) = {푖} and the 푖-th entry of 푒푖 is 1. Each matrix 퐴 ∈ 퐺푃 defines a linear map 푇퐴. The set of such maps will be denoted by 풢푃. By definition, 푇 ∈ 풢푃 if, and only if, 푇(푒푗) = Σ_{푖 6푃 푗} 푎푖푗 푒푖 with 푎푗푗 ≠ 0 for every 푗 ∈ [푛].

Note initially that an automorphism 휑 ∈ 퐴푢푡(푃) induces an isometry 푇휑 ∈ 퐺퐿푃(F푞^푛), acting on F푞^푛 by permutation of the coordinates: 푇휑(푥1, . . . , 푥푛) = (푥_{휑(1)}, . . . , 푥_{휑(푛)}). The set of these isometries will be denoted by 풜푢푡(푃).

Theorem 2.3.4. The group of linear isometries of (F푞^푛, 푑푃) is the semi-direct product 퐺퐿푃(F푞^푛) = 풢푃 ⋊ 풜푢푡(푃).

The next two lemmas will be used in order to produce maximal decompositions in Chapter 3, a generalization of the canonical decomposition obtained in [15]. They are also used to prove Theorem 2.3.4; their proofs can be found in [39].

Lemma 2.3.5. If 푇 ∈ 퐺퐿푃(F푞^푛), then the map 휑푇 : 푃 → 푃 given by

휑푇(푖) = max⟨푠푢푝푝(푇(푒푖))⟩푃

is an automorphism of 푃.

Lemma 2.3.6. The linear transformation 푇 is an element of 퐺퐿푃(F푞^푛) if, and only if,

푇(푒푗) = Σ_{푖 6푃 푗} 푥푖푗 푒_{휑푇(푖)},  (2.3)

where 휑푇 is the automorphism associated with 푇 as in Lemma 2.3.5 and 푥푖푗 are constants with 푥푗푗 ≠ 0 for all 푗 ∈ [푛].

Proof of Theorem 2.3.4. It is clear that 풜푢푡(푃) is a subgroup of 퐺퐿푃(F푞^푛), and the characterization of 풢푃 together with Lemma 2.3.6 ensures that 풢푃 is also a subgroup of 퐺퐿푃(F푞^푛). Given 푇 ∈ 퐺퐿푃(F푞^푛), by Lemma 2.3.6, 푇(푒푗) = Σ_{푖 6 푗} 푥푖푗 푒_{휑푇(푖)} with 푥푗푗 ≠ 0. Consider 푇′ defined by 푇′(푒푖) = 푒_{휑푇(푖)}; by Lemma 2.3.6, 푇′ ∈ 퐺퐿푃(F푞^푛). Define 푇″ by setting 푇″(푒푗) = Σ_{푖 6 푗} 푥푖푗 푒푖; thus 푇(푒푗) = 푇′ ∘ 푇″(푒푗). Since 푇′ is obtained from the automorphism 휑푇, it follows that 푇′ ∈ 풜푢푡(푃); furthermore, by construction, 푇″ ∈ 풢푃. Therefore, 퐺퐿푃(F푞^푛) = 풢푃 · 풜푢푡(푃). In order to prove that 풢푃 is a normal subgroup of 퐺퐿푃(F푞^푛), it is enough to show that 푇′ ∘ 푇 ∘ 푇′^{−1} ∈ 풢푃 for every 푇 ∈ 풢푃 and 푇′ ∈ 퐺퐿푃(F푞^푛). Since 푇′ = 푇″ ∘ 푇2, where 푇″ ∈ 풢푃 and 푇2 ∈ 풜푢푡(푃), it is sufficient to prove that 푇2 ∘ 푇 ∘ 푇2^{−1} ∈ 풢푃 for every 푇2 ∈ 풜푢푡(푃). First, note that 푇2 = 푇휑 for some 휑 ∈ 퐴푢푡(푃) and that 푇휑^{−1} = 푇_{휑^{−1}}, thus

푇휑 ∘ 푇 ∘ 푇_{휑^{−1}}(푒푗) = 푇휑 ∘ 푇(푒_{휑^{−1}(푗)}) = 푇휑( Σ_{푖 6 휑^{−1}(푗)} 푥_{푖휑^{−1}(푗)} 푒푖 ) = Σ_{푖 6 휑^{−1}(푗)} 푥_{푖휑^{−1}(푗)} 푒_{휑(푖)}.

Since 푖 6 휑^{−1}(푗) implies 휑(푖) 6 푗, denoting 푏_{휑(푖)푗} = 푥_{푖휑^{−1}(푗)}, it follows that

푇휑 ∘ 푇 ∘ 푇_{휑^{−1}}(푒푗) = Σ_{푖 : 휑(푖) 6 푗} 푏_{휑(푖)푗} 푒_{휑(푖)}.

Therefore 푇휑 ∘ 푇 ∘ 푇_{휑^{−1}} ∈ 풢푃 and 풢푃 is a normal subgroup of 퐺퐿푃(F푞^푛). By using the characterizations of 풢푃 and 풜푢푡(푃), it is straightforward to conclude that 풢푃 ∩ 풜푢푡(푃) = {퐼}, where 퐼 is the identity map; therefore 퐺퐿푃(F푞^푛) is isomorphic to the semidirect product 풢푃 ⋊ 풜푢푡(푃).

Example 2.3.7. (Hierarchical Case) We remark that if 푇 ∈ 풜푢푡(푃), then 푇 is induced by a permutation 휑 : 푃 → 푃 and denoted by 푇휑. If 푃 is an (푛 : 푛1, . . . , 푛푠) hierarchical poset, the image under 휑 of an element in the 푖-th level must also belong to this level. For each 푖, we denote by 풜푢푡(Γ푃^푖) the group of all linear maps induced by permutations 휑푖 that permute only elements of the 푖-th level of 푃, i.e., 휑푖 : 푃 → 푃 is a bijection satisfying 휑푖(푗) = 푗 if 푗 ∉ Γ푃^푖. Since 푃 is hierarchical, 풜푢푡(Γ푃^푖) ⊂ 풜푢푡(푃). Hence, each 휑푖 induces an isometry 푇_{휑푖} and 휑 = 휑1 ∘ ... ∘ 휑푠, i.e., 푇휑 = 푇_{휑1} ∘ ... ∘ 푇_{휑푠}. Therefore, the group 풜푢푡(푃) is isomorphic to the product 풜푢푡(Γ푃^1) × ... × 풜푢푡(Γ푃^푠), and 푇 ∈ 풜푢푡(푃) if, and only if,

푇휑(푥1, . . . , 푥푛) = (푥_{휑1(1)}, . . . , 푥_{휑1(푛1)}) × ··· × (푥_{휑푠(푛1+···+푛_{푠−1}+1)}, . . . , 푥_{휑푠(푛1+···+푛_{푠−1}+푛푠)}),

where 휑푖 ∈ 풜푢푡(Γ푃^푖) for every 푖 ∈ {1, . . . , 푠}.

2.3.2 Packing Radius of Poset Codes

Since poset metrics are defined by weights, they are invariant by translations, and the packing radius of a poset code can be equivalently defined as the largest integer ℛ푃(풞) satisfying

퐵푃(0, ℛ푃(풞)) ∩ 퐵푃(푐, ℛ푃(풞)) = ∅

for every non-null 푐 ∈ 풞. In order to simplify the notation, the set of maximal elements in the support of a codeword 푐, which is denoted by 푀푎푥푃(푠푢푝푝(푐)), will be denoted by 푀푎푥푃(푐).

Proposition 2.3.8. Given an [푛, 푘, 훿]푞 poset code 풞 such that |푀푎푥푃 (푐)|= 1 for every 푐 ∈ 풞 ∖ {0}, then

ℛ푃 (풞) = 훿 − 1.

Proof. Suppose 푀푎푥푃(푐) = {푖} and note that if 푥 ∈ 퐵푃(푐, 푤푃(푐) − 1), then 푥푖 = 푐푖 ≠ 0. Since 푖 ∈ 푠푢푝푝(푥), it follows that 푤푃(푥) ≥ 푤푃(푐), hence 푥 ∉ 퐵푃(0, 푤푃(푐) − 1). Therefore, ℛ푃(푐) = 푤푃(푐) − 1. Taking 푐 with minimum weight 훿, the packing radius of this codeword is 훿 − 1; by characterization (2.1), 푐 is a packing vector, therefore ℛ푃(풞) = 훿 − 1.

The proof of the next corollary follows straight from the fact that in a chain, every non-null vector has only one maximal element in its support.

Corollary 2.3.9. Let 푃 be a chain. If 풞 is an [푛, 푘, 훿]푞 푃 -code, then

ℛ푃 (풞) = 훿 − 1.

Corollary 2.3.9 and Proposition 2.1.8 ensure that the bounds given in Proposition 2.1.9 are tight. The extremal values for the bound are obtained by extremal posets, one with no relations (anti-chain) and the other with the maximum number of relations (total order, i.e., chain). Both are hierarchical posets and the metrics induced by these posets are very well understood, as can be seen in [30]. A hierarchical poset can be obtained by summing up anti-chains, as follows:

Definition 2.3.10. Let 푃 = (푋, 6푃 ) and 푄 = (푌, 6푄) be two posets with 푋 ∩ 푌 = ∅. The ordinal sum of 푃 and 푄 is the poset 푃 ⊕ 푄 with order relation given by

              ⎧ 푥 6푃 푦          when 푥, 푦 ∈ 푋
푥 6_{푃⊕푄} 푦 ⟺ ⎨ 푥 6푄 푦          when 푥, 푦 ∈ 푌
              ⎩ 푥 ∈ 푋 and 푦 ∈ 푌.

Proposition 2.3.11. A hierarchical poset (푛 : 푛1, . . . , 푛푙) is an ordinal sum of 푙 anti-chains.
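Proposition 2.3.11 gives a concrete recipe for building hierarchical posets. The following minimal sketch (an illustration, not part of the original text) produces the order relations of an ordinal sum of anti-chains:

def ordinal_sum(levels):
    # Order relations of the ordinal sum of a list of disjoint
    # anti-chains: x <= y whenever x lies in a strictly lower level.
    rel = []
    for i, lower in enumerate(levels):
        for upper in levels[i + 1:]:
            rel += [(x, y) for x in lower for y in upper]
    return rel

# The hierarchical poset (6 : 3, 2, 1) as an ordinal sum of anti-chains:
print(ordinal_sum([[1, 2, 3], [4, 5], [6]]))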

It is straightforward to conclude that the Hamming metric is the poset metric induced by an anti-chain. According to Proposition 2.3.11, hierarchical posets are characterized as ordinal sums of anti-chains; therefore, hierarchical metrics are closely related to the Hamming metric. We are particularly interested in the characterization of poset metrics for which the error-correction capability of a code is determined by its error-detection capability, i.e., poset metrics for which the packing radius of a code is determined by its minimum distance.

Proposition 2.3.12. If 풞 is an [푛, 푘, 훿]푞 푃 -code such that ⟨푠푢푝푝(풞)⟩푃 is a hierarchical subposet of 푃 , then

ℛ푃(풞) = 푛1 + ··· + 푛_{푟−1} + ⌊(훿 − (푛1 + ··· + 푛_{푟−1}) − 1)/2⌋,

where 푟 is the smallest level of 푃 such that there exists 푐 ∈ 풞 with 푀푎푥푃(푐) ⊂ Γ푃^푟, and 푛푠 = |⟨푠푢푝푝(풞)⟩푃 ∩ Γ푃^푠|.

Proof. Set 푠1 = 0 and 푠푖 = 푛1 + ··· + 푛_{푖−1} for all 푖 ∈ {2, . . . , ℎ(푃)}. Consider ℎ = 푠푟 + ⌊(훿 − 푠푟 − 1)/2⌋, where

푟 = min{푖 : 푀푎푥푃(푐) ⊂ Γ푃^푖 for some 푐 ∈ 풞}.

Suppose there exists 푧 ∈ 퐵푃(0, ℎ) ∩ 퐵푃(푐, ℎ) for some 푐 ∈ 풞. Since ⟨푠푢푝푝(풞)⟩푃 is hierarchical, 푀푎푥푃(푐) ⊂ Γ푃^{푗0} for some level 푗0 and, by the minimality of 푟, it follows that 푗0 ≥ 푟. The weight of 푐 is of the form 푤푃(푐) = 푠_{푗0} + 푡, where 푡 = |푀푎푥푃(푐)|. It is straightforward that 푀푎푥푃(푧 − 푐) ⊂ Γ푃^{푗0}. Thus, if

|푀푎푥푃(푧 − 푐)| > ⌊(푡 − 1)/2⌋,

then

푑(푧, 푐) = 푠_{푗0} + |푀푎푥푃(푧 − 푐)| > 푠_{푗0} + ⌊(푡 − 1)/2⌋ ≥ ℎ,

i.e., 푧 ∉ 퐵푃(푐, ℎ). On the other hand, if

|푀푎푥푃(푧 − 푐)| ≤ ⌊(푡 − 1)/2⌋,

then

2|푀푎푥푃(푧 − 푐)| < 푡 = |푀푎푥푃(푐)|.

Therefore, |푀푎푥푃(푧)| ≥ 푡 − |푀푎푥푃(푧 − 푐)| > ⌊(푡 − 1)/2⌋, hence

푑(푧, 0) = 푠_{푗0} + |푀푎푥푃(푧)| > 푠_{푗0} + ⌊(푡 − 1)/2⌋ ≥ ℎ,

i.e., 푧 ∉ 퐵푃(0, ℎ). To conclude, we just need to prove that there exists 푐 ∈ 풞 such that 퐵푃(0, ℎ + 1) ∩ 퐵푃(푐, ℎ + 1) ≠ ∅. Suppose 푤푃(푐) = 훿 and note that |푀푎푥푃(푐)| = 훿 − 푠푟. Take a set 퐴 ⊂ 푀푎푥푃(푐) such that |퐴| = ⌊(훿 − 푠푟 − 1)/2⌋ + 1 and define 푧 ∈ F푞^푛 by the rule 푧푖 = 푐푖 for every 푖 ∈ 퐴 and 푧푖 = 0 otherwise; then

푑푃(푧, 0) = |퐴| + 푠푟 = ℎ + 1.

Since |푀푎푥푃(푐) ∖ 퐴| ≤ |퐴|, we also have that

푑푃(푧, 푐) = 푠푟 + |푀푎푥푃(푐) ∖ 퐴| ≤ 푠푟 + |퐴| = ℎ + 1.

Thus, 푧 ∈ 퐵푃(0, ℎ + 1) ∩ 퐵푃(푐, ℎ + 1); therefore, ℛ푃(풞) = ℎ.

We stress that 푃 is hierarchical if, and only if, every ideal of 푃 is also hierarchical, in which case Proposition 2.3.12 holds for every code.

Theorem 2.3.13. The poset 푃 is a hierarchical poset if and only if any two 푃 -codes with same minimum distance also have the same packing radius.

Proof. Proposition 2.3.12 ensures that if 푃 is hierarchical, any two codes having the same minimum distance also have the same packing radius. Conversely, suppose 푃 is not hierarchical; then there exist 푖0 ∈ Γ푃^푟 and 푗0 ∈ Γ푃^{푟+1} such that 푖0 ≰ 푗0. Let 푟 be minimal with this property. Consider the one-dimensional code 풞1 generated by the vector 푒_{푗0} and the one-dimensional code 풞2 generated by the vector 푣, where 푣푠 = 1 if 푠 ∈ 퐴, 푣_{푖0} = 1 and 푣푠 = 0 for all 푠 ∈ [푛] ∖ (퐴 ∪ {푖0}), with 퐴 = Γ푃^푟 ∩ ⟨푗0⟩. By construction, 풞1 and 풞2 have the same minimum distance 훿 = |⟨푗0⟩푃|. By Proposition 2.3.8, ℛ푃(풞1) = 훿 − 1 and, by Proposition 2.3.12,

ℛ푃(풞2) = 푛1 + ··· + 푛_{푟−1} + ⌊(훿 − (푛1 + ··· + 푛_{푟−1}) − 1)/2⌋,

but

ℛ푃(풞1) = 훿 − 1 = 푛1 + ··· + 푛_{푟−1} + 훿 − (푛1 + ··· + 푛_{푟−1}) − 1                  (2.4)
        > 푛1 + ··· + 푛_{푟−1} + ⌊(훿 − (푛1 + ··· + 푛_{푟−1}) − 1)/2⌋ = ℛ푃(풞2).      (2.5)

Example 2.3.14. Consider the non-hierarchical poset 푃 over [3] with the order relation given by 1 6 3. Its Hasse diagram is as follows:

3

1 2

Let 풞1 and 풞2 be two one-dimensional linear codes over F푞^3 generated by 푒3 and 푒1 + 푒2, respectively. It is straightforward to conclude that both codes have minimum distance 훿 = 2 and that their packing radii are different, namely, ℛ푃(풞1) = 1 and ℛ푃(풞2) = 0.
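This example can be verified by combining the sketches given earlier (the hypothetical helpers `p_dist` and `packing_radius`):

# The poset P over [3] with 1 <= 3, and the binary codes C1 = <e3>, C2 = <e1 + e2>.
rel = [(1, 3)]
dist = lambda x, y: p_dist(rel, x, y)
C1 = [(0, 0, 0), (0, 0, 1)]
C2 = [(0, 0, 0), (1, 1, 0)]
print(packing_radius(C1, dist, n=3))  # 1
print(packing_radius(C2, dist, n=3))  # 0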

We conclude that among the poset metrics, hierarchical metrics are the only ones where the error-correcting capability of a code is determined by its error-detection capability. We remark that, as proved in [12], determining the error-correction capability of a single pair of codewords is an NP-hard problem, so bounds for these invariants are very welcome.

Chapter 3

Canonical Form

It is well known that the representation of a code by generator matrices is not unique: it depends on the choice of a basis of the code. The choice of a particular representation is motivated by the fact that it can be useful for determining some code parameters.

For example, the standard form for a 푑퐻-code provides an easy way to obtain the parity check matrix of the code and allows the characterization of MDS codes and of the Singleton defect in general, see [45]. The form is called standard because every code has, up to equivalence, a unique such representation. Analogous representations of the standard form were presented in [1] and [15] for codes over vector spaces endowed with NRT and hierarchical metrics, respectively. A standard form is obtained by the choice of an appropriate basis of the code and, eventually, by reordering the columns. The choice of a basis corresponds to operating with the rows of a generator matrix. When considering poset metrics, it may be permitted to perform some operations (other than permutations) with the columns. Those permitted operations are determined by the group of linear isometries. In the referred cases of NRT and hierarchical posets, by operating with the columns we can get a generator matrix (or, equivalently, a basis) that in many senses may be called canonical. The canonical matrix determines a canonical decomposition of a code into a direct sum of sub-codes with dimension and support uniquely prescribed by the code weight hierarchy. For a general poset, there is no such canonical decomposition, but there is a decomposition that is maximal in the number of components. In the sequel we will first present the canonical form originally presented in [15] and then proceed to define and determine what we call a maximal 푃-decomposition. This decomposition is what allows us to produce bounds for invariants that cannot be related or produced by explicit expressions.

3.1 Canonical Form for Hierarchical Metrics

In [15], a canonical-systematic form was proposed for codes over spaces endowed with a hierarchical poset metric. In this section, we present an alternative way to obtain the canonical-systematic form, emphasizing the role of the basis of the code instead of the generator matrix. When using the generator matrix, the focus is on the form of the matrix; here, a particular form may be obtained in some cases by the choice of an appropriate basis, which is determined by a decomposition. Consequently, a particular form for the matrix is not the main goal; the goal is to obtain a generator matrix as simple as possible. Let 푃 be a hierarchical poset over [푛]. Given 푤 ∈ F푞^푛, the 푃-clean of 푤 is the vector 푤̃ with the same values as 푤 in the coordinates that are maximal in the support of 푤 and zero in the remaining coordinates, namely: 푤̃푖 = 푤푖 if 푖 ∈ 푀푎푥(푤) and 푤̃푖 = 0 otherwise. Note that the 푃-clean of 푤 has the same 푃-weight as 푤. The 푗-th projection of 푤 is the vector 푤^{(푗)} ∈ F푞^푛 obtained by projecting 푤 onto the coordinates corresponding to the 푗-th level of 푃, i.e., 푤^{(푗)}_푖 = 푤푖 if 푖 ∈ Γ푃^푗 and 푤^{(푗)}_푖 = 0 otherwise. In the following lemma and theorem, if 푤 ∈ F푞^푛, ⟨푤⟩ denotes the one-dimensional space generated by 푤 (not to be confused with the ideal generated by the support of 푤).
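The 푃-clean is easy to compute from the order relations. The following sketch (an illustration, reusing the hypothetical `ordinal_sum` and `p_weight` helpers above) also checks that cleaning preserves the 푃-weight:

def maximal(relations, S):
    # Maximal elements of S in P: no other element of S lies above.
    def above(i):
        up, stack = set(), [i]
        while stack:
            a = stack.pop()
            for x, y in relations:
                if x == a and y not in up:
                    up.add(y)
                    stack.append(y)
        return up
    return {i for i in S if not (above(i) & S)}

def p_clean(relations, w):
    # Keep w_i only at the coordinates maximal in supp(w) (indexed from 1).
    supp = {i + 1 for i, wi in enumerate(w) if wi != 0}
    mx = maximal(relations, supp)
    return [wi if i + 1 in mx else 0 for i, wi in enumerate(w)]

rel = ordinal_sum([[1, 2, 3], [4, 5], [6]])   # hierarchical poset (6 : 3, 2, 1)
w = [1, 0, 0, 1, 1, 0]
print(p_clean(rel, w))                        # [0, 0, 0, 1, 1, 0]
print(p_weight(rel, w) == p_weight(rel, p_clean(rel, w)))  # True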

Lemma 3.1.1. Let 푃 be a hierarchical poset with 푠 levels over [푛] and let 풞 be an 푙-dimensional code satisfying 푠푢푝푝(풞) ⊂ Γ푃^{푖0} for some 푖0 ∈ {1, . . . , 푠}. Given 푤 ∈ F푞^푛 such that 푀푎푥(푤) ⊂ Γ푃^{푖0} and 푤̃ ∉ 풞, there exists a linear code 풞′ such that 풞′ ∼푃 풞 ⊕ ⟨푤⟩ and 푠푢푝푝(풞′) ⊂ Γ푃^{푖0}.

Proof. Let {푣^1, . . . , 푣^푙} be a basis of 풞. Given 푤 as in the statement of the lemma, since 푤̃ ∉ 풞, the set 훽 = {푤̃, 푣^1, . . . , 푣^푙} is linearly independent. So, 훽 is a basis for the linear code 풞 ⊕ ⟨푤̃⟩. Let

훽1 = {푤̃, 푣^1, . . . , 푣^푙, 푒_{푗1}, . . . , 푒_{푗_{푛−푙−1}}}

be a basis of F푞^푛 obtained by extending 훽 using vectors of the canonical basis. Because 푤 = 푤̃ + Σ_{푖=1}^{푛−푙−1} 훼푖 푒_{푗푖} for suitable scalars 훼푖, the set

훽2 = {푤, 푣^1, . . . , 푣^푙, 푒_{푗1}, . . . , 푒_{푗_{푛−푙−1}}}

generates F푞^푛. Furthermore, note that 훽2 contains a basis of 풞 ⊕ ⟨푤⟩. Define the linear map 푇 by setting 푇(푤) = 푤̃, 푇(푣^푖) = 푣^푖 for all 푖 ∈ {1, . . . , 푙} and 푇(푒_{푗푟}) = 푒_{푗푟} for all 푟 ∈ {1, . . . , 푛 − 푙 − 1}. Denoting by 풞′ the code generated by 훽, we have that 푠푢푝푝(풞′) ⊂ Γ푃^{푖0} and 푇(풞 ⊕ ⟨푤⟩) = 풞′. To conclude, we just need to prove that 푇 is an isometry. Writing a canonical vector 푒푗 ∈ F푞^푛, with 푗 ∉ {푗1, . . . , 푗_{푛−푙−1}}, according to 훽2, we get that

푒푗 = 훼푤 + Σ_{푖=1}^{푙} 훾푖 푣^푖 + Σ_{푖=1}^{푛−푙−1} 휃푖 푒_{푗푖},

where 훼, 훾푖, 휃푖 ∈ F푞. Since 푤 = 푤̃ + Σ_{푖=1}^{푛−푙−1} 훼푖 푒_{푗푖} and 훼푖 = 0 if 푗푖 ∈ Γ푃^푟 with 푟 ≥ 푖0 (푤̃ is the 푃-clean of 푤), then

푇(푒푗) = 훼푤̃ + Σ_{푖=1}^{푙} 훾푖 푣^푖 + Σ_{푖=1}^{푛−푙−1} 휃푖 푒_{푗푖} = 훼(푤 − Σ_{푖=1}^{푛−푙−1} 훼푖 푒_{푗푖}) + Σ_{푖=1}^{푙} 훾푖 푣^푖 + Σ_{푖=1}^{푛−푙−1} 휃푖 푒_{푗푖} = 푒푗 − Σ_{푖=1}^{푛−푙−1} 훼훼푖 푒_{푗푖}.

Due to the fact that 푃 is hierarchical and 훼푖 = 0 for every 푖 such that 푗푖 ∈ Γ푃^푟 with 푟 ≥ 푖0, the characterization given in Lemma 2.3.6 ensures that 푇 is a linear isometry.

Observation 3.1.2. The linear isometry 푇 constructed in the previous lemma coincides with the identity map when restricted to elements whose support does not intersect the level 푖0, i.e., 푇(푥) = 푥 if 푠푢푝푝(푥) ⊂ [푛] ∖ Γ푃^{푖0}.

Theorem 3.1.3. Let 푃 be an (푛 : 푛1, . . . , 푛푠) hierarchical poset. If 풞 is a linear 푃-code, there exists 풞̃ such that 풞̃ ∼푃 풞,

풞̃ = 풞1 ⊕ 풞2 ⊕ ··· ⊕ 풞푠,

and 푠푢푝푝(풞푖) ⊂ Γ푃^푖 for every 푖 ∈ [푠].

Proof. If 풞 is a one-dimensional code, then 풞 = ⟨푣⟩ for a suitable 푣 ∈ F푞^푛. Considering 풞′ = {0} to be the null vector space, since 푣 ∉ 풞′, Lemma 3.1.1 ensures that 풞 is equivalent to the code generated by the 푃-clean of 푣. Suppose 풞 is a 푘-dimensional code with 푘 > 1. Then 풞 = ⟨푤⟩ ⊕ 풞′ for some 푤 ∈ 풞 and a (푘 − 1)-dimensional code 풞′ ⊂ 풞. By induction, there is a linear isometry 푇′ such that 푇′(풞′) = ⊕_{푖=1}^{푠} 퐵푖 and 푠푢푝푝(퐵푖) ⊂ Γ푃^푖. Since 푇′ is injective, we get that 푇′(푤) ∉ 푇′(풞′). Consider the code 푇′(풞) = ⟨푇′(푤)⟩ ⊕ 푇′(풞′). Therefore, there exists a level 푖 of 푃 such that the projection of 푇′(푤) onto this level does not belong to 퐵푖, i.e., 푇′(푤)^{(푖)} ∉ 퐵푖. Let 푖0 be the maximal level with this property. Define the vector 푣 ∈ F푞^푛 as follows: 푣^{(푖)} = 푇′(푤)^{(푖)} if 푖 ≤ 푖0 and 푣^{(푖)} = 0 if 푖 > 푖0. Since 푣 = 푇′(푤) − 푐 for some 푐 ∈ 푇′(풞′), we get that 푇′(풞) = ⟨푣⟩ ⊕ 푇′(풞′). By Lemma 3.1.1, there is a linear isometry 푇 such that 푇(퐵푖) = 퐵푖 if 푖 ≠ 푖0 (ensured by Observation 3.1.2) and 푠푢푝푝(푇(퐵_{푖0} ⊕ ⟨푣⟩)) = 푠푢푝푝(푇(퐵_{푖0}) ⊕ ⟨푣̃⟩) ⊂ Γ푃^{푖0}, where 푇(푣) = 푣̃. Therefore, denoting 풞푖 = 퐵푖 for every 푖 ≠ 푖0 and 풞_{푖0} = 푇(퐵_{푖0}) ⊕ ⟨푣̃⟩, we have that 풞̃ = 푇 ∘ 푇′(풞) = ⊕_{푖=1}^{푠} 풞푖 and 푠푢푝푝(풞푖) ⊂ Γ푃^푖 for every 푖 ∈ {1, . . . , 푠}.

Given a linear code 풞, Theorem 3.1.3 states that there exists a code 풞̃, equivalent to 풞, such that 풞̃ = 풞1 ⊕ ... ⊕ 풞푠 and 푠푢푝푝(풞푖) ⊂ Γ푃^푖 for every 푖 ∈ {1, . . . , 푠}. Assuming 푃 to be naturally labeled, the direct sum can be seen as a product of codes, namely,

풞̃ = 풞1 ⊕ ... ⊕ 풞푠 = (풞̂1, . . . , 풞̂푠),

where the codes 풞̂푖 are obtained from 풞푖 by deleting the coordinates that do not belong to the level 푖 of the poset (this is known as the punctured code construction, see [27]). Therefore, while 풞푖 is a sub-code of F푞^푛, 풞̂푖 is a sub-code of F푞^{푛푖}, where 푛푖 = |Γ푃^푖|.

Corollary 3.1.4. If 풞 is a 푃-code, where 푃 is a naturally labeled (푛 : 푛1, . . . , 푛푠) hierarchical poset, there is a code 풞′, equivalent to 풞, with generator matrix 퐺 of the form

      [  0    0   ···  퐺푠 ]
      [  ⋮        ⋰    ⋮  ]
퐺 =   [  0    퐺2  ···  0  ] ,
      [  퐺1   0   ···  0  ]

where 퐺푖 is a generator matrix of 풞̂푖.

The representation given by Corollary 3.1.4 can be standardized even more.

Indeed, by operating on the rows of 퐺 we can assume that each 퐺푖 is in reduced row echelon form; note that these operations only perform a change of basis in the code. If 푇 ∈ 풜푢푡(푃), according to Example 2.3.7, 푇 is a composition of permutations acting level by level on the poset. In terms of matrices, assuming that the generator matrix 퐺 of a code 풞 is in the form given by Corollary 3.1.4 and that each 퐺푖 is in reduced row echelon form, the description of 풜푢푡(푃) ensures that any two columns of 퐺 can be permuted if, and only if, the coordinates corresponding to these columns belong to the same level of the poset. Therefore, if 푃푖 are 푛푖 × 푛푖 permutation matrices, the code generated by

       [  0      0     ···  퐺푠푃푠 ]
       [  ⋮            ⋰    ⋮    ]
퐺′ =   [  0      퐺2푃2  ···  0    ]
       [  퐺1푃1   0     ···  0    ]

is equivalent to 풞. With the previous arguments, the following representation holds.

Corollary 3.1.5. If 풞 is a 푃-code, where 푃 is a naturally labeled (푛 : 푛1, . . . , 푛푠) hierarchical poset, there is a code 풞′, equivalent to 풞, with generator matrix 퐺 given by

      [  0              0            ···  [퐼_{푘푠} | 퐴푠] ]
      [  ⋮                           ⋰    ⋮             ]
퐺 =   [  0              [퐼_{푘2} | 퐴2] ···  0             ] ,
      [  [퐼_{푘1} | 퐴1]   0            ···  0             ]

where 푘푖 is the dimension of 풞̂푖 and 퐴푖 are 푘푖 × (푛푖 − 푘푖) matrices. This representation is called the canonical-systematic form.

As may be seen in [15], the constants 푘1, . . . , 푘푠 are determined by the weight enumerator of the code and, in this sense, the decomposition is canonical.

3.2 Partitions and Decompositions

A partition of a subset 퐽 ⊆ [푛] is a family of subsets {퐽1, . . . , 퐽푟} such that

퐽 = ⋃_{푖=1}^{푟} 퐽푖,

where 퐽푖 ∩ 퐽푗 = ∅ if 푖 ≠ 푗 and each 퐽푖 ≠ ∅. We will denote such a partition by 풥 = (퐽푖)_{푖=1}^{푟}. If we write 퐽0 = [푛] ∖ 퐽 = {푖 ∈ [푛] : 푖 ∉ 퐽}, the triple 풥* = (퐽; 퐽0; 퐽푖)_{푖=1}^{푟} is called a pointed partition. Since 퐽 is the union of the subsets 퐽푖, it may be omitted in the notation; therefore the pointed partition 풥* will also be denoted by (퐽0; 퐽푖)_{푖=1}^{푟}. Note that 퐽0 = ∅ if, and only if, 퐽 = [푛]. We stress that 퐽0 has a special role, since it is the only part we allow to be empty. From now on, we consider only pointed partitions, so we will omit the symbol * and the adjective “pointed”. A partition 풥 can be refined in two ways: either by increasing the number of parts or by enlarging the distinguished part 퐽0. Except for the pointer 퐽0, the order of the other parts is irrelevant; for example,

(퐽0; {1, 2} , {3, 4, 5}) = (퐽0; {5, 4, 3} , {1, 2}) .

Definition 3.2.1. An 푙-split of a partition 풥 = (퐽0; 퐽푖)_{푖=1}^{푟} is a partition 풥′ = (퐽0; 퐽푖′)_{푖=1}^{푟+1}, where 퐽푖′ = 퐽푖 for each 푖 ∉ {푙, 푟 + 1} and 퐽푙 = 퐽푙′ ∪ 퐽′_{푟+1}, with both 퐽푙′ and 퐽′_{푟+1} non-empty. This means that 퐽푙 is split into two components and the others are unchanged. An 푙-aggregate of a partition 풥 = (퐽0; 퐽푖)_{푖=1}^{푟} is a partition 풥′ = (퐽0′; 퐽푖′)_{푖=1}^{푟}, where 퐽푖′ = 퐽푖 if 푖 ∉ {푙, 0}, 퐽푙 = 퐽푙′ ∪ 퐽푙* and 퐽0′ = 퐽0 ∪ 퐽푙* for some ∅ ≠ 퐽푙* ⊊ 퐽푙, i.e., some elements of 퐽푙 were aggregated into the distinguished part 퐽0.

Definition 3.2.2. We say that a partition 풥′ is a 1-step refinement of 풥 if 풥′ is obtained from 풥 by performing a single 푙-split or a single 푙-aggregate operation, for some 푙 < |풥|. The partition 풥′ is a refinement of 풥 if 풥′ can be obtained from 풥 by a successive number of 1-step refinements. We use the notation 풥 ≥ 풥′ to denote a refinement and 풥 ≥푙 풥′ to denote that 풥′ is a 1-step refinement of 풥 performed by an 푙-split or 푙-aggregate. When the kind of operation (splitting or aggregation) is relevant, we will use the notation 풥 ≥푙^푠 풥′ and 풥 ≥푙^푎 풥′ for an 푙-split or an 푙-aggregate, respectively.

Example 3.2.3. The partition ([4]; ∅; {1, 2, 3, 4}) can be refined in order to get the par- tition ([4]; {1, 3}; {2}, {4}) by using the following 1-step refinements:

(∅; {1, 2, 3, 4}) ≥1^푎 ({3}; {1, 2, 4}) ≥1^푎 ({1, 3}; {2, 4}) ≥1^푠 ({1, 3}; {2}, {4}).
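The split and aggregate operations are straightforward to model. The following sketch (an illustration, not part of the original text) represents a pointed partition as a pair (퐽0, [퐽1, . . . , 퐽푟]) and replays the refinements of Example 3.2.3:

def l_split(partition, l, piece):
    # l-split: split part l into J_l \ piece and piece (both non-empty).
    J0, parts = partition
    rest = parts[l - 1] - piece
    assert piece and rest and piece <= parts[l - 1]
    return (J0, parts[:l - 1] + [rest] + parts[l:] + [piece])

def l_aggregate(partition, l, piece):
    # l-aggregate: move a proper non-empty subset of part l into J0.
    J0, parts = partition
    assert piece and piece < parts[l - 1]
    return (J0 | piece, parts[:l - 1] + [parts[l - 1] - piece] + parts[l:])

p = (set(), [{1, 2, 3, 4}])
p = l_aggregate(p, 1, {3})   # ({3}; {1, 2, 4})
p = l_aggregate(p, 1, {1})   # ({1, 3}; {2, 4})
p = l_split(p, 1, {4})       # ({1, 3}; {2}, {4})
print(p)                     # ({1, 3}, [{2}, {4}])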

By using the set partitions previously defined, decompositions of linear codes can be constructed. Each set partition over [푛] induces a decomposition of linear codes, as explained below. Such decompositions are the algebraic counterpart of the set partitions over [푛].

Definition 3.2.4. We say that C = (풞; 풞0; 풞푖)_{푖=1}^{푟} is a decomposition of an [푛, 푘, 훿]푞 code 풞 if each 풞푖 is a subspace of F푞^푛 such that:

(a) 풞 = ⊕_{푖=1}^{푟} 풞푖 with dim(풞푖) > 0 for every 푖 ∈ {1, . . . , 푟};

(b) 풞0 = {(푥1, 푥2, . . . , 푥푛) : 푥푖 = 0 if 푖 ∈ 푠푢푝푝(풞)};

(c) (푠푢푝푝(풞0); 푠푢푝푝(풞푖))_{푖=1}^{푟} is a pointed partition over [푛].

Definition 3.2.5. An 푙-split, 푙-aggregate, 1-step refinement and a refinement C′ = (풞; 풞0′; 풞푖′)_{푖=1}^{푟′} of a decomposition C = (풞; 풞0; 풞푖)_{푖=1}^{푟} are defined according to

(푠푢푝푝(풞0′); 푠푢푝푝(풞푖′))_{푖=1}^{푟′}

being an 푙-split, 푙-aggregate, 1-step refinement or a refinement of (푠푢푝푝(풞0); 푠푢푝푝(풞푖))_{푖=1}^{푟}.

Example 3.2.6. Let 푃 be a poset over [4]. Consider the [4, 2, 훿푃 ]2 code 풞 given by

풞 = {0000, 1100, 0010, 1110}.

Then,

(풞; ⟨푒4⟩; 풞) ≥1^푠 (풞; ⟨푒4⟩; 풞푖)_{푖=1}^{2},

where

풞1 = {0000, 1100} and 풞2 = {0000, 0010}.

Definition 3.2.7. A decomposition C = (풞; 풞0; 풞푖)_{푖=1}^{푟} is said to be maximal if it does not admit a refinement.

Let 훽 = {푒1, . . . , 푒푛} be the canonical basis of F푞^푛. Given 퐼 ⊂ [푛], the 퐼-coordinate subspace 푉퐼 is defined by

푉퐼 = ⟨{푒푖 : 푖 ∈ 퐼}⟩ = {Σ_{푖∈퐼} 푥푖푒푖 : 푥푖 ∈ F푞}.

Given a decomposition C = (풞; 풞0; 풞푖)_{푖=1}^{푟} of a linear code 풞, we say that

푉푖 := 푉_{푠푢푝푝(풞푖)} = ⟨{푒푗 : 푗 ∈ 푠푢푝푝(풞푖)}⟩

is the support-space of (the component) 풞푖. We consider [푛]_풞 = 푠푢푝푝(풞) and [푛]^풞 = [푛] ∖ [푛]_풞. In the case where [푛]^풞 ≠ ∅, we write 푉0 = 푉_{[푛]^풞} and denote 풞0 = 푉0. We say that the decomposition (풞; 풞0; 풞푖)_{푖=1}^{푟} is supported by the environment decomposition (푉0; 푉푖)_{푖=1}^{푟}. In the case where [푛]^풞 = ∅, we have 푉0 = 풞0 = {0}.

Until now, the metric 푑푃 has played no role in the decomposition of a code. We now introduce a decomposition that depends on the metric and, consequently, on the poset 푃.

Definition 3.2.8. A 푃-decomposition of 풞 is a decomposition C′ = (풞′; 풞0′; 풞푖′)_{푖=1}^{푟} (as in Definition 3.2.4) of a code 풞′ with 풞′ ∼푃 풞. Each 풞푖′ is called a component of the decomposition. A trivial 푃-decomposition of 풞 is either the decomposition (풞; 풞0; 풞) or any 푃-decomposition with a unique factor (풞′; 풞0′; 풞′), where |푠푢푝푝(풞′)| = |푠푢푝푝(풞)| and |푠푢푝푝(풞0′)| = |푠푢푝푝(풞0)|.

Definition 3.2.9. A code 풞 is said to be 푃 -irreducible if it does not admit a non-trivial 푃 -decomposition.

Example 3.2.10. Consider the [4, 2, 훿푃 ]푞 code 풞 given by

풞 = {0000, 1110, 0111, 1001}.

Let 푃1 be the (4 : 2, 2) hierarchical poset, 푃2 be the anti-chain poset and 푃3 be the poset determined by 2 ≤푃3 3 and 2 ≤푃3 4. Their Hasse diagrams are as follows:

3 4                   3 4
1 2     1 2 3 4       1 2
 P1        P2          P3

It is straightforward that 풞 does not admit a non-trivial 푃2-decomposition, hence 풞 is 푃2-irreducible. If we consider the 푑_{푃3}-isometry given by

(푥1, 푥2, 푥3, 푥4) ↦→ (푥1, 푥2 + 푥3, 푥3, 푥4),

then

C′ = ({0000, 1010, 0011, 1001}; ⟨푒2⟩; {0000, 1010, 0011, 1001})

is a 푃3-decomposition of 풞. Note that the previous isometry is also a 푑_{푃1}-isometry, hence C′ is also a 푃1-decomposition of 풞. Considering the 푑_{푃1}-isometry given by

(푥1, 푥2, 푥3, 푥4) ↦→ (푥1 + 푥3 + 푥4, 푥2, 푥3, 푥4),

applied to 풞′, we get that

C″ = ({0000, 0010, 0011, 0001}; ⟨푒1, 푒2⟩; {0000, 0010} ⊕ {0000, 0001})

is also a 푃1-decomposition of 풞.

Given a 푃-decomposition C′ = (풞′; 풞0′; 풞푖′)_{푖=1}^{푟} of 풞, we have that [푛] = ∪_{푖=0}^{푟} 푠푢푝푝(푉푖). Also, if 풞푖′ = {0}, then 푖 = 0. If each 풞푖′ is 푃-irreducible, denoting 푛푖 = dim(푉푖) and 푘푖 = dim(풞푖′), then

Σ_{푖=0}^{푟} 푛푖 = 푛 = dim(F푞^푛) and Σ_{푖=1}^{푟} 푘푖 = 푘 = dim(풞).

Consider two 푃-decompositions C′ = (풞′; 풞0′; 풞푖′)_{푖=1}^{푟′} and C″ = (풞″; 풞0″; 풞푖″)_{푖=1}^{푟″} of a code 풞. Associated to these 푃-decompositions there are two partitions of [푛], namely (푠푢푝푝(풞0′); 푠푢푝푝(풞푖′))_{푖=1}^{푟′} and (푠푢푝푝(풞0″); 푠푢푝푝(풞푖″))_{푖=1}^{푟″}. By the definition of a 푃-decomposition, there are isometries 푇′, 푇″ ∈ 퐺퐿푃(F푞^푛) such that 푇′(풞) = 풞′ and 푇″(풞) = 풞″. Denote 푇 = 푇″ ∘ (푇′)^{−1}. Then 푇 is a linear isometry and 푇(풞′) = 풞″. By Lemma 2.3.5, 푇 induces an order automorphism 휑푇 : [푛] → [푛]. The automorphism 휑푇 induces a map on the partition of [푛] determined by the 푃-decomposition C′, namely,

휑푇[(푠푢푝푝(풞0′); 푠푢푝푝(풞푖′))_{푖=1}^{푟′}] = (휑푇(푠푢푝푝(풞0′)); 휑푇(푠푢푝푝(풞푖′)))_{푖=1}^{푟′}.

Using the previous notations, we can define for 푃-decompositions the analogues of the operations defined over decompositions.

Definition 3.2.11. Let 풞 ⊂ F푞^푛 be a linear code and 푃 be a poset over [푛]. Let C′ = (풞′; 풞0′; 풞푖′)_{푖=1}^{푟′} and C″ = (풞″; 풞0″; 풞푖″)_{푖=1}^{푟″} be two 푃-decompositions of 풞. We say that C′ is a 푃-refinement (1-step 푃-refinement) of C″ if 휑푇[(푠푢푝푝(풞0′); 푠푢푝푝(풞푖′))_{푖=1}^{푟′}] is a refinement (1-step refinement) of the partition (푠푢푝푝(풞0″); 푠푢푝푝(풞푖″))_{푖=1}^{푟″}.

Similar to what was done with set partitions in Definition 3.2.2, the symbols ≥푙^{푃푠} and ≥푙^{푃푎} specify when the refinement was obtained by an 푙-split or an 푙-aggregation, respectively.

Example 3.2.12. Let 풞 = {0000, 1100, 0011, 1111} be a 2-dimensional code over F2^4 and 푃 be the chain poset determined by 1 6 2 6 3 6 4. Then, starting with the trivial 푃-decomposition, we have the following refinements:

(풞; ∅; 풞) ≥1^{푃푎} ({0000, 1100, 0001, 1101}; {0000, 0010}; {0000, 1100, 0001, 1101})
         ≥1^{푃푎} ({0000, 0100, 0001, 0101}; {0000, 0010, 1000, 1010}; {0000, 0100, 0001, 0101})
         ≥1^{푃푠} ({0000, 0100, 0001, 0101}; {0000, 0010, 1000, 1010}; {0000, 0100}, {0000, 0001}),

where the first two refinements were obtained by considering the linear 푃-isometries

(푥1, 푥2, 푥3, 푥4) ↦→ (푥1, 푥2, 푥3 − 푥4, 푥4)

and

(푥1, 푥2, 푥3, 푥4) ↦→ (푥1 − 푥2, 푥2, 푥3, 푥4),

respectively. The third refinement is just a splitting

{0000, 0100, 0001, 0101} = {0000, 0100} ⊕ {0000, 0001}.

Definition 3.2.13. A 푃-decomposition C = (풞; 풞0; 풞푖)_{푖=1}^{푟} is said to be maximal if 풞푖 is 푃-irreducible for every 푖 ∈ {1, . . . , 푟}.

Let us consider the Hamming metric 푑퐻 over F2^푛. It is well known that the group of linear isometries of this metric space is isomorphic to the permutation group 풮푛. Given a code 풞, let C = (풞; 풞0; 풞푖)_{푖=1}^{푟} be a maximal decomposition of 풞. If a code 풞′ is 푑퐻-equivalent to 풞, then there is 푇 ∈ 퐺퐿_{푑퐻}(F푞^푛) ≃ 풮푛 such that 푇(풞) = 풞′. Note that the decomposition C′ = (풞′; 푇(풞0); 푇(풞푖))_{푖=1}^{푟} is a maximal decomposition of 풞′; otherwise, C would not be a maximal decomposition of 풞. Therefore, when considering the Hamming metric, a maximal decomposition is also a maximal 퐻-decomposition (퐻 being the anti-chain poset over [푛]). This is not the general case, as we can see in the following example.

Example 3.2.14. Consider 풞 to be the 1-dimensional binary code of length 푛 generated by (1, 1, . . . , 1) ∈ F2^푛. Let 푃 be the poset defined by the chain order 1 6 2 6 ··· 6 푛, and

퐻 be the anti-chain order poset. It follows that 풞 is 퐻-irreducible but not 푃 -irreducible. Indeed, by Lemma 2.3.6, the map

푇 (푥1, . . . , 푥푛−1, 푥푛) = (푥1 + 푥푛, . . . , 푥푛−1 + 푥푛, 푥푛)

is a 푃-isometry, because 푇(푒푖) = 푒푖 for every 푖 ≠ 푛 and 푇(푒푛) = Σ_{푖=1}^{푛} 푒푖. Also, 푇(풞) = ⟨푒푛⟩ is the code generated by the vector 푒푛, hence

C′ = (⟨푒푛⟩; ⟨{푒1, . . . , 푒_{푛−1}}⟩; ⟨푒푛⟩)

is a maximal 푃-decomposition of 풞. Therefore, 풞 is not 푃-irreducible. Since 풞 is a one-dimensional code, we can only perform aggregations; but since 푠푢푝푝(11 . . . 1) = [푛], aggregations change the Hamming weight, so 풞 is 퐻-irreducible.

Definition 3.2.15. Let C = (풞′; 풞0; 풞푖)_{푖=1}^{푟} be a 푃-decomposition of an [푛, 푘, 훿]푞 code 풞. The profile of C is the array

profile(C) := [(푛0, 푘0), (푛1, 푘1), . . . , (푛푟, 푘푟)],

where 푛푖 = |푠푢푝푝(풞푖)| and 푘푖 = dim(풞푖).

Observe that |푠푢푝푝(풞푖)| = dim(푉푖); then 푛 = 푛0 + 푛1 + ··· + 푛푟 and 푘 = 푘1 + 푘2 + ··· + 푘푟. The following theorem states that the profile of a maximal 푃-decomposition C of a code 풞 depends (essentially) only on 풞, not on C.

Theorem 3.2.16. Let 풞 be an [푛, 푘, 훿]푞 code and let 푃 be a poset over [푛]. Let C′ and C″ be two maximal 푃-decompositions of 풞 with

profile(C′) = [(푛0′, 푘0′), (푛1′, 푘1′), . . . , (푛푟′, 푘푟′)] and
profile(C″) = [(푛0″, 푘0″), (푛1″, 푘1″), . . . , (푛푠″, 푘푠″)].

Then, 푟 = 푠 and, up to a permutation, profile(C′) = profile(C″), i.e., there is 휎 ∈ 풮푟 such that (푛푖′, 푘푖′) = (푛_{휎(푖)}″, 푘_{휎(푖)}″) and (푛0′, 푘0′) = (푛0″, 푘0″).

Proof. Let C′ = (풞′; 풞0′; 풞푖′)_{푖=1}^{푟} and C″ = (풞″; 풞0″; 풞푖″)_{푖=1}^{푠} be two maximal 푃-decompositions of 풞. Suppose, without loss of generality, 푟 < 푠. If 푇 ∈ 퐺퐿푃(F푞^푛) is an isometry satisfying 푇(풞′) = 풞″, there is a component 풞푖′ of C′ such that 푇(풞푖′) is not contained in any component 풞푗″ of C″; otherwise 푟 ≥ 푠. Hence, there are components 풞_{푖0}′ of C′ and 풞_{푗0}″, 풞_{푗1}″, . . . , 풞_{푗푡}″ of C″ such that

푇(풞_{푖0}′) ⊂ 풞_{푗0}″ ⊕ 풞_{푗1}″ ⊕ ··· ⊕ 풞_{푗푡}″

and 푇(풞_{푖0}′) ∩ 풞_{푗푙}″ ≠ {0} for every 푙 ∈ {1, . . . , 푡}. Therefore,

푇(풞_{푖0}′) = ⊕_{푚=0}^{푡} (푇(풞_{푖0}′) ∩ 풞_{푗푚}″)

is a non-trivial 푃-decomposition of 풞_{푖0}′, contradicting the fact that each component of a maximal 푃-decomposition is 푃-irreducible. It follows that 푟 = 푠. Moreover, for every 푖 ∈ {1, . . . , 푟} there is 푗푖 such that 푇(풞푖′) ⊆ 풞_{푗푖}″. Hence, 푛푖′ ≤ 푛_{푗푖}″ and 푘푖′ ≤ 푘_{푗푖}″. Applying the same reasoning to 푇^{−1} ∈ 퐺퐿푃(F푞^푛), we get that 푛_{푗푖}″ ≤ 푛푖′ and 푘_{푗푖}″ ≤ 푘푖′, hence 푛푖′ = 푛_{푗푖}″ and 푘푖′ = 푘_{푗푖}″, so that, up to a permutation, profile(C′) = profile(C″).

The next corollary follows directly from Theorem 3.2.16.

Corollary 3.2.17. Let C′ = (풞′; 풞0′; 풞푖′)_{푖=1}^{푟} and C″ = (풞″; 풞0″; 풞푖″)_{푖=1}^{푟} be two maximal 푃-decompositions of 풞 and let 푇 ∈ 퐺퐿푃(F푞^푛) be a linear isometry such that 푇(풞′) = 풞″. Then, there is a permutation 휎 ∈ 풮푟 such that 푇(풞푖′) = 풞_{휎(푖)}″.

To express the number of operations (splittings and aggregations) performed in a decomposition, we define the complexity of a decomposition (not to be confused with the computational complexity of problems).

Definition 3.2.18. Given a 푃-decomposition C = (풞′; 풞0; 풞푖)_{푖=1}^{푟} of 풞, its complexity is defined by

풪푃(C) = 1/푟 + Σ_{푖=1}^{푟} (푛푖 − 푘푖),

where 푛푖 = dim(푉푖) (the dimension of the support-space of 풞푖) and 푘푖 = dim(풞푖) for every 1 ≤ 푖 ≤ 푟. A 푃-decomposition with minimum complexity is called a primary 푃-decomposition.
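Since 풪푃(C) depends only on the profile of the decomposition, it can be computed as follows (a sketch, not part of the original text; the profiles used in the example are those of the 푃1-decompositions C′ and C″ of Example 3.2.10):

from fractions import Fraction

def complexity(profile):
    # O_P of a P-decomposition from its profile
    # [(n0, k0), (n1, k1), ..., (nr, kr)] (Definition 3.2.18):
    # 1/r plus the sum of n_i - k_i over the components i >= 1.
    components = profile[1:]          # (n0, k0) is the distinguished part
    r = len(components)
    return Fraction(1, r) + sum(n - k for n, k in components)

print(complexity([(1, 0), (3, 2)]))          # 1/1 + (3 - 2) = 2
print(complexity([(2, 0), (1, 1), (1, 1)]))  # 1/2 + 0 + 0 = 1/2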

Note that the complexity of a 푃-decomposition is completely determined by its profile. Each aggregation decreases the complexity, since some of the terms 푛푖 − 푘푖 in the summation decrease. A splitting also decreases the complexity, since it increases the number of components (the number 푟). By Theorem 3.2.16, maximal 푃-decompositions have the same complexity. Thus, the minimum complexity of a decomposition of a code 풞 will be denoted by 풪푃(풞) instead of 풪푃(C). It is straightforward that aggregations and splittings decrease the complexity of a 푃-decomposition; in fact, if C′ is a refinement of C, then 풪푃(C′) < 풪푃(C). Therefore, we have the following proposition.

Proposition 3.2.19. A 푃 -decomposition of a code 풞 is maximal if, and only if, it is a primary 푃 -decomposition.

The permutation part 풜푢푡(푃) of 퐺퐿푃(F푞^푛) does not alter the complexity of a decomposition; hence it is irrelevant regarding maximality of 푃-decompositions.

Lemma 3.2.20. Let C = (풞′; 풞0; 풞푖)_{푖=1}^{푟} be a maximal 푃-decomposition of 풞. Let 휑 ∈ 퐴푢푡(푃) and 푇휑 ∈ 풜푢푡(푃) be the isometry induced by 휑. Then,

C′ = (푇휑(풞′); 푇휑(풞0); 푇휑(풞푖))_{푖=1}^{푟}

is also a maximal 푃-decomposition of 풞.

Proof. Because 휑 is a permutation, for every 푖 ∈ {0, 1, . . . , 푟},

푗 ∈ 푠푢푝푝(풞푖) ⇐⇒ 휑(푗) ∈ 푠푢푝푝(푇휑(풞푖)).

Hence, C ′ is a 푃 -decomposition of 풞. Since its profile coincides with the profile of C , C ′ is also a maximal 푃 -decomposition of 풞.

We recall that the set 풫푛 of all posets over [푛] is itself a partially ordered set. Maximal 푃-decompositions “behave well” with respect to this order in the following way:

Theorem 3.2.21. Let 푃, 푄 ∈ 풫푛 with 푃 ≤ 푄. Given a code 풞, there is a maximal 푃 -decomposition of 풞 which is also a 푄-decomposition of 풞.

Proof. Assume 푃, 푄 ∈ 풫푛 and 푃 < 푄. Let C′ = (풞′; 풞0; 풞푖)_{푖=1}^{푟} be a maximal 푃-decomposition of 풞 and 푇 ∈ 퐺퐿푃(F푞^푛) such that 푇(풞) = 풞′. By the characterization of 퐺퐿푃(F푞^푛), 푇 = 퐴 ∘ 푇휑, where 퐴 ∈ 풢푃 and 푇휑 ∈ 풜푢푡(푃). From Lemma 3.2.20, we have that

C″ = (푇_{휑^{−1}}(풞′); 푇_{휑^{−1}}(풞0); 푇_{휑^{−1}}(풞푖))_{푖=1}^{푟}

is also a maximal 푃-decomposition of 풞. However, 푇_{휑^{−1}}(풞′) = 푇_{휑^{−1}} ∘ 푇(풞), hence

푇_{휑^{−1}}(풞′) = 푇_{휑^{−1}} ∘ 퐴 ∘ 푇휑(풞).

We stress that 풢푃 ⊂ 풢푄 whenever 푃 ≤ 푄. Because 풢푃 is a normal subgroup of 퐺퐿푃(F푞^푛), it follows that 푇_{휑^{−1}} ∘ 퐴 ∘ 푇휑 ∈ 풢푃, hence 푇_{휑^{−1}} ∘ 퐴 ∘ 푇휑 ∈ 풢푄. Therefore, C″ is a 푄-decomposition of 풞.

As a direct consequence of the previous theorem, we obtain a relation between primary decompositions and the natural order over 풫푛.

Corollary 3.2.22. Let 푃, 푄 ∈ 풫푛 with 푃 ≤ 푄. Then, 풪푄(풞) ≤ 풪푃 (풞) for every linear code 풞.

Concerning primary 푃-decompositions, there is always a code that is decomposed differently depending on the poset. Indeed:

Proposition 3.2.23. Let 푃, 푄 ∈ 풫푛 with 푃 < 푄. Then, there is a code 풞 such that

풪푄(풞) < 풪푃 (풞).

Proof. Let us suppose 푃 < 푃1 ≤ 푄 and that 푃1 covers 푃 . Then, there are 푖0, 푗0 ∈ [푛] such that

푖0 ≰푃 푗0 and 푖0 6_{푃1} 푗0.

Let 풞 be the one-dimensional code generated by 푒_{푖0} + 푒_{푗0} and let 풞′ be the one-dimensional code generated by 푒_{푗0}. The linear map 푇 defined by 푇(푒푗) = 푒푗 for every 푗 ∈ [푛] ∖ {푖0, 푗0}, 푇(푒_{푖0}) = 푒_{푖0} and 푇(푒_{푗0}) = 푒_{푗0} − 푒_{푖0}, is a 푃1-linear isometry by the characterization of 퐺퐿푃(F푞^푛) given in Lemma 2.3.6. Therefore, the 푃1-decomposition C′ = (풞′; 풞0′; ⟨푒_{푗0}⟩) of 풞, where 풞0′ = ⟨{푒푖 : 푖 ∈ [푛] ∖ {푗0}}⟩, is a primary 푃1-decomposition, and 풪_{푃1}(풞) = 1.

On the other hand, we claim that C = (풞; 풞0; 풞), where 풞0 = ⊕_{푖 ∉ {푖0, 푗0}} 푉푖, is a primary 푃-decomposition of 풞. Indeed, suppose, for a contradiction, that C is not a primary 푃-decomposition; the only way to refine it would be by constructing a 1-dimensional code 풞″ with |푠푢푝푝(풞″)| = 1 and 풞″ ∼푃 풞. Hence, 풞″ would be generated by a canonical vector 푒푑 for some 푑 ∈ [푛]. However, each 푇 ∈ 퐺퐿푃(F푞^푛) determines an order automorphism 휑푇 ∈ 퐴푢푡(푃) as in Lemma 2.3.5. Suppose 푇 ∈ 퐺퐿푃(F푞^푛) and 푇(풞) = 풞″; then 푇(푒_{푖0} + 푒_{푗0}) = 푒푑 for some 푑 ∈ [푛]. Considering 푆 = 푇^{−1}, it follows that 푀푎푥(⟨푠푢푝푝(푆(푒푑))⟩푃) = {푖0, 푗0}, which is a contradiction, since then 푆 would not determine an order automorphism as in Lemma 2.3.5. Thus, such an isometry does not exist. Therefore, C is a primary 푃-decomposition of 풞 and

1 = 풪_{푃1}(풞) < 풪푃(풞) = 2.  (3.1)

Since 푃1 ≤ 푄, Corollary 3.2.22 ensures that

풪푄(풞) ≤ 풪_{푃1}(풞).  (3.2)

Inequalities 3.1 and 3.2 imply 풪푄(풞) < 풪푃 (풞).

We remark that Corollary 3.2.22 together with Proposition 3.2.23 implies that primary decompositions characterize posets, in the sense that a given poset 푃 may be reconstructed from the profiles of codes relative to 푃. Moreover, looking at the proof of Proposition 3.2.23, one may notice that the reconstruction can be done by considering only the 푛(푛 − 1)/2 pairs of vectors (푒푖, 푒푗).

3.2.1 Hierarchical Bounds

Hierarchical poset metrics are well understood. In particular, if 푃 is a hierarchical poset, the profile of a primary 푃-decomposition of a code 풞 is uniquely (and easily) determined by the weight hierarchy of the code. Moreover, as we can see in [30], this property is exclusive to hierarchical posets. For this reason, when considering a general poset 푃, we aim to establish bounds for 풪푃(풞) by considering the easy-to-compute primary 푃-decompositions relative to hierarchical posets. In the next section, we will characterize the generator matrices that determine maximal 푃-decompositions, providing an algorithm to obtain these matrices. The complexity of determining these matrices, however, will not be discussed.

We say that a poset 푃 is hierarchical at level 푖 if the levels Γ푃^푖 and Γ푃^푗 relate hierarchically for every 푗 ∈ {1, . . . , 푖 − 1}, i.e., if 푎 ∈ Γ푃^푗 for some 푗 ∈ {1, . . . , 푖 − 1} and 푏 ∈ Γ푃^푖, then 푎 6 푏. Let

ℋ(푃) = {푖 ∈ [푟] : 푃 is hierarchical at 푖}.

It is clear that 푃 is hierarchical if, and only if, ℋ (푃 ) = [푟]. We stress that for every integers 푎 and 푏 with 푎 < 푏, the notation [푎, 푏] denotes the set {푎, 푎 + 1, . . . , 푏}. Given a poset 푃 with height 푟, then: (1) Consider

푠1 = min{푖 ∈ [푟]: 푃 is not hierarchical at level 푖 + 1} and

푠2 = min{푖 ∈ [푟]: 푖 > 푠1 and 푃 is hierarchical at level 푖 + 1}.

Denote

퐽푖 = [푛1 + ··· + 푛푖−1 + 1, 푛1 + ··· + 푛푖] for every 1 ≤ 푖 < 푠1 and

퐽푠1 = [푛1 + ··· + 푛푠1−1 + 1, 푛1 + ··· + 푛푠2 ].

(2) Consider

푠3 = min{푖 ∈ [푟]: 푖 > 푠2 and 푃 is not hierarchical at level 푖 + 1} and

푠4 = min{푖 ∈ [푟]: 푖 > 푠3 and 푃 is hierarchical at level 푖 + 1}.

Thus, denote

퐽푠1+푖 = [푛1 + ··· + 푛푠2+푖−1 + 1, 푛1 + ··· + 푛푠2+푖] for every 푖 ≥ 1 satisfying 푠2 + 푖 < 푠3 and

퐽_{푠1+(푠3−푠2)} = [푛1 + ··· + 푛_{푠3−1} + 1, 푛1 + ··· + 푛_{푠4}].

(3) Consider

푠5 = min{푖 ∈ [푟]: 푖 > 푠4 and 푃 is not hierarchical at level 푖 + 1} and

푠6 = min{푖 ∈ [푟]: 푖 > 푠5 and 푃 is hierarchical at level 푖 + 1}.

Therefore,

퐽푠1+(푠3−푠2)+푖 = [푛1 + ··· + 푛푠4+푖−1 + 1, 푛1 + ··· + 푛푠4+푖] for every 푖 ≥ 1 satisfying 푠4 + 푖 < 푠5 and

퐽푠1+(푠3−푠2)+(푠5−푠4) = [푛1 + ··· + 푛푠5−1 + 1, 푛1 + ··· + 푛푠6 ].

Proceeding in this way, we construct a partition of [푛] given by 퐽1 ∪ ... ∪ 퐽ℎ for some ℎ. An arbitrary iteration of the process can be described as follows. (General Case) Consider 푡 to be an odd integer, then

푠푡 = min{푖 ∈ [푟]: 푖 > 푠푡−1 and 푃 is not hierarchical at level 푖 + 1} and,

푠푡+1 = min{푖 ∈ [푟]: 푖 > 푠푡 and 푃 is hierarchical at level 푖 + 1}.

Furthermore,

퐽_{푠1+(푠3−푠2)+···+(푠_{2푡−1}−푠_{2푡−2})} = [푛1 + ··· + 푛_{푠_{2푡−1}−1} + 1, 푛1 + ··· + 푛_{푠_{2푡}}]

and

퐽_{푠1+(푠3−푠2)+···+(푠_{2푡−1}−푠_{2푡−2})+푖} = [푛1 + ··· + 푛_{푠_{2푡}+푖−1} + 1, 푛1 + ··· + 푛_{푠_{2푡}+푖}].

Therefore, out of 푃 we can define two hierarchical posets:

Upper neighbor: Let 푃+ be the poset over [푛] with the same level decomposition as 푃 and, for every 푎 ∈ Γ푃^푖 and 푏 ∈ Γ푃^푗 with 푎 ≠ 푏, define 푎 6_{푃+} 푏 if, and only if, 푖 < 푗.

Lower neighbor: Let 푃− be the poset over [푛] with level hierarchy [푛] = 퐽1 ∪ 퐽2 ∪ ... ∪ 퐽ℎ and, for 푎 ∈ 퐽푖 and 푏 ∈ 퐽푗 with 푎 ≠ 푏, we have 푎 6_{푃−} 푏 if, and only if, 푖 < 푗.

In order to clarify the definitions of the upper and lower neighbors, we shall present an illustrative example.

Example 3.2.24. Let 푃 be a poset with 4 levels and suppose ℋ(푃) = {1, 2, 4}. Consider the simplified Hasse diagram that only records whether the poset is hierarchical at a particular level: if two consecutive levels are joined by a dotted line, then the poset is not hierarchical at the upper of the two levels; if they are joined by a solid line, then it is. The diagrams of 푃−, 푃 and 푃+ are as follows:

[Simplified diagrams: 푃− has two levels, Γ_{푃−}^1 = Γ푃^1 below Γ_{푃−}^2 = Γ푃^2 ∪ Γ푃^3 ∪ Γ푃^4; 푃 has its four levels Γ푃^1, . . . , Γ푃^4 with a dotted line above Γ푃^2; 푃+ has the same four levels joined only by solid lines.]

We stress that Γ_{푃−}^2 = Γ푃^2 ∪ Γ푃^3 ∪ Γ푃^4, since the smallest level at which 푃 fails to be hierarchical is the third one. Hence, in order to construct 푃−, all levels of 푃 from the second upwards are gathered into the second level of 푃−.

It is easy to see that:

(a) 푃 + and 푃 − are hierarchical posets and 푃 is hierarchical if, and only if, 푃 = 푃 + = 푃 −;

(b) Considering the natural order ≤ on 풫푛, we have that 푃− ≤ 푃 ≤ 푃+. Moreover,

푃− = max{푄 ∈ 풫푛 : 푄 ≤ 푃 and 푄 is hierarchical}

and

푃+ = min{푄 ∈ 풫푛 : 푃 ≤ 푄 and 푄 is hierarchical}.

The next proposition follows directly from Corollary 3.2.22.

Proposition 3.2.25. For any linear code 풞,

풪_{푃+}(풞) ≤ 풪푃(풞) ≤ 풪_{푃−}(풞).

Example 3.2.26. Consider the posets of Example 3.2.10 and denote 푃 = 푃3; then 푃+ = 푃1 and 푃− = 푃2. Furthermore, using the decompositions in that example (which are maximal), we conclude that

풪_{푃+}(풞) = 1/2, 풪푃(풞) = 2 and 풪_{푃−}(풞) = 3.

If 푃 is not hierarchical, both inequalities are strict for some code 풞. Moreover, the bounds are tight, in the sense that, given a poset 푃, there are codes 풞1 and 풞2 such that 풪_{푃+}(풞1) = 풪푃(풞1) and 풪_{푃−}(풞2) = 풪푃(풞2) (just consider any code 풞 with 푠푢푝푝(풞) ⊂ Γ푃^1).

3.2.2 Packing Radius Bounds

Maximal 푃-decompositions may be useful to determine bounds for the packing radius of a code. We remark that the packing radius of codes with respect to hierarchical metrics was completely characterized in Proposition 2.3.12 and depends essentially on the minimum distance of the code, see [30]. We also remark that for non-hierarchical metrics, determining the packing radius even of a single vector is NP-hard (see [12]) and a packing vector is not necessarily a codeword of minimum weight.

Proposition 3.2.27. Let C = (풞′; 풞0; 풞푖)_{푖=1}^{푟} be a 푃-decomposition of 풞. Then,

ℛ푃(풞) ≤ min_{푖∈{1,...,푟}} ℛ푃(풞푖).

′ ′ Proof. Note that ℛ푃 (풞) = ℛ푃 (풞 ). Furthermore, since each 풞푖 is a subcode of 풞 , it follows that ℛ푃 (풞) ≤ ℛ푃 (풞푖) for every 푖 ∈ {1, . . . , 푟}.

Proposition 3.2.28. If 푃 ≤ 푄, then ℛ푃 (풞) ≤ ℛ푄(풞) for every linear code 풞.

Proof. It follows directly from the fact that

$B_Q(r, c) \cap B_Q(r, 0) \subset B_P(r, c) \cap B_P(r, 0)$

for every $c \in \mathcal{C}$ and any integer $r \geq 0$.

Consider the projections $\pi_i : \mathbb{F}_q^n \longrightarrow \mathbb{F}_q^{n_i}$ (note that $n_i = |supp(\mathcal{C}_i)|$) defined by

$\pi_i(x_1, \ldots, x_n) = (x_{i_1}, \ldots, x_{i_s})$

where $i_1 < \cdots < i_s$ and $\{i_1, \ldots, i_s\} = supp(\mathcal{C}_i)$. Consider on $\mathbb{F}_q^{n_i}$ the metric $d_{\pi_i}$ induced by $\pi_i$, in the sense that $\pi_i : V_i \to \mathbb{F}_q^{n_i}$ is an isometry. Then, by the definition of $d_{\pi_i}$,

$\mathcal{R}_P(\mathcal{C}_i) = \mathcal{R}_{d_{\pi_i}}(\pi_i(\mathcal{C}_i)). \quad (3.3)$

Using Proposition 3.2.28 and the upper and lower neighbors $P^+$ and $P^-$ defined in the previous section, bounds for the packing radius of codes may be obtained from hierarchical posets.

Proposition 3.2.29. Given a maximal $P$-decomposition $\mathscr{C} = (\mathcal{C}'; \mathcal{C}_0; \mathcal{C}_i)_{i=1}^{r}$ for $\mathcal{C}$,

$\mathcal{R}_{P^-}(\mathcal{C}_i) \leq \mathcal{R}_P(\mathcal{C}_i) \leq \mathcal{R}_{P^+}(\mathcal{C}_i)$

for every $i \in \{1, \ldots, r\}$.

Propositions 3.2.27 and 3.2.29 yield the following upper bound for the packing radius:

$\mathcal{R}_P(\mathcal{C}) \leq \min_{i \in \{1,\ldots,r\}} \mathcal{R}_{P^+}(\mathcal{C}_i). \quad (3.4)$

Since $P^+$ and $P^-$ are hierarchical, these bounds are obtained just by finding the minimum distance of each code $\mathcal{C}_i$. If $P$ is hierarchical, the bounds obtained in Proposition 3.2.29 and in Inequality (3.4) are tight.

3.2.3 Construction of Maximal P-Decompositions

In order to find maximal $P$-decompositions of a code $\mathcal{C}$, we need to find codes that are equivalent to $\mathcal{C}$ and verify whether one of their maximal decompositions is a refinement of the given maximal decomposition of $\mathcal{C}$. To do so, we will provide a construction that outputs a generator matrix of a code $\mathcal{C}' \sim_P \mathcal{C}$ which determines a maximal $P$-decomposition of $\mathcal{C}$.

Given a poset $P$ over $[n]$ and an $[n, k, \delta]_q$ code $\mathcal{C}$, let $G = (g_{ij})$ be a $k \times n$ generator matrix of $\mathcal{C}$. As we have already noted, we lose no generality by assuming that $G$ is in a reduced row echelon form, obtained by elementary operations on rows. In order to obtain a maximal $P$-decomposition, we need to change the classical definition of the reduced row echelon form. For each $i$, let

$j(i) = \max\{j : g_{ij} \neq 0\}$

be the right-most non-zero column of the $i$-th row of $G$. Performing elementary row operations on $G$, we may assume that

$j(1) > j(2) > \cdots > j(k)$ and $g_{i j(l)} = 0$ if $i \neq l$. $\quad (3.5)$

We say that $G$ is in reduced row echelon form if the entries of $G$ satisfy (3.5). From now on, we assume that generator matrices have this form. In order to clarify the difference between the proposed reduced row echelon form and the classical one, we present the next example.

Example 3.2.30. Let

$$G = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix}$$

be a generator matrix for a $[5, 3, 1]_2$ $d_H$-code $\mathcal{C}$. Then,

$$G_1 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{pmatrix} \quad \text{and} \quad G_2 = \begin{pmatrix} 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$

where $G_1$ was obtained by the classical construction of the reduced row echelon form and $G_2$ was obtained by the proposed form. Note that:

(a) setting $j'(i) = \min\{j : g_{ij} \neq 0\}$ and considering $G_1$, we get that $j'(1) = 1$, $j'(2) = 2$ and $j'(3) = 3$, hence $j'(1) < j'(2) < j'(3)$;

(b) on the other hand, considering $G_2$, $j(1) = 5$, $j(2) = 4$ and $j(3) = 1$, hence $j(1) > j(2) > j(3)$.
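To make the proposed form concrete, the following minimal sketch (plain Python over $\mathbb{F}_2$; the function name and data layout are ours, not from the text) computes a matrix satisfying (3.5) by scanning the columns from right to left; applied to the matrix $G$ of this example, it reproduces $G_2$.

```python
def rref_rightmost(G):
    """Row-reduce a binary matrix so that the right-most non-zero entry of
    each row is a pivot, i.e., j(1) > j(2) > ... and g_{i,j(l)} = 0 for i != l."""
    rows = [r[:] for r in G]
    k, n = len(rows), len(rows[0])
    pivots, r = [], 0
    for col in range(n - 1, -1, -1):            # scan columns right to left
        pivot = next((i for i in range(r, k) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(k):                      # clear the pivot column elsewhere
            if i != r and rows[i][col]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[r])]
        pivots.append(col + 1)                  # 1-based pivot column j(r)
        r += 1
        if r == k:
            break
    return rows, pivots

G = [[1, 0, 1, 1, 0], [1, 1, 0, 1, 1], [0, 1, 0, 1, 1]]
G2, piv = rref_rightmost(G)
print(G2)    # [[0,1,1,0,1], [0,0,1,1,0], [1,0,0,0,0]], as in the example
print(piv)   # [5, 4, 1], i.e., j(1) > j(2) > j(3)
```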

A generator matrix $G$ determines a unique decomposition of $\mathcal{C}$ in the following sense.

Construction of $\mathscr{C}$:

Suppose $\beta_1 = \{v_1, \ldots, v_k\}$ is the set of all rows of $G$ and $I \subset [n]$ is the index set of the null columns of $G$. Then, define

$\mathcal{C}_0 = \{v \in \mathbb{F}_q^n : v = \sum_{i \in I} x_i e_i\}$.

Take $w_1 \in \beta_1$ and let $\gamma_1 = \{v_{i_1}, \ldots, v_{i_r}\} \subset \beta_1$ be the set of all rows of $G$ such that, if $v \in \gamma_1$, then $supp(w_1) \cap supp(v) \neq \emptyset$. Denote

$\mathcal{C}_1 = \{c \in \mathcal{C} : c = \sum_{j=1}^{r} x_j v_{i_j}\}$.

Take $\beta_2 = \beta_1 \setminus \gamma_1$; if it is empty, the matrix $G$ has determined the decomposition $(\mathcal{C}; \mathcal{C}_0; \mathcal{C}_1)$. If $\beta_2 \neq \emptyset$, take $w_2 \in \beta_2$, let $\gamma_2 = \{v_{i_1}, \ldots, v_{i_s}\} \subset \beta_2$ be defined by the elements of $\beta_2$ whose support intersects the support of $w_2$, and define

$\mathcal{C}_2 = \{c \in \mathcal{C} : c = \sum_{j=1}^{s} x_j v_{i_j}\}$.

Proceeding in this way, at some point $\beta_{r+1} = \emptyset$ and $\beta_r \neq \emptyset$ for some $r$. Then, $\mathscr{C} = (\mathcal{C}; \mathcal{C}_0; \mathcal{C}_i)_{i=1}^{r}$ determines a decomposition of $\mathcal{C}$. It is clear that the choice of the $w_i$ does not interfere with the construction of the decomposition; therefore, the decomposition constructed is unique, up to a permutation of its components.
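The construction can be sketched as follows (an illustrative Python fragment over $\mathbb{F}_2$; it takes the transitive closure of support overlap when grouping rows, so the resulting components have pairwise disjoint supports, and it also collects the null columns that support $\mathcal{C}_0$).

```python
def decomposition(G):
    """Group the rows of a binary generator matrix G into components with
    pairwise disjoint supports; also return the indices of null columns."""
    n = len(G[0])
    supports = [frozenset(j for j, x in enumerate(row) if x) for row in G]
    groups = []                                   # pairs (row indices, support)
    for i, s in enumerate(supports):
        merged_rows, merged_supp = {i}, set(s)
        rest = []
        for rows_, supp_ in groups:               # merge overlapping groups
            if supp_ & merged_supp:
                merged_rows |= rows_
                merged_supp |= supp_
            else:
                rest.append((rows_, supp_))
        groups = rest + [(merged_rows, merged_supp)]
    null_cols = set(range(n)) - set().union(*supports)
    return sorted(null_cols), [sorted(rows_) for rows_, _ in groups]

G2 = [[0, 1, 1, 0, 1], [0, 0, 1, 1, 0], [1, 0, 0, 0, 0]]
print(decomposition(G2))   # ([], [[0, 1], [2]]): components {rows 1,2} and {row 3}
```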

Example 3.2.31. Considering the generator matrix $G$ given in Example 3.2.30, the decomposition determined by $G$ is the trivial one $(\mathcal{C}; \emptyset; \mathcal{C})$, since every two rows of $G$ have intersecting supports. On the other hand, the decomposition obtained from the matrix $G_2$ is not trivial; indeed, the third row has support disjoint from the first and second rows. Therefore, setting

$\mathcal{C}_1 = \{00000, 01101, 00110, 01011\}$ and $\mathcal{C}_2 = \{00000, 10000\}$,

the decomposition $(\mathcal{C}; \emptyset; \mathcal{C}_i)_{i=1}^{2}$ is the one determined by $G_2$.

A generator matrix $G$ is said to be in the generalized reduced row echelon form if there is a permutation of the rows of $G$ such that the resulting matrix is in reduced row echelon form.

Proposition 3.2.32. Let 풞 be an [푛, 푘, 훿]푞 code. If 퐺 is a generator matrix for 풞 in a generalized reduced row echelon form, then 퐺 determines a maximal decomposition for 풞.

Proof. Let $G$ be a generator matrix for $\mathcal{C}$ in a generalized reduced row echelon form. Without loss of generality, we will assume that the decomposition determined by $G$ (as in the construction previously presented) has only one component, i.e., $\mathscr{C} = (\mathcal{C}; \mathcal{C}_0; \mathcal{C})$. Suppose, for contradiction, that there is a refinement of $\mathscr{C}$.

Claim 1: Aggregations are not allowed. If $i_0 \in supp(\mathcal{C})$, every basis of $\mathcal{C}$ would have an element $v$ such that $i_0 \in supp(v)$, but $supp(\mathcal{C}_0)$ is the set of null columns of $G$.

Claim 2: Splittings are not allowed. Suppose there is a splitting of $\mathscr{C}$, in other words, suppose there is a decomposition $\mathscr{C}' = (\mathcal{C}; \mathcal{C}_0; \mathcal{C}_i)_{i=1}^{2}$. Thus, $\mathcal{C} = \mathcal{C}_1 \oplus \mathcal{C}_2$ with $supp(\mathcal{C}_1) \cap supp(\mathcal{C}_2) = \emptyset$. Let $w_1, \ldots, w_{k_1}$ be a basis for $\mathcal{C}_1$ and $w_{k_1+1}, \ldots, w_k$ a basis for $\mathcal{C}_2$. The $k \times n$ matrix $G_1$ having $w_1, \ldots, w_k$ as its rows is a generator matrix for $\mathcal{C}$, and its decomposition coincides with $\mathscr{C}'$. Then $AG = G_1$ for some $k \times k$ matrix $A$, i.e., each row of $G_1$ is a linear combination of rows of $G$. Let $\beta_1$ be the minimum set of rows of $G$ generating $w_1, \ldots, w_{k_1}$, i.e., for every $1 \leq i \leq k_1$,

$w_i = \sum_{v \in \beta_1} x_v^i v$,

where $x_v^i \in \mathbb{F}_q$ and, by minimality, there is no $v \in \beta_1$ such that $x_v^i = 0$ for every $i$. Let $\beta_2$ be the minimum set of rows of $G$ generating $w_{k_1+1}, \ldots, w_k$, i.e.,

$w_i = \sum_{v \in \beta_2} x_v^i v$

for every $k_1 + 1 \leq i \leq k$, and there is no $v \in \beta_2$ such that $x_v^i = 0$ for every $i$. Suppose $v_0 \in \beta_1 \cap \beta_2$. Since $G$ is in a reduced row form, there is a coordinate $i_0$ of $v_0$ which is the only non-null entry of the $i_0$-th column of $G$. Then there exist $1 \leq j_1 \leq k_1$ and $k_1 + 1 \leq j_2 \leq k$ such that the coordinate $i_0$ of both $w_{j_1}$ and $w_{j_2}$ is non-null, which is a contradiction; therefore $\beta_1 \cap \beta_2 = \emptyset$. By construction, $supp(\beta_1) \cap supp(\beta_2) \neq \emptyset$, otherwise the decomposition defined by $G$ would not be trivial. Let $i_0 \in supp(\beta_1) \cap supp(\beta_2)$; then $i_0 \notin supp(\mathcal{C}_0)$ and, supposing $i_0 \in supp(\mathcal{C}_2)$, it follows that $i_0 \notin supp(\mathcal{C}_1)$. Since $i_0 \in supp(\beta_1)$, there exists a row $w$ of $G_1$, among the first $k_1$ rows, whose coordinate $i_0$ is non-null. Therefore $i_0 \in supp(\mathcal{C}_1)$, which is a contradiction. Hence, the decomposition determined by $G$ does not admit refinements and is therefore maximal.

The previous proposition ensures that each maximal decomposition is obtained by taking a generator matrix in a generalized reduced row echelon form. In the following, we will consider the triangular subgroup of isometries $\mathcal{G}_P \subseteq GL_P(\mathbb{F}_q^n)$. By Lemma 3.2.20, these are the only isometries that matter when we are looking for maximal $P$-decompositions. By the definition of $\mathcal{G}_P$, the following two operations over a generator matrix $G$ provide matrices generating codes equivalent to the one generated by $G$:

(OP 1) If $g_{i_0 j_0} \neq 0$ and $g_{i j_0} = 0$ for every $i \neq i_0$ ($g_{i_0 j_0}$ is the only non-zero entry in the $j_0$-th column), and $r \leq j_0$ ($r \neq j_0$), we may assume $g_{i_0 r} = 0$. This is equivalent to choosing the isometry $T \in \mathcal{G}_P$ such that $T(e_j) = e_j$ for every $j \neq j_0$ and

$T(e_{j_0}) = e_{j_0} - g_{i_0 r}\, g_{i_0 j_0}^{-1}\, e_r$.

(OP 2) More generally, if $r \leq j_1, j_2, \ldots, j_s$ ($r \neq j_i$ for every $i$) and there are two rows $i_1$ and $i_2$ of $G$ such that $g_{i_1 r} = \sum_{l=1}^{s} x_l g_{i_1 j_l}$ and $g_{i_2 r} = \sum_{l=1}^{s} x_l g_{i_2 j_l}$ for some choice of $x_1, \ldots, x_s$, and $g_{i j_l} = 0$ for every $i \neq i_1, i_2$ ($g_{i_1 j_k}$ and $g_{i_2 j_k}$ are the only non-zero entries in the $j_k$-th column, for every $k \in \{1, \ldots, s\}$), we may assume $g_{i_1 r} = g_{i_2 r} = 0$. Even more generally, the procedure can be performed simultaneously on many rows, and all those entries may be considered to be 0. If the column $r$ is a linear combination of the columns $j_1, \ldots, j_s$, then one may exchange the column $r$ for the null column. Let $\{g^1, \ldots, g^n\}$ be the set of columns of $G$ and suppose

$g^r = \sum_{i=1}^{s} x_i g^{j_i}$.

Then, in order to exchange the $r$-th column of $G$ for a null column, we have to consider the isometry $T \in \mathcal{G}_P$ defined by $T(e_i) = e_i$ for every $i \notin \{j_1, \ldots, j_s\}$ and

$T(e_{j_i}) = e_{j_i} - x_i e_r$

for every $i \in \{1, \ldots, s\}$.

Definition 3.2.33. Let $G$ be a generator matrix for an $[n, k, \delta]_q$ code $\mathcal{C}$. If $G$ is in a generalized reduced row echelon form and no operations as defined in (OP 1) and (OP 2) can be performed over $G$, we say that $G$ is in a $P$-canonical form.

Example 3.2.34. Let 풞 be the [6, 3]2 code with generator matrix given by

$$G = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Consider the poset $P_1$ with order relations $1 \leq 2$ and $3 \leq 4$. By using operations (OP 1) and (OP 2) we get the following matrix in a $P_1$-canonical form:

$$G' = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Furthermore, if we consider a poset $P_2$ such that $P_1 \subset P_2$ and $4 \leq 5$, then using operation (OP 1) we get

$$G'' = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

It is clear that $G''$ is in a $P_2$-canonical form. It is also clear that it determines a maximal $P_2$-decomposition for $\mathcal{C}$.
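As an illustration, the following sketch (Python over $\mathbb{F}_2$; an assumption-laden fragment, not the thesis's algorithm) applies (OP 1) together with only the $s = 1$ instance of (OP 2), i.e., a column equal to a single pivot-like column above it in $P$ is nulled. This restricted version already reproduces the matrices $G'$ and $G''$ of Example 3.2.34.

```python
def p_reduce(G, relations):
    """Apply (OP 1) and the single-column case of (OP 2) over F_2, given the
    strict order relations of P as 1-based pairs (a, b) with a < b in P."""
    k, n = len(G), len(G[0])
    less = {(a - 1, b - 1) for a, b in relations}
    changed = True
    while changed:
        changed = False
        for j0 in range(n):
            nz = [i for i in range(k) if G[i][j0]]
            if len(nz) == 1:                     # (OP 1): j0 is a pivot column
                i0 = nz[0]
                for r in range(n):
                    if (r, j0) in less and G[i0][r]:
                        G[i0][r] = 0
                        changed = True
            for r in range(n):                   # (OP 2), s = 1: equal columns
                if (r, j0) in less and any(G[i][r] for i in range(k)) \
                        and all(G[i][r] == G[i][j0] for i in range(k)):
                    for i in range(k):
                        G[i][r] = 0
                    changed = True
    return G

G = [[0, 0, 1, 1, 0, 1], [1, 0, 1, 1, 1, 0], [1, 1, 0, 0, 0, 0]]
print(p_reduce([row[:] for row in G], {(1, 2), (3, 4)}))           # G'
print(p_reduce([row[:] for row in G], {(1, 2), (3, 4), (4, 5)}))   # G''
```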

In the following, given 푇 ∈ 풢푃 , denote by 퐴푇 = (푎푖푗) ∈ 퐺푃 its matrix according to the canonical basis.

Theorem 3.2.35. Let $G$ be a generator matrix of a code $\mathcal{C}$ and let $\mathscr{C} = (\mathcal{C}; \mathcal{C}_0; \mathcal{C}_i)_{i=1}^{r}$ be the decomposition determined by $G$. If $G$ is in a $P$-canonical form, then the decomposition $\mathscr{C}$ is a maximal $P$-decomposition of $\mathcal{C}$.

Proof. We first show that $\mathscr{C}$ does not admit aggregations. Suppose there is a $P$-decomposition $\mathscr{C}'$, determined by a linear isometry $T \in \mathcal{G}_P$, obtained from $\mathscr{C}$ by an aggregation, i.e., $\mathscr{C}' = (T(\mathcal{C}); \mathcal{C}_0'; \mathcal{C}_i')_{i=1}^{r}$ and $\varphi_T(supp(\mathcal{C}_0)) \subset supp(\mathcal{C}_0')$. Since $T \in \mathcal{G}_P$, the map $\varphi_T$ coincides with the identity map, therefore $supp(\mathcal{C}_0) \subset supp(\mathcal{C}_0')$. Consider the matrix $G_1 = (g^1_{ij})$ whose $i$-th row is obtained by the action of $T$ on the $i$-th row of $G$; hence $G_1$ generates $T(\mathcal{C})$. We remark that $A_T = (a_{ij}) \in G_P$ is the matrix determined by $T$. If $i_0 \in supp(\mathcal{C}_0') \setminus supp(\mathcal{C}_0)$, then $i_0$ is a null column of $G_1$ obtained by the aggregation. The characterization of $G_P$ ensures that

$0 = g^1_{l i_0} = \sum_{j:\, i_0 \leq j} a_{i_0 j}\, g_{l j}$

for every $l \in \{1, \ldots, k\}$. Since $a_{i_0 i_0} \neq 0$, it follows that

$g_{l i_0} = - \sum_{j:\, i_0 \leq j,\, i_0 \neq j} \left( \dfrac{a_{i_0 j}}{a_{i_0 i_0}} \right) g_{l j} \quad (3.6)$

for every $l \in \{1, \ldots, k\}$. Since $G$ is in a $P$-canonical form, $i_0$ cannot be a column $j(i)$ for any $i \in \{1, \ldots, k\}$. Equation (3.6), together with Operation (OP 2), ensures that the $i_0$-th column of $G$ is already null, so $i_0 \in supp(\mathcal{C}_0)$. Therefore $supp(\mathcal{C}_0') \subset supp(\mathcal{C}_0)$, so $\mathscr{C}'$ cannot be obtained from $\mathscr{C}$ by aggregations.

We now prove that $\mathscr{C}$ does not admit a splitting. To do so, we assume, without loss of generality, that $\mathscr{C} = (\mathcal{C}; \mathcal{C}_0; \mathcal{C})$ is the decomposition determined by $G$. Let $T \in \mathcal{G}_P$ be the isometry determining the splitting of $\mathscr{C}$, and consider $G_1$ as in the first part of the proof. Hence the rows of $G_1$ can be split into two sets with disjoint supports, i.e., there exist two disjoint sets $I_1$ and $I_2$ formed by rows of $G_1$ such that $|I_1| = k_1$, $|I_2| = k_2$ and $k_1 + k_2 = k$. Therefore $I_1$ and $I_2$ generate $\mathcal{C}_1$ and $\mathcal{C}_2$ respectively, and $\mathscr{C}' = (T(\mathcal{C}); \mathcal{C}_0'; \mathcal{C}_i)_{i=1}^{2}$ is the maximal decomposition determined by $G_1$. Since the rows of $G$ cannot be split in this form, let $i_0$ be the right-most column of $G$ such that $i_0$ belongs to the support of both the first $k_1$ and the last $k_2$ rows of $G$. Since the supports of the subspaces generated by the first $k_1$ and the last $k_2$ rows of $G_1$, respectively, are disjoint, and $i_0 \notin supp(\mathcal{C}_0')$ because $i_0 \notin supp(\mathcal{C}_0)$, $i_0$ belongs either to the support of the first $k_1$ rows or to that of the last $k_2$ rows of $G_1$. Without loss of generality, suppose $i_0$ belongs to the support of the last $k_2$ rows of $G_1$. Therefore, if $l$ is an integer with $0 \leq l \leq k_1$, then

$0 = g^1_{l i_0} = \sum_{j:\, i_0 \leq j} a_{i_0 j}\, g_{l j}$.

Thus,

$g_{l i_0} = - \sum_{j:\, i_0 \leq j,\, i_0 \neq j} \left( \dfrac{a_{i_0 j}}{a_{i_0 i_0}} \right) g_{l j} \quad (3.7)$

for every $l \in \{1, \ldots, k_1\}$. Therefore $i_0$ is not the last column of $G$ such that $g_{l i_0} \neq 0$ for every $l \in \{1, \ldots, k_1\}$. Each column contributing to the summation (3.7) has support either in the first $k_1$ or in the last $k_2$ rows of $G$; therefore, Operation (OP 2) ensures that every vector in the first $k_1$ rows of $G$ has zero in the coordinate $i_0$. Hence $i_0$ is not in the support of the first $k_1$ rows of $G$. Therefore $\mathcal{C} = \mathcal{C}_1 \oplus \mathcal{C}_2$, where $\mathcal{C}_1$ and $\mathcal{C}_2$ are generated by the first $k_1$ and the last $k_2$ rows of $G$ respectively. We conclude that $\mathscr{C}$ and $\mathscr{C}'$ have the same profile; since $G_1$ is also in a reduced row form, Proposition 3.2.32 gives that $\mathscr{C}'$ is a maximal decomposition of $\mathcal{C}'$, and therefore $\mathscr{C}$ is a maximal $P$-decomposition.

Each time we can exchange a column for a null column, we exchange a code $\mathcal{C}$ with $P$-decomposition $\mathscr{C} = (\mathcal{C}; \mathcal{C}_0; \mathcal{C}_i)_{i=1}^{r}$ for a $P$-equivalent code with $P$-decomposition $\mathscr{C}' = (\mathcal{C}'; \mathcal{C}_0'; \mathcal{C}_i')_{i=1}^{r}$, where $\dim(\mathcal{C}_0') = \dim(\mathcal{C}_0) + 1$. If Operation (OP 2) is performed on a proper subset of the $k$ rows of $G$, we do not increase $\dim(\mathcal{C}_0)$, but we may split some of the $\mathcal{C}_i$ into $\mathcal{C}_i = \mathcal{C}_{i_1}' \oplus \mathcal{C}_{i_2}'$ with $supp(\mathcal{C}_{i_1}') \cap supp(\mathcal{C}_{i_2}') = \emptyset$. Note that the role of $P$ in such operations rests solely on the condition $r \leq j_1, j_2, \ldots, j_s$. For the two extremal posets, namely the anti-chain and the chain, the picture is absolutely clear. If $P$ is an anti-chain (hence we are considering the Hamming metric), no such operation may be performed (since $i \leq j \iff i = j$); hence maximal $P$-decompositions coincide with maximal decompositions (see the paragraph after Definition 3.2.13), and the $P$-canonical form presented in Proposition 2.1.14 provides it. If $P$ is a chain with $1 \leq 2 \leq \cdots \leq n$, then the first operation may be performed for every $j(i)$ in the reduced row echelon matrix $G$, i.e., $\mathcal{C}$ is equivalent to a code that has a generator matrix $G = (g_{ij})$ with $g_{i j(i)} = 1$ and $g_{ij} = 0$ if $j \neq j(i)$ (already known from [38]). The other case that can easily be described is that of hierarchical posets. The algorithm to find $P$-decompositions according to the levels of the poset was first proposed in [15]; in that work, the basis of the code providing such a decomposition was also constructed. We made the matrix of this basis explicit in Corollary 3.1.5; its form is called canonical-systematic and it provides a maximal $P$-decomposition for $\mathcal{C}$. Furthermore, the canonical-systematic form is a standard representation, in the sense that every code has such a decomposition. In [30], it was proved that for every non-hierarchical poset it is not possible to give a standard representation like the one obtained in Corollary 3.1.5; actually, this was the main motivation for defining a canonical form as we did in Definition 3.2.33.

3.2.4 Complexity of Syndrome Decoding Algorithm

As seen in Section 1.2.1, considering a metric determined by a weight, hence invariant by translations, a syndrome algorithm, which depends on the choice of the coset leaders, is an implementation of an MD-decoder. This is the case for poset metrics. Given an $[n, k, \delta]_q$ code $\mathcal{C}$, a look-up table for syndrome decoding of $\mathcal{C}$ (the table composed of the coset leaders) has $q^{n-k}$ elements. This is an algebraic invariant and does not depend on the metric. Nevertheless, we can use refinements of decompositions in order to minimize the search space (the number of cosets), by isometrically immersing the code into a space of smaller dimension or by performing syndrome decoding in each component of the decomposition. In this section we describe the cases for which such improvements are possible. Let $\mathscr{C} = (\mathcal{C}; \mathcal{C}_0; \mathcal{C}_i)_{i=1}^{r}$ be a maximal $P$-decomposition of an $[n, k, \delta]_q$ code $\mathcal{C}$.

Initially, note that in order to perform decoding we can ignore the component $V_0$ (recall that $V_i$ is the support-space of $\mathcal{C}_i$), since we know that any codeword $c = (c_1, \ldots, c_n) \in \mathcal{C}$ must have $c_j = 0$ for every $j \in [n] \setminus supp(\mathcal{C})$. Consider the projection $\pi : \mathbb{F}_q^n \longrightarrow \mathbb{F}_q^{n_1 + \cdots + n_r}$ (recall that $n_i = |supp(\mathcal{C}_i)|$) defined by

$\pi(x_1, \ldots, x_n) = (x_{i_1}, \ldots, x_{i_s})$

where $i_1 < \cdots < i_s$ and $\{i_1, \ldots, i_s\} = supp(\mathcal{C})$. The map $\pi_{1,\ldots,r} = \pi|_{\oplus_{i=1}^{r} V_i}$, the restriction of $\pi$ to the space $\oplus_{i=1}^{r} V_i$, is a bijection. Therefore, by pushing forward the metric through this restriction, we obtain the metric $d_P^{\pi}$ on $\mathbb{F}_q^{n_1 + \cdots + n_r}$ defined by

$d_P^{\pi}(x, y) := d_P(\pi_{1,\ldots,r}^{-1}(x), \pi_{1,\ldots,r}^{-1}(y))$

for every $x, y \in \mathbb{F}_q^{n_1 + \cdots + n_r}$. The metric $d_P^{\pi}$ turns the restriction $\pi_{1,\ldots,r}$ into a linear isometry.

Because 풞 is a subspace of 푉1 ⊕...⊕푉푟, the proof of the proposition below follows straight from these observations.

Proposition 3.2.36. The metric-decoding criterion of $\mathbb{F}_q^n$ for $\mathcal{C}$ is equivalent to the metric-decoding criterion of $\mathbb{F}_q^{n_1 + \cdots + n_r}$ for $\pi(\mathcal{C})$, i.e.,

$d_P(c', y) = \min_{c \in \mathcal{C}} d_P(c, y) \iff d_P^{\pi}(\pi(c'), \pi(y)) = \min_{c \in \pi(\mathcal{C})} d_P^{\pi}(c, \pi(y))$.

By Proposition 3.2.36, to perform syndrome decoding, instead of using a look-up table with $|\mathbb{F}_q^n / \mathcal{C}| = q^{n-k}$ elements, we can reduce the number of cosets to $|\mathbb{F}_q^{n_1+\cdots+n_r} / \pi(\mathcal{C})| = \prod_{i=1}^{r} q^{n_i - k_i}$ elements. Note that

$q^{n_0} \times \prod_{i=1}^{r} q^{n_i - k_i} = q^{n-k}$.
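For an illustrative (hypothetical) choice of parameters: if $q = 2$, $n = 10$, $k = 4$ and a maximal $P$-decomposition yields $n_0 = 2$ and two components with $(n_1, k_1) = (n_2, k_2) = (4, 2)$, then the full look-up table has $2^{10-4} = 64$ cosets, while the projected quotient has only $2^{4-2} \cdot 2^{4-2} = 16$; indeed, $2^2 \times 16 = 64$, as the identity above predicts.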

Therefore, 푃 -decompositions having the maximum possible number of elements in the support of 풞0 are the best ones in order to perform syndrome decoding. Besides this possible (and a-posteriori irrelevant) gain in the cardinality of the syndrome look-up table obtained considering aggregations, there is a more significant gain that can be obtained through the splitting operation.

Proposition 3.2.37. If

$\langle supp(\mathcal{C}_i) \rangle_P \cap \langle supp(\mathcal{C}_j) \rangle_P = \emptyset$

for all $i \neq j$ with $i, j \neq 0$, then, given $y \in \mathbb{F}_q^n$,

$\min_{c \in \mathcal{C}} d_P(c, y) = \sum_{i=1}^{r} \min_{c \in \mathcal{C}_i} d_P^{\pi_i}(\pi_i(c), \pi_i(y))$.

Proof. Note that if $c \in \mathcal{C}$, then $c = c_1 + \cdots + c_r$ with $c_i \in \mathcal{C}_i$, and for every $y_i \in V_i$,

$supp(y_i - c_i) \subset \langle supp(\mathcal{C}_i) \rangle = supp(V_i)$.

Then,

$d(y, c) = d(y_1, c_1) + \cdots + d(y_r, c_r)$,

where $y = y_1 + \cdots + y_r$ and $y_i \in V_i$ for all $i \in [r]$.

Due to Proposition 3.2.37, if the ideals generated by each component are disjoint, syndrome decoding can be done independently in each component $\mathcal{C}_i$. Therefore, the number of cosets can be reduced from $q^{n-k}$ to $\sum_{i=1}^{r} q^{n_i - k_i}$, the latter being the sum of the numbers of cosets in each quotient $\mathbb{F}_q^{n_i} / \pi_i(\mathcal{C}_i)$. More generally, if $[r]$ is a disjoint union of subsets $I_i$, i.e., $[r] = I_1 \sqcup \ldots \sqcup I_s$, and

$\langle supp(\oplus_{i \in I_j} \mathcal{C}_i) \rangle_P \cap \langle supp(\oplus_{i \in I_l} \mathcal{C}_i) \rangle_P = \emptyset$

for every $j \neq l$, then decoding can be done separately in each projection of $\oplus_{i \in I_j} \mathcal{C}_i$ into $\mathbb{F}_q^N$, where $N = \sum_{i \in I_j} n_i$.

Until now, in order to obtain the reductions it was not necessary to change the syndrome decoding algorithm; we just performed it in a different (smaller) space. The hierarchical relation among elements of the poset will provide us with a “quasi-independent” syndrome decoding algorithm, performed by first choosing the coordinates that hierarchically dominate the others; we will therefore call this algorithm a Leveled Syndrome Decoding. If $I, J \subset [n]$, we say that $I$ and $J$ are hierarchically related if every element of $I$ is smaller than every element of $J$. Suppose $[r]$ is an ordered disjoint union of sets, $[r] = I_1 \sqcup \ldots \sqcup I_s$, such that $supp(\oplus_{i \in I_j} \mathcal{C}_i)$ is hierarchically related to $supp(\oplus_{i \in I_{j+1}} \mathcal{C}_i)$ for every $j \in \{1, \ldots, s-1\}$; thus, if $i_0 \in supp(\oplus_{i \in I_{j_0}} \mathcal{C}_i)$, then $i_0 \leq i$ for every $i \in supp(\oplus_{i \in I_l} \mathcal{C}_i)$ with $l > j_0$. By using the syndrome decoding algorithm described in Section 1.2.1, the leveled syndrome decoding algorithm is as follows:

Leveled Syndrome Decoding Algorithm 1.

Input: $y = y_1 + \cdots + y_s \in \mathbb{F}_q^n$, where $y_i \in \oplus_{j \in I_i} V_j$.

For each $i \in \{1, \ldots, s\}$ do:
decode $\pi_{j \in I_i}(y_i) \in \mathbb{F}_q^{\sum_{j \in I_i} n_j}$ by syndrome decoding (according to $d_P^{\pi_{j \in I_i}}$), outputting $c_i \in \pi_{j \in I_i}(\oplus_{j \in I_i} \mathcal{C}_j)$.

Output: $c = \pi_{j \in I_1}^{-1}(c_1) + \cdots + \pi_{j \in I_s}^{-1}(c_s)$.

As a particular case of a syndrome algorithm, we can order the levels to be decoded and do the following:

Leveled Syndrome Decoding Algorithm 2.

Input: $y = y_1 + \cdots + y_s \in \mathbb{F}_q^n$, where $y_i \in \oplus_{j \in I_i} V_j$.

For $i = s$ down to $1$ do:
if $\pi_{j \in I_i}(y_i) \in \pi_{j \in I_i}(\oplus_{j \in I_i} \mathcal{C}_j)$, set $c_i = \pi_{j \in I_i}(y_i)$;
else, decode $\pi_{j \in I_i}(y_i) \in \mathbb{F}_q^{\sum_{j \in I_i} n_j}$ by syndrome decoding (according to $d_P^{\pi_{j \in I_i}}$), outputting $c_i \in \pi_{j \in I_i}(\oplus_{j \in I_i} \mathcal{C}_j)$, and go to Output;
end if; end for;

Output: $c = \pi_{j \in I_i}^{-1}(c_i) + \pi_{j \in I_{i+1}}^{-1}(c_{i+1}) + \cdots + \pi_{j \in I_s}^{-1}(c_s)$.
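The control flow of Algorithm 2 can be sketched as follows, for two hierarchically related levels carrying binary codes given by (hypothetical) parity-check matrices, with per-level coset-leader tables built for the Hamming metric inside each level as a stand-in for $d_P^{\pi}$; all names below are illustrative, not from the thesis.

```python
from itertools import product

def syndrome(H, v):
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

def leader_table(H, n):
    # coset leader = a minimum Hamming weight vector with a given syndrome
    table = {}
    for v in sorted(product([0, 1], repeat=n), key=sum):
        table.setdefault(syndrome(H, v), v)
    return table

def leveled_decode(blocks, Hs, tables):
    # blocks[i] is the slice of y on level i; the last level dominates the rest
    out = [None] * len(blocks)
    for i in range(len(blocks) - 1, -1, -1):    # from the top level down
        s = syndrome(Hs[i], blocks[i])
        if any(s):                              # first level with an error:
            e = tables[i][s]                    # correct it, null the levels below
            out[i] = [(a + b) % 2 for a, b in zip(blocks[i], e)]
            return [[0] * len(b) for b in blocks[:i]] + out[i:]
        out[i] = list(blocks[i])
    return out

H1 = [[1, 1, 1]]                  # level 1: [3, 2] parity-check code
H2 = [[1, 1, 0], [0, 1, 1]]       # level 2: [3, 1] repetition code
tables = [leader_table(H1, 3), leader_table(H2, 3)]
print(leveled_decode([[1, 0, 0], [1, 1, 0]], [H1, H2], tables))
# -> [[0, 0, 0], [1, 1, 1]]: the error at the top level is corrected and the
#    covered level is output as the null vector, as described below.
```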

The first algorithm was described in [15] for the hierarchical poset case. The second one is also a minimum distance algorithm according to the poset metric $d_P$ (see the next proposition), but if there is an error in a particular level, the decoder outputs the null vector in the levels covered by the one where the error happened.

Proposition 3.2.38. Algorithm 2 determines a minimum distance decoder according to the metric 푑푃 .

Proof. Given $y \in \mathbb{F}_q^n$ and $c \in \mathcal{C}$, then

$d_P(y, c) = d_P^{\pi_{j \in I_i}}(\pi_{j \in I_i}(y), \pi_{j \in I_i}(c))$

where $i$ is the largest integer such that $\pi_{j \in I_i}(y) \neq \pi_{j \in I_i}(c)$. Therefore, if $c' \in \mathcal{C}$ satisfies

$d_P(y, c') = \min_{c \in \mathcal{C}} d_P(y, c)$,

then $c'' = c_s + c_{s-1} + \cdots + c_i$ also attains the minimum, and this is the codeword returned by Algorithm 2.

Both algorithms need to store the look-up tables of each quotient $\mathbb{F}_q^N / \pi_{j \in I_i}(\oplus_{j \in I_i} \mathcal{C}_j)$, where $N = \sum_{j \in I_i} n_j$. The total number of elements we need to store is

$\sum_{i=1}^{s} \prod_{j \in I_i} q^{n_j - k_j}. \quad (3.8)$

The difference between Algorithms 1 and 2 is that, while the search in Algorithm 1 is performed over all elements of the look-up table, the search in Algorithm 2 is restricted to the first level where an error has occurred, i.e., if this happens at level $i_0$, then the search is performed only over the look-up table of $\pi_{j \in I_{i_0}}(\oplus_{j \in I_{i_0}} \mathcal{C}_j)$, which has

$\prod_{j \in I_{i_0}} q^{n_j - k_j} \quad (3.9)$

elements. Note that if the poset has only one level, the values of Expressions (3.8) and (3.9) coincide.

Chapter 4

Expected Loss

In this chapter we explore the framework of expected loss introduced in [18] and [16]. This concept allows us to treat distinct kinds of errors differently; as shown in [18], this seems important when dealing with image transmission. In this context, we express the mean of the expected losses according to a given decoder and, in a particular case, we show that the lexicographic encoder is better than the mean. Due to experimental results, we conjecture that the lexicographic encoder is a Bayes encoder in the analyzed case. In the last section, we relate the theory of expected loss to unequal error protection, presenting some conjectures and coding constructions to achieve unequal error protection.

Given an $[n, k, \delta]_q$ linear code $\mathcal{C} \subset \mathbb{F}_q^n$, the error probability of $\mathcal{C}$ (Equation 1.1) may be rewritten in the following way:

$P_e^D(\mathcal{C}) = \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} \sum_{y \in \mathbb{F}_q^n} W(y|c) \sum_{c' \in \mathcal{C},\, c' \neq c} D(c'|y)$,

where $D$ is a decoder (a stochastic map) and $W$ is a channel conditional probability. Let $\mathbb{R}_+$ be the set of non-negative real numbers. Consider the well-known indicator function $\mu_{0\text{-}1} : \mathcal{C} \times \mathcal{C} \to \mathbb{R}_+$ given by

$\mu_{0\text{-}1}(c, c') = \begin{cases} 0 & \text{if } c = c' \\ 1 & \text{if } c \neq c'. \end{cases}$

Therefore, the error probability may be expressed as

$P_e^D(\mathcal{C}) = \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} \sum_{y \in \mathbb{F}_q^n} W(y|c) \sum_{c' \in \mathcal{C}} \mu_{0\text{-}1}(c, c')\, D(c'|y). \quad (4.1)$

The function $\mu_{0\text{-}1}$ detects decoding errors but does not distinguish between such errors. In some cases, depending on the nature of the information, we may assign to each pair $(c, c') \in \mathcal{C} \times \mathcal{C}$ a real value representing the amount of loss incurred when $c$ is transmitted but decoded as $c'$ by the receiver. Such loss measures will be used to define the expected loss of an encoding-decoding scheme.

4.1 Expected Loss

The extension of the error probability definition to expected loss was explained in [18]. The motivation for this extension arises from the fact that in many real-world situations, such as the transmission of digital images, it is reasonable to attribute different values to different errors, in contrast with the indicator function that appears in the error probability definition. To attribute different values to each exchange of information, we shall replace the indicator function $\mu_{0\text{-}1}$ by a value function that may assume any (non-negative) real value. Until now, the information set was identified with $\mathbb{F}_q^k$. From now on, we will assume that the information set is any set $\mathcal{I}$ with $q^k$ elements. This is important since the identification with $\mathbb{F}_q^k$ may raise some confusion, because the measures that will be defined in this chapter are determined according to the nature of the information. An error value function for the information set $\mathcal{I}$ is a map $\mu$ that associates to each pair of information (elements of $\mathcal{I}$) a non-negative real number,

$\mu : \mathcal{I} \times \mathcal{I} \to \mathbb{R}_+$,

where $\mu(\iota_1, \iota_2)$ is the cost of exchanging $\iota_2$ for $\iota_1$. If $\mathcal{C}$ is a code and $f : \mathcal{I} \to \mathcal{C}$ is an encoder, we denote by

$\mu_f : \mathcal{C} \times \mathcal{C} \to \mathbb{R}_+$

the error value function induced by the encoder $f$, i.e., given $\iota_1, \iota_2 \in \mathcal{I}$,

$\mu_f(f(\iota_1), f(\iota_2)) := \mu(\iota_1, \iota_2)$.

We shall refer to both $\mu$ and $\mu_f$ simply as an error value function. By considering such a value function, we are interested in evaluating the errors that may occur during the process consisting of coding, transmitting and decoding. Therefore, it is reasonable to assume that $\mu$ has the following properties:

(a) 휇 is symmetric, i.e., 휇(휄1, 휄2) = 휇(휄2, 휄1);

(b) 휇(휄, 휄) = 0 for all 휄 ∈ ℐ.

Given an encoding-decoding scheme $(\mathcal{C}, f, D)$ and an error value function $\mu$, let us denote by $\mathbb{E}(\mathcal{C}, \mu_f, D)$ the expected loss of $(\mathcal{C}, f, D)$ with respect to $\mu$, i.e.,

$\mathbb{E}(\mathcal{C}, \mu_f, D) = \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} \sum_{y \in \mathbb{F}_q^n} W(y|c) \sum_{c' \in \mathcal{C}} \mu_f(c, c')\, D(c'|y). \quad (4.2)$

The only difference between this equation and Equation (4.1) is the exchange of $\mu_{0\text{-}1}$ for $\mu_f$. As noticed in the first chapter, the error probability of $\mathcal{C}$ does not depend on the choice of the encoder $f$; in other words, given a decoder and a channel, the error probability of $\mathcal{C}$ is invariant under permutations of the encoder $f$ (which is a bijection between $\mathcal{I}$ and $\mathcal{C}$). Since $\mu$ is defined using the nature of the information set and the decoder uses only the characteristics of the code, for a given code $\mathcal{C}$ the expected loss is not invariant with respect to the choice of an encoder for $\mathcal{C}$. Therefore, a new variable (the encoder) is introduced when dealing with expected loss. In this case, the problem of minimizing both the error and refusal probabilities can be generalized to the problem of minimizing the expected loss and the refusal probability.

Definition 4.1.1. Given an information set $\mathcal{I}$ and an error value function $\mu$, an encoding-decoding scheme $(\mathcal{C}^*, f^*, D^*)$ is an $(n, k)_q$-Bayes scheme according to $\mu$ if it satisfies

$\mathbb{E}(\mathcal{C}^*, \mu_{f^*}, D^*) + P_{ref}^{D^*}(\mathcal{C}^*) = \min_{(\mathcal{C}, f, D)} \left( \mathbb{E}(\mathcal{C}, \mu_f, D) + P_{ref}^{D}(\mathcal{C}) \right)$.

Note that we are not choosing the code $\mathcal{C}$ in advance, as in Definition 1.1.9. Unequal error protection uses basically two techniques in order to protect against errors unequally: adding more redundancy to some coordinates than to others, or increasing the decision region of some codewords; such strategies are respectively called bit-wise and message-wise unequal error protection; see [4] for more details. In [13], it was noticed that, for a given code, protection against different kinds of errors depends not only on the decoder, but also on the encoder. In that work, optimal decoders were characterized (optimal according to a concept called separation). Our concepts of optimal decoders and encoders will be defined as the decoders and encoders minimizing the expected loss according to a given error value function. Similarly to what was done in the first chapter for the error probability, considering a normalized error value function ($\mu(\iota_1, \iota_2) \leq 1$ for all $\iota_1, \iota_2 \in \mathcal{I}$), we can assume that the refusal probability is zero and that decoders are deterministic maps. Therefore, from this point forward, if $\mu$ is a loss function, then $\mu : \mathcal{I} \times \mathcal{I} \to [0, 1]$, where $[0, 1] \subset \mathbb{R}$.

Proposition 4.1.2. Given a code $\mathcal{C}$, for every decoder $D \in \mathcal{D}_{\mathbb{F}_q}(\mathcal{C})$ and encoder $f$, there exists a decoder $\widetilde{D} \in \mathcal{D}_{\mathbb{F}_q}(\mathcal{C})$ such that

$\mathbb{E}(\mathcal{C}, \mu_f, \widetilde{D}) \leq P_{ref}^{D}(\mathcal{C}) + \mathbb{E}(\mathcal{C}, \mu_f, D)$

and $P_{ref}^{\widetilde{D}}(\mathcal{C}) = 0$.

Proof. Given a decoder $D \in \mathcal{D}_{\mathbb{F}_q}(\mathcal{C})$, suppose there exists $y_0 \in \mathbb{F}_q^n$ such that $D(\infty | y_0) = 1$. Define a new decoder $D^* \in \mathcal{D}_{\mathbb{F}_q}(\mathcal{C})$ satisfying $D^*(y) = D(y)$ for all $y \neq y_0$, but with $D^*(y_0)$ a new distribution which does not refuse $y_0$, so $D^*(\infty | y_0) = 0$. Using the same reasoning as in Proposition 1.1.10, we get that

$P_{ref}^{D}(\mathcal{C}) = P_{ref}^{D^*}(\mathcal{C}) + \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} W(y_0 | c)$

and

$\mathbb{E}(\mathcal{C}, \mu_f, D) = \mathbb{E}(\mathcal{C}, \mu_f, D^*) - \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} W(y_0 | c) \sum_{c' \in \mathcal{C}} \mu_f(c, c')\, D^*(c' | y_0)$.

Therefore,

$\mathbb{E}(\mathcal{C}, \mu_f, D^*) + P_{ref}^{D^*}(\mathcal{C}) = P_{ref}^{D}(\mathcal{C}) + \mathbb{E}(\mathcal{C}, \mu_f, D) + \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} W(y_0 | c) \left( \sum_{c' \in \mathcal{C}} \mu_f(c, c')\, D^*(c' | y_0) - 1 \right)$.

Since $\mu_f(c, c') \leq 1$ for every $c, c' \in \mathcal{C}$, it follows that, for every $c \in \mathcal{C}$,

$\sum_{c' \in \mathcal{C}} \mu_f(c, c')\, D^*(c' | y_0) - 1 \leq 0$.

Thus,

$\mathbb{E}(\mathcal{C}, \mu_f, D^*) + P_{ref}^{D^*}(\mathcal{C}) \leq P_{ref}^{D}(\mathcal{C}) + \mathbb{E}(\mathcal{C}, \mu_f, D)$.

Therefore, if $P_{ref}^{D^*}(\mathcal{C}) = 0$, then $\widetilde{D} = D^*$; otherwise, applying the same argument to $D^*$ until we run out of refused elements, $\widetilde{D}$ may be constructed.

Proposition 4.1.3. Let $\mathcal{C}$ be an $[n, k, \delta]_q$ linear code, $D \in \mathcal{D}_{\mathbb{F}_q}(\mathcal{C})$ a decoder satisfying $P_{ref}^{D}(\mathcal{C}) = 0$ and $f$ an encoder for $\mathcal{C}$. Then, there exists a deterministic decoder $g$ such that

$\mathbb{E}(\mathcal{C}, \mu_f, g) \leq \mathbb{E}(\mathcal{C}, \mu_f, D)$.

Proof. If $D$ is a stochastic map modeling a decoder for $\mathcal{C}$ with no refusals, then by (4.2),

$\mathbb{E}(\mathcal{C}, \mu_f, D) = \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} \sum_{c' \in \mathcal{C}} \left[ \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, c') \right] D(c' | y)$

where $f$ is an encoder and $\mu$ is an error value function. For each $y \in \mathbb{F}_q^n$, choose $c_y \in \mathcal{C}$ such that

$A_y = \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, c_y) \leq \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, c')$

for every $c' \in \mathcal{C}$. Note that $\sum_{c' \in \mathcal{C}} D(c' | y) = 1$; then

$\dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} A_y = \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} A_y \sum_{c' \in \mathcal{C}} D(c' | y) = \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} \sum_{c' \in \mathcal{C}} A_y D(c' | y) \quad (4.3)$

$\leq \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} \sum_{c' \in \mathcal{C}} \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, c')\, D(c' | y) = \mathbb{E}(\mathcal{C}, \mu_f, D). \quad (4.4)$

Denote by $g$ the map defined by $g(y) = c_y$; then $g$ is a deterministic decoder for $\mathcal{C}$. Furthermore,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, g(y)) = \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} A_y \leq \mathbb{E}(\mathcal{C}, \mu_f, D)$.

In order to minimize the sum of the expected loss and the refusal probability, similarly to what was done in the first chapter, Propositions 4.1.2 and 4.1.3 ensure that we can assume decoders to be deterministic maps with zero refusal probability (note that error value functions are considered to be normalized). Therefore, the expected loss can be rewritten as follows:

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{y \in \mathbb{F}_q^n} \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, g(y))$.
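For concreteness, the expression above can be evaluated by brute force; the sketch below (Python, over a binary symmetric channel with assumed crossover probability $p$; the code, loss function and decoder are illustrative choices, not taken from the text) does exactly that.

```python
from itertools import product

def expected_loss(code, mu_f, g, p, n):
    """Brute-force E(C, mu_f, g): average over transmitted c of the
    channel-weighted loss of decoding each received word y as g(y)."""
    total = 0.0
    for y in product([0, 1], repeat=n):
        for c in code:
            d = sum(a != b for a, b in zip(y, c))        # Hamming distance
            total += (1 - p) ** (n - d) * p ** d * mu_f[(c, g(y))]
    return total / len(code)

code = [(0, 0, 0), (1, 1, 1)]                            # [3,1] repetition code
mu = {(a, b): 0.0 if a == b else 1.0 for a in code for b in code}
g = lambda y: min(code, key=lambda c: sum(a != b for a, b in zip(y, c)))
print(expected_loss(code, mu, g, p=0.1, n=3))
# with mu_{0-1} this is the error probability: 3p^2(1-p) + p^3 = 0.028
```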

In a general setting, we consider the following data to be given:

(1) The error value function 휇, determined by the nature of the information;

(2) The size of the code 풞, determined by the size of the information set: |ℐ|= 푞푘;

(3) The information rate, determined by cost constraints;

(4) The channel model 푊 , determined by physical conditions.

In such a setting, we say that the triple $(\mathcal{C}^*, f^*, g^*)$ is an $(n, k)_q$-Bayes coding-decoding scheme if

$\mathbb{E}(\mathcal{C}^*, \mu_{f^*}, g^*) = \min_{(\mathcal{C}, f, g)} \mathbb{E}(\mathcal{C}, \mu_f, g)$

where the minimum is taken over all encoding-decoding schemes for $\mathcal{I}$ over $W$. We may consider each of the variables $\mathcal{C}$, $f$ and $g$ independently.

Definition 4.1.4. A decoder $g^*$ is a Bayes decoder of the pair $(\mathcal{C}, f)$ if

$\mathbb{E}(\mathcal{C}, \mu_f, g^*) = \min_{g} \mathbb{E}(\mathcal{C}, \mu_f, g)$.

Definition 4.1.5. An encoder $f^*$ is a Bayes encoder of the pair $(\mathcal{C}, g)$ if

$\mathbb{E}(\mathcal{C}, \mu_{f^*}, g) = \min_{f} \mathbb{E}(\mathcal{C}, \mu_f, g)$.

It is clear that a decoder $g$ is a Bayes decoder if, and only if, for any $y \in \mathbb{F}_q^n$,

$\sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, g(y)) = \min\left\{ \sum_{c \in \mathcal{C}} W(y|c)\, \mu_f(c, c') : c' \in \mathcal{C} \right\}$.

Note that if $g$ is a Bayes decoder and $\mu$ is the indicator function $\mu_{0\text{-}1}$, then

$\sum_{c \in \mathcal{C},\, c \neq g(y)} W(y|c) = \sum_{c \in \mathcal{C}} W(y|c)\, \mu_{0\text{-}1}(c, g(y)) \leq \sum_{c \in \mathcal{C}} W(y|c)\, \mu_{0\text{-}1}(c, c') = \sum_{c \in \mathcal{C},\, c \neq c'} W(y|c)$

for every $c' \in \mathcal{C}$. Therefore $W(y | g(y)) \geq W(y | c')$ for every $c' \in \mathcal{C}$, i.e., $W(y | g(y)) = \max\{W(y|c) : c \in \mathcal{C}\}$.

In other words, $g$ is an ML decoder.

4.2 Bayes Encoders

Even considering the situation where the code and the decoder are chosen in advance, finding Bayes encoders may still be a difficult problem that, to our knowledge, has not been explored in the literature. A characterization of Bayes encoders in a general setting depends on the loss function. We stress that for an $[n, k, \delta]_q$ code $\mathcal{C}$, the “complexity” of finding a Bayes encoder is $q^k!$ (the number of encoders for $\mathcal{C}$). To estimate a qualitative measure of a proposed encoder without running over all possible encoders, the average expected loss may be used.

Lemma 4.2.1. Let $g$ be a decoder of an $[n, k, \delta]_q$ linear code $\mathcal{C}$ and $\mu$ an error value function. Consider the function $\mu_{mean}$ with $\mu_{mean}(c, c) = 0$ for every $c \in \mathcal{C}$ and

$\mu_{mean}(c_1, c_2) = \dfrac{\sum_f \mu_f(c_1, c_2)}{q^k!}$

for every $c_1 \neq c_2$. Then, the average of the expected losses is the expected loss given by $\mu_{mean}$, i.e.,

$\mathbb{E}(\mathcal{C}, \mu_{mean}, g) = \dfrac{\sum_f \mathbb{E}(\mathcal{C}, \mu_f, g)}{q^k!}$.

Proof. It follows directly from the fact that

$\dfrac{\sum_f \mathbb{E}(\mathcal{C}, \mu_f, g)}{q^k!} = \dfrac{1}{q^k!} \sum_{c \in \mathcal{C}} \sum_{y \in \mathbb{F}_q^n} W(y|c)\, P(c) \sum_f \mu_f(g(y), c). \quad (4.5)$

Under the same assumptions as in the previous Lemma, we have the following proposition:

Proposition 4.2.2. For every $c_1, c_2 \in \mathcal{C}$ with $c_1 \neq c_2$,

$\mu_{mean}(c_1, c_2) = \dfrac{1}{q^k (q^k - 1)} \sum_{s=1}^{t} |A_{j_s}|\, j_s$

where $\{j_1, \ldots, j_t\}$ is the set of all possible real values assumed by $\mu$ and $A_{j_k} = \{(\iota_1, \iota_2) \in \mathcal{I} \times \mathcal{I} : \mu(\iota_1, \iota_2) = j_k\}$ for every $k \in [t]$.

Proof. For every $c_1, c_2 \in \mathcal{C}$ with $c_1 \neq c_2$,

$\sum_f \mu_f(c_1, c_2) = \sum_{s=1}^{t} \sum_{f:\, \mu_f(c_1, c_2) = j_s} j_s = (q^k - 2)! \sum_{s=1}^{t} |A_{j_s}|\, j_s$,

where $A_{j_s} = \{(\iota_1, \iota_2) \in \mathcal{I} \times \mathcal{I} : \mu(\iota_1, \iota_2) = j_s\}$. Therefore,

$\dfrac{\sum_f \mu_f(c_1, c_2)}{q^k!} = \dfrac{1}{q^k (q^k - 1)} \sum_{s=1}^{t} |A_{j_s}|\, j_s$.

An information set, as any finite set, may be endowed with an additive group structure. Despite that, the group operation does not, in general, translate the significance of the information. When the information set may naturally be endowed with an additive group structure and the error value function is invariant under this operation, i.e., if $\mu$ satisfies $\mu(\iota_1 + \iota_3, \iota_2 + \iota_3) = \mu(\iota_1, \iota_2)$ for all $\iota_1, \iota_2, \iota_3 \in \mathcal{I}$, a characterization of the linear-Bayes encoders (the linear encoders minimizing, among all linear encoders, the expected loss) was presented in [17]. If $g$ is a decoder for $\mathcal{C}$, the decision regions $g^{-1}(c)$ of each $c \in \mathcal{C}$ determine a partition of $\mathbb{F}_q^n$, i.e.,

$\mathbb{F}_q^n = \bigsqcup_{c \in \mathcal{C}} g^{-1}(c)$.

Then,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} \sum_{c' \in \mathcal{C}} \mu_f(c, c') \sum_{y \in g^{-1}(c')} W(y|c)$.

Therefore,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{(c, c') \in \mathcal{C} \times \mathcal{C}} G_g(c, c')\, \mu_f(c, c') \quad (4.6)$

where

$G_g(c, c') = \sum_{y \in g^{-1}(c')} W(y|c). \quad (4.7)$

If $\mu$ is invariant by translations and $f$ is a linear encoder, then $\mu_f$ is also invariant by translations. Therefore,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{(c, c') \in \mathcal{C} \times \mathcal{C}} G_g(c, c')\, \mu_f(c - c', 0)$.

Thus, writing $u = c - c'$,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{c \in \mathcal{C}} \sum_{u \in \mathcal{C}} G_g(c, c - u)\, \mu_f(u, 0) = \dfrac{1}{q^k} \sum_{u \in \mathcal{C}} \left( \sum_{c \in \mathcal{C}} G_g(c, c - u) \right) \mu_f(u, 0)$.

Proposition 4.2.3. [17] Let $\mathcal{C} = \{c_1, \ldots, c_{q^k}\}$ be an $[n, k, \delta]_q$ linear code, $g$ a decoder and $\mu$ an error value function invariant by translations. Suppose, without loss of generality,

$\sum_{c \in \mathcal{C}} G_g(c, c - c_1) \geq \cdots \geq \sum_{c \in \mathcal{C}} G_g(c, c - c_{q^k})$.

Then, $f$ is a linear-Bayes encoder if, and only if,

$\mu_f(c_1, 0) \leq \cdots \leq \mu_f(c_{q^k}, 0)$.

Even though the characterization obtained by Proposition 4.2.3 provides a simple way to construct linear-Bayes encoders, invariance by translation is an artificial condition for the majority of the applications. As we shall see, the general case is much harder.

Suppose $\mathcal{C} = \{c_1, \ldots, c_{q^k}\}$; then

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} \sum_{i=1}^{q^k} \sum_{j=1}^{q^k} G_g(c_i, c_j)\, \mu_f(c_i, c_j) \quad (4.8)$

where $G_g$ is as in (4.7). Using Equation (4.8), if $A^g = (a_{ij})$ and $B^f = (b_{ij})$ are the matrices defined by

$a_{ij} = G_g(c_i, c_j)$ and $b_{ij} = \mu_f(c_i, c_j)$,

then

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{1}{q^k} Tr(A^g B^f)$,

where $Tr(\cdot)$ is the matrix trace function. Given an encoder $f$, if $P$ is a permutation matrix ($P$ is obtained by permuting the rows or columns of the identity matrix), then $H = P B^f P^T$ is the matrix constructed using another encoder $h$, i.e., $H = B^h$. In addition, if $f$ and $h$ are two encoders, there is a permutation matrix $P$ such that $B^h = P B^f P^T$. Therefore, we have the following theorem.

Theorem 4.2.4. Given a decoder $g'$, an encoder $f'$ and the corresponding matrices $A^{g'}$ and $B^{f'}$, then

$\min_{f} \mathbb{E}(\mathcal{C}, \mu_f, g') = \dfrac{1}{q^k} \min_{P} Tr(A^{g'} P B^{f'} P^t)$,

where the minimum on the right-hand side is taken over all permutation matrices.

The previous characterization brings the Bayes encoder problem into a class of well-studied problems: minimizing the trace of a product of matrices. Let $f$ be an encoder. If $\sigma : [q^k] \to [q^k]$ is a permutation (bijection) and $B^{f,\sigma}$ is the matrix with entries $B^{f,\sigma}_{ij} = \mu_f(c_{\sigma(i)}, c_{\sigma(j)})$, then the problem of finding a Bayes encoder is equivalent to the problem of finding a permutation $\sigma$ of the codewords such that $Tr(A^g B^{f,\sigma})$ is minimized (note that $B^{f,\sigma} = P B^f P^t$ for some permutation matrix $P$). It is clear that the search space has $q^k!$ elements (the number of permutations of a set with $q^k$ elements). In the next section we will work with a particular case (the simplest in terms of codes).
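From the trace formulation, a brute-force search for a Bayes encoder can be sketched as follows (Python; feasible only for very small codes, since the search space has $q^k!$ permutations; all names are illustrative).

```python
from itertools import permutations

def bayes_encoder(A, B):
    """A[i][j] = G_g(c_i, c_j) and B[i][j] = mu(iota_i, iota_j).
    Returns a permutation sigma (assign information iota_i to c_{sigma(i)})
    minimizing the expected loss, i.e., Tr(A P B P^t) over permutations P."""
    m = len(A)
    def loss(sigma):
        inv = {v: u for u, v in enumerate(sigma)}
        return sum(A[i][j] * B[inv[i]][inv[j]]
                   for i in range(m) for j in range(m))
    return min(permutations(range(m)), key=loss)
```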

4.3 A Particular Case

From now on, we will assume that the channel is a binary DSMC with conditional probabilities given by

$W(y|c) = (1-p)^n \left( \dfrac{p}{1-p} \right)^{d_H(y, c)}$,

where $y$ and $c$ are words of length $n$, $0 \leq p \leq 1/2$ is the error probability of each symbol and $d_H$ is the Hamming distance. Let us consider the unusual case of a code with no redundancy at all, i.e., let $\mathcal{C}$ be an $[n, n]_2$ code. In such a case, finding a Bayes encoder-decoder pair is equivalent to finding a Bayes encoder, since there is a unique decision to be made: accept as true whatever message is received. Codes like these (without redundancy) are rare but may still be used, as we can see in the recent use, for image transmission, in the satellite CBERS-2 (http://www.cbers.inpe.br/ingles/). However, more than looking for possible applications, we believe that understanding this instance may be a key to the general case (where $\mathcal{C}$ has redundancy). We stress that a permutation $\sigma \in \mathcal{S}_n$ acts on $\mathbb{F}_q^n$ by permuting the coordinates.

Proposition 4.3.1. If $\mathcal{C} = \mathbb{F}_q^n$ is a code without redundancy over a DSMC channel, then the expected loss is invariant under permutations of the coordinates.

Proof. Let $f : \mathcal{I} \to \mathcal{C}$ be an encoder minimizing the expected loss; without loss of generality, suppose $f(i_j) = c_j$. Let $\sigma$ be a permutation of the coordinates and denote $c_j^{\sigma} = (c_j^{\sigma(1)}, \ldots, c_j^{\sigma(n)})$. It is clear that there is a permutation $\tau$ over $[q^n]$ such that $c_{\tau(j)} = c_j^{\sigma}$. Define $h : \mathcal{C} \to \mathcal{C}$ by $h(c_j) = c_{\tau(j)}$; as $d_H(c_i, c_j) = d_H(c_{\tau(i)}, c_{\tau(j)})$, $h \circ f$ is an encoder satisfying

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \mathbb{E}(\mathcal{C}, \mu_{h \circ f}, g)$.

As an immediate consequence, we have:

Corollary 4.3.2. When $\mathcal{C} = \mathbb{F}_q^n$, there are at least $n!$ Bayes encoders.

Suppose $\mathcal{I} = \{i_0, i_1, \ldots, i_{2^n-1}\}$ for some integer $n$, so that $|\mathcal{I}| = 2^n$. Motivated by applications in image transmission, there is a natural normalized error value function $\mu$ on $\mathcal{I}$, namely:

$\mu(i_s, i_t) = \dfrac{1}{(2^n - 1)^2} |i_s - i_t|^2 = \dfrac{1}{(2^n - 1)^2} |s - t|^2$ for all $i_s, i_t \in \mathcal{I}$. $\quad (4.9)$

Let $\mathcal{C} = \mathbb{F}_2^n$ and consider $\mathcal{C} = \{c^1, \ldots, c^{2^n}\}$ where the codewords are lexicographically ordered, i.e., if $c^i = (c_1^i, \ldots, c_n^i)$ and $c^j = (c_1^j, \ldots, c_n^j)$, then

$i \leq j \iff \sum_{l=1}^{n} c_l^i\, 2^{l-1} \leq \sum_{l=1}^{n} c_l^j\, 2^{l-1}$.

Setting $j_s = (s/(2^n - 1))^2$ for every $s \in \{0, 1, \ldots, 2^n - 1\}$, the set $\{j_0, j_1, \ldots, j_{2^n-1}\}$ is the image of $\mu$. Thus,

$|A_{j_s}| = 2(2^n - s)$

for every $s \in [2^n - 1]$, and $|A_{j_0}| = 2^n$, where $A_{j_s}$ was defined in Proposition 4.2.2. Therefore,

$\mathbb{E}(\mathcal{C}, \mu_{mean}, g) = \dfrac{1}{2^n} \sum_{i=1}^{2^n} \sum_{j \neq i} \dfrac{\sum_{s=1}^{2^n-1} (2^n - s) s^2}{2^{n-1} (2^n - 1)^3}\, W(c^j | c^i) = \dfrac{(1-p)^n}{2^n (2^n - 1)^2} \sum_{i=1}^{2^n} \sum_{j \neq i} \dfrac{2^{n-1}(2^n + 1)}{3} \left( \dfrac{p}{1-p} \right)^{d_H(c^i, c^j)}$,

and the last equality holds because

$\dfrac{2^{2n-2}(2^n - 1)(2^n + 1)}{3} = \sum_{s=1}^{2^n-1} (2^n - s) s^2$.

An encoder $f$ defined by $f(i_j) = c^{j+1}$ is called a lexicographic encoder. For such an encoder we have that

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{(1-p)^n}{2^n (2^n - 1)^2} \sum_{i=1}^{2^n} \sum_{j \neq i} (i - j)^2 \left( \dfrac{p}{1-p} \right)^{d_H(c^i, c^j)}. \quad (4.10)$

For $p = 0$ and $p = 1/2$, it is intuitive (and easy to verify) that $\mathbb{E}(\mathcal{C}, \mu_f, g)$ does not depend on the encoder and decoder; hence, for these two values,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \mathbb{E}(\mathcal{C}, \mu_{mean}, g). \quad (4.11)$

Based on experiments and on the knowledge that the result is true in small dimensions (Example 4.3.4 describes the 8-dimensional case), we state the following conjecture:

Conjecture 4.3.3. For every positive integer $n$ and every $0 < p < 1/2$,

$\sum_{i=1}^{2^n} \sum_{j \neq i} (i - j)^2 \left( \dfrac{p}{1-p} \right)^{d_H(c^i, c^j)} < \sum_{i=1}^{2^n} \sum_{j \neq i} \dfrac{2^{n-1}(2^n + 1)}{3} \left( \dfrac{p}{1-p} \right)^{d_H(c^i, c^j)}$.

If the previous conjecture is true, then for every $0 < p < 1/2$ the expected loss according to $f$ is smaller than the mean of the expected losses, i.e.,

$\mathbb{E}(\mathcal{C}, \mu_f, g) < \mathbb{E}(\mathcal{C}, \mu_{mean}, g)$.
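The conjecture can be checked numerically for small $n$; the sketch below (Python; it samples a grid of values of $p$, so it is a sanity check rather than a proof) compares the two sums of Conjecture 4.3.3 directly.

```python
from itertools import product

def sums(n, r):
    """Return the lexicographic-encoder and mean-encoder sums of Conjecture 4.3.3."""
    words = list(product([0, 1], repeat=n))      # c^1, ..., c^{2^n} in lex order
    const = 2 ** (n - 1) * (2 ** n + 1) / 3
    lex = mean = 0.0
    for i, ci in enumerate(words):
        for j, cj in enumerate(words):
            if i != j:
                w = r ** sum(a != b for a, b in zip(ci, cj))
                lex += (i - j) ** 2 * w
                mean += const * w
    return lex, mean

for n in (2, 3, 4):
    for p in (0.05, 0.1, 0.25, 0.4):
        lex, mean = sums(n, p / (1 - p))
        assert lex < mean, (n, p)
print("strict inequality holds on the sampled grid")
```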

In the following, we shall describe a particular case (with $n = 8$), based on the 256 gray levels of the RGB color model, which fits well with the error value function $\mu$ described in (4.9); in [18], two other error value functions were used in this context. In this case, we manage to conclude that the expected loss obtained by the lexicographic encoder is better than the mean of the expected losses.

Example 4.3.4. Suppose $n = 8$ and that $\mathcal{I} = \{0, 1, \ldots, 255\}$ is the set of all gray levels in the RGB color model (black is represented by 0 and white by 255). Then,

$\mu(i_s, i_t) = \dfrac{1}{255^2} |i_s - i_t|^2 = \dfrac{1}{255^2} |s - t|^2 \quad \forall\, i_s, i_t \in \mathcal{I}$.

Suppose $\mathcal{C} = \{c^1, \ldots, c^{256}\}$ and let $f(i_j) = c^{j+1}$ be a lexicographic encoder, for every $j \in [2^8]$. It follows that

$\mathbb{E}(\mathcal{C}, \mu_{mean}, g) = \dfrac{(1-p)^8}{16646400} \sum_{i=1}^{2^8} \sum_{j \neq i} \dfrac{32896}{3} \left( \dfrac{p}{1-p} \right)^{d_H(c^i, c^j)} = \dfrac{(1-p)^8}{16646400} \left( \dfrac{67371008 r}{3} + \dfrac{235798528 r^2}{3} + \dfrac{471597056 r^3}{3} + \dfrac{589496320 r^4}{3} + \dfrac{471597056 r^5}{3} + \dfrac{235798528 r^6}{3} + \dfrac{67371008 r^7}{3} + \dfrac{8421376 r^8}{3} \right)$

where $r = p/(1-p)$. On the other hand,

$\mathbb{E}(\mathcal{C}, \mu_f, g) = \dfrac{(1-p)^8}{16646400} \sum_{i=1}^{2^8} \sum_{j \neq i} (i - j)^2 \left( \dfrac{p}{1-p} \right)^{d_H(c^i, c^j)} = \dfrac{(1-p)^8}{16646400} \left( 6379776 r + 40988416 r^2 + 119295232 r^3 + 195767040 r^4 + 193932032 r^5 + 115625216 r^6 + 38366976 r^7 + 5462272 r^8 \right)$.

Therefore,

$q(r) = \dfrac{\mathbb{E}(\mathcal{C}, \mu_f, g)}{\mathbb{E}(\mathcal{C}, \mu_{mean}, g)} = \dfrac{3 (24921 + 160111 r + 465997 r^2 + 764715 r^3 + 757547 r^4 + 451661 r^5 + 149871 r^6 + 21337 r^7)}{32896 (8 + 28 r + 56 r^2 + 70 r^3 + 56 r^4 + 28 r^5 + 8 r^6 + r^7)}$

for every $0 < r < 1$. Since the derivative

$q'(r) = \dfrac{62475 (1+r)^6 (28 + 56 r + 70 r^2 + 56 r^3 + 28 r^4 + 8 r^5 + r^6)}{32896 (8 + 28 r + 56 r^2 + 70 r^3 + 56 r^4 + 28 r^5 + 8 r^6 + r^7)^2}$

is positive in the interval $(0, 1)$, $q$ is strictly increasing in this interval. If $r = 1$, then $p = 1/2$; therefore, by (4.11), $q(1) = 1$. Hence $0 < q(r) < 1$ for all $r \in (0, 1)$; thus, for every $p \in (0, 1/2)$,

$\mathbb{E}(\mathcal{C}, \mu_f, g) < \mathbb{E}(\mathcal{C}, \mu_{mean}, g)$.

As we will see below, we have reasons to believe that the encoder $f$ as constructed is a Bayes encoder for the $n$-dimensional case. Figure 4.1 represents, for the 8-dimensional case, the expected losses for $p$ varying from 0 to 1/2. The lexicographic encoder is represented by the red line. The blue dots correspond to encoders randomly sampled for each $p$ in the set $\{0, 0.005, 0.01, \ldots, 0.5\}$. This picture not only suggests that lexicographic encoders are optimal, but also that they are rare and that there is a large concentration close to the mean encoder performance (the green line).

[Figure 4.1: Expected losses for $p$ in the interval $[0, 1/2]$.]

The construction we did for the 8-dimensional case is similar for other dimensions, but proving that lexicographic encoders are optimal for general $n$ and $p$ is still an open problem.

4.4 Expected Loss and Unequal Error Protection

Given a decoder $g$, an encoder $f$ and an error value function $\mu$, in this section we propose a construction of codes with unequal error protection. The goal is to improve the expected loss of the code obtained by the direct product of two codes, each one “optimal” for the given parameters. Even though this construction may not be optimal, we will see that in some cases the performance of this code, according to the expected loss, is improved when compared with the product code. Let $\mathcal{I} = \mathbb{F}_2^k$ be the information set. Consider the following data to be given:

• 푊 is a binary DSMC;

• 푓 is a lexicographic encoder;

• 푔 is a minimum distance decoder determined by the Hamming metric 푑퐻 ;

One of the existing formulations of unequal error protection (UEP) is bit-wise UEP: in this formulation, the coordinates (bits) of the information set are partitioned into subsets, and decoding errors in different parts of the bits are viewed as different kinds of errors. Since $\mathcal{I} = \mathbb{F}_2^k$, we will assume that $k$ is a positive even integer and that the first $k/2$ coordinates are less important than the last $k/2$ coordinates. In other words, the last $k/2$ coordinates need more protection against errors than the first $k/2$ coordinates. One usual approach [4] is to encode each space $\mathbb{F}_2^{k/2}$ separately and take the Cartesian product of the codes. Suppose the low-priority bits (the first $k/2$ coordinates) are encoded using an $[n_1, k/2, \delta_1]_2$ linear code $\mathcal{C}_1$ and the high-priority bits are encoded with an $[n_2, k/2, \delta_2]_2$ linear code $\mathcal{C}_2$ with $n_2 > n_1$. Therefore, the code $\mathcal{C} = \mathcal{C}_1 \times \mathcal{C}_2$ has the unequal error protection property, with more protection devoted to the more significant information. We will call such a code a 2-level product code (in order to recall that the code was constructed using two levels of UEP). There is a natural error value function associated with the 2-level product code, one expressing the difference between errors that may occur in the two distinct levels. Given $v = (v_1, \ldots, v_k) \in \mathbb{F}_2^k$ and $v' = (v_1', \ldots, v_k') \in \mathbb{F}_2^k$, let $r \in \mathbb{R}_+$ with $r < 1$ and let $\mu^r$ be the error value function defined by

$\mu^r(v, v') = \begin{cases} 0 & \text{if } v = v' \\ r & \text{if } \sum_{i=1}^{k} (v_i - v_i')\, 2^{i-1} < 2^{k/2+1} \\ 1 & \text{otherwise.} \end{cases}$

This function outputs 1 if an error occurs in the high-priority bits and $r < 1$ for errors occurring only in the low-priority bits. We recall that the number $r$ is an invariant depending on the difference between errors occurring in the high-priority and in the low-priority bits. In the information theory literature, it is common to reject a message if the errors are above some threshold. Since we are assuming decoders with refusal probability zero, it is more important to concentrate the errors in the less important information coordinates (where the error value is low), and this is achieved by protecting the high-priority bits (where the error value is high) more. By using the concept of expected loss in this setting, we may allow more errors provided that they are concentrated in the low-priority bits; therefore, in general, the expected loss does not minimize the number of expected errors. The Plotkin construction is a well-known method to construct codes; in particular, the Reed-Muller family of codes is obtained using this construction, see [3].

Basically, given an $[n_1, k_1, \delta_1]_2$ code $\mathcal{C}_1$ and an $[n_2, k_2, \delta_2]_2$ code $\mathcal{C}_2$ with $n_1 = n_2$, the Plotkin construction is the code $\mathcal{C}$ given by

$\mathcal{C} = \mathcal{C}_1 * \mathcal{C}_2 = \{(u + v, v) : u \in \mathcal{C}_1 \text{ and } v \in \mathcal{C}_2\}$.

Here we generalize this construction. Suppose $n_1 \leq n_2$ and consider an $n_2 \times n_1$ matrix $A = (a_{ij})$ with $a_{ij} \in \mathbb{F}_2$. The $A$-generalized Plotkin construction is given by

$\mathcal{C} = \mathcal{C}_1 *_A \mathcal{C}_2 = \{(u + vA, v) : u \in \mathcal{C}_1 \text{ and } v \in \mathcal{C}_2\}$.

When 푛1 = 푛2 and 퐴 is the identity matrix, the generalized construction coincides with the classical Plotkin’s construction. When 퐴 is the null matrix, we have the Cartesian product of two codes which is denoted by 풞1 × 풞2.
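A minimal sketch of the construction over $\mathbb{F}_2$ (codes represented as lists of 0/1 tuples; the helper name is ours): with $A = 0$ it returns the Cartesian product, and with $n_1 = n_2$ and $A = I$ the classical Plotkin construction.

```python
def gen_plotkin(C1, C2, A):
    """A-generalized Plotkin construction {(u + vA, v) : u in C1, v in C2},
    where A is an n2 x n1 binary matrix given as a list of rows."""
    n1 = len(A[0])
    def vA(v):                                  # row vector times matrix, mod 2
        return [sum(v[i] * A[i][j] for i in range(len(v))) % 2
                for j in range(n1)]
    return [tuple((a + b) % 2 for a, b in zip(u, vA(v))) + tuple(v)
            for v in C2 for u in C1]
```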

Suppose $k_1 = k_2$ and $n_1 < n_2$ (the information encoded in $\mathcal{C}_2$ is more protected). Note that the error value function $\mu^r$ is defined in such a way that errors occurring simultaneously in the high-priority and low-priority bits have the same error value as errors occurring only in the high-priority bits. In other words, the high-priority bits dominate the low-priority bits in terms of errors. Therefore, a good choice of matrix $A$ would be a matrix having a maximal number of $v \in \mathcal{C}_2$ such that $vA \notin \mathcal{C}_1$. To do so, we will describe the construction of $A$ used in our simulations, which we believe, due to the previous observations, to be a reasonable construction. Let $\beta = \{c_1, \ldots, c_{k_1}\}$ be a basis for $\mathcal{C}_1$ and consider

$\beta' = \{c_1, \ldots, c_{k_1}, v_1, \ldots, v_{n_1 - k_1}\}$

a basis for $\mathbb{F}_2^{n_1}$ extending $\beta$. If $k_1 > n_1 - k_1$, choose $A$ having $v_i$ as its $i$-th row and, for every $j > n_1 - k_1$, the $j$-th row of $A$ null. Then the image of $\mathcal{C}_2$ by $A$ is the space generated by $\{v_1, \ldots, v_{n_1-k_1}\}$. If $k_1 \leq n_1 - k_1$, take $v_i$ as the $i$-th row of $A$ for every $i \leq k_1$ and consider the remaining rows to be null. In this case, the image of $\mathcal{C}_2$ by $A$ is the subspace generated by $\{v_1, \ldots, v_{k_1}\}$. Using the $A$-generalized Plotkin construction according to the matrix $A$ previously constructed, we have the following conjecture:

Conjecture 4.4.1. Let $0 < p < 1/2$ be the error probability of the binary DSMC. For every $0 < r < 1$,

$\mathbb{E}(\mathcal{C}_1 *_A \mathcal{C}_2, \mu^r_{f'}, g') \leq \mathbb{E}(\mathcal{C}_1 \times \mathcal{C}_2, \mu^r_{f''}, g'')$

where $f'$ and $f''$ are lexicographic encoders and $g'$ and $g''$ are syndrome decoders according to the Hamming metric.

The Best Known linear code 풞푏 is the code with length 푛 and dimension 푘 minimizing the error probability, among all known codes with the same parameters.

Conjecture 4.4.2. If $0 < p < 1/2$ is the error probability of the binary DSMC, then there exists $0 < r_0(p) < 1$ such that for every $r < r_0(p)$,

$\mathbb{E}(\mathcal{C}_1 *_A \mathcal{C}_2, \mu^r_{f'}, g') \leq \mathbb{E}(\mathcal{C}_b, \mu^r_{f''}, g'')$.

Also, if $p \to 0$ then $r_0(p) \to 0$.

Besides the simulations suggesting the truthfulness of the two previous conjectures, the fact that the $A$-generalized Plotkin construction uses the coordinates of $\mathcal{C}_1$ as redundancy for the code $\mathcal{C}_2$ suggests that the code $\mathcal{C}_2$ is even more protected. One of the experiments that strengthen the truthfulness of such conjectures will be described in the rest of this section as an example.

Example 4.4.3. Let $\mathcal{C}_1$ be a $[6, 3, 3]_2$ code and $\mathcal{C}_2$ a $[10, 3, 5]_2$ code with generator matrices

$$G_1 = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{pmatrix} \quad \text{and} \quad G_2 = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \end{pmatrix},$$

respectively. Let $\mathcal{C}_3$ be the best known linear code with parameters $[16, 6, 6]_2$. By using the Magma Computational Algebra System, version 2.21-10, this code has generator matrix

$$G_3 = \begin{pmatrix} 1&0&0&0&0&0&0&0&1&0&0&1&1&1&1&0 \\ 0&1&0&0&0&0&0&0&0&1&0&0&1&1&1&1 \\ 0&0&1&0&0&0&0&1&0&0&0&1&1&0&1&1 \\ 0&0&0&1&0&0&0&1&1&0&1&1&0&0&0&1 \\ 0&0&0&0&1&0&0&1&1&1&1&0&0&1&0&0 \\ 0&0&0&0&0&1&0&0&1&1&1&1&0&0&1&0 \end{pmatrix}.$$

Consider the $10 \times 6$ matrix $A$ given by

$$A = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ & & 0_{7 \times 6} & & & \end{pmatrix}$$

where $0_{7 \times 6}$ is a null submatrix of order $7 \times 6$. Then, the code $\mathcal{C}_1 *_A \mathcal{C}_2$ has generator matrix

$$G_4 = \begin{pmatrix} 1&0&0&1&0&1&0&0&0&0&0&0&0&0&0&0 \\ 0&1&0&1&1&0&0&0&0&0&0&0&0&0&0&0 \\ 0&0&1&1&1&1&0&0&0&0&0&0&0&0&0&0 \\ 0&0&0&1&0&0&1&0&0&1&0&1&1&0&0&1 \\ 0&0&0&0&1&0&0&1&0&1&1&0&1&0&1&0 \\ 0&0&0&0&0&1&0&0&1&1&1&1&1&1&0&0 \end{pmatrix}.$$

In this scenario, considering the error probability $p$ of the binary DSMC to be $0.001$, $0.01$, $0.1$ and $0.2$, and considering $r$ (the parameter given by the error value function $\mu^r$) as a variable, we obtain the following graphs:

[Four graphs of the expected loss as a function of $r \in (0, 0.1)$, one for each $p \in \{0.001, 0.01, 0.1, 0.2\}$, comparing the direct product $\mathcal{C}_1 \times \mathcal{C}_2$ with the $A$-generalized Plotkin construction $\mathcal{C}_1 *_A \mathcal{C}_2$.]

Note that the previous graphs were plotted using $r \in (0, 0.1)$: for small error probability $p$, the lines are so close that we could not distinguish them if we plotted every $r$ in the interval $(0, 1)$. Due to the characterization of $\mu^r$, it is possible to prove that $\mathbb{E}(\mathcal{C}, \mu^r_f, g)$ is linear in $r$. Hence, by analyzing the slopes of these straight lines (which was done using Magma), we get that, for every $p \in \{0.001, 0.01, 0.1, 0.2\}$ and $0 \leq r \leq 1$,

$\mathbb{E}(\mathcal{C}_1 *_A \mathcal{C}_2, \mu^r_{f'}, g') \leq \mathbb{E}(\mathcal{C}_1 \times \mathcal{C}_2, \mu^r_{f''}, g'')$,

as expected. Furthermore, comparing the $A$-generalized Plotkin construction with the best known linear code, using the same values of $p$, we obtain the following graphs:

[Four graphs of the expected loss as a function of $r$, one for each $p \in \{0.001, 0.01, 0.1, 0.2\}$, comparing the best known code $\mathcal{C}_3$ with the $A$-generalized Plotkin construction $\mathcal{C}_1 *_A \mathcal{C}_2$.]

In order to make the intersection points of the lines in the graphs visible, the plotted interval for both the first and the second graphs was reduced. Note that the four graphs suggest that the intersection point converges to zero as $p \to 0$. We stress that in this case we can find the exact point of intersection of the two straight lines, since the dimension is small and it can be computed.

FUTURE PERSPECTIVES

Extended Poset Metrics

By characterizing a large family of metrics, the problem of describing all decoders with certain characteristics can be addressed; this would be particularly striking if such a family of metrics respected some particular construction, as happens with poset metrics. Motivated by this characterization problem, we propose an even more general family of metrics than the poset metrics. Given a set $Y$, remark that any subset $X \subset \mathcal{P}(Y)$ of the power set of $Y$ is a poset with the inclusion relation. In the following, consider $X \subset \mathcal{P}(Y)$ such that

$Y \subset \cup_{A \in X} A$. Remark that, given $A, B \in X$ with $A \subset B$, $B$ is said to be a covering of $A$ if there is no $C \in X$ such that $A \subsetneq C \subsetneq B$. Given a poset $P = (X, \leq)$ over $X$, if $B \subset Y$ and $I$ is an ideal in $P$ such that $B \subset \cup_{A \in I} A$, then $I$ is a cover ideal of $B$ if there is no ideal $J$ in $P$ such that $B \subset \cup_{A \in J} A$ and $|J| < |I|$. Denote by $N_P(B)$ the cardinality of an ideal covering $B$:

$N_P(B) = \min\{|I| : I \text{ covers } B\}$.

If 퐼 and 퐽 are cover ideals of 퐵, then |퐼|= |퐽|, therefore the function 푁푃 is well defined.

Proposition 4.4.4. $N_P(A \cup B) \leq N_P(A) + N_P(B)$.

Proof. If $I$ and $J$ are ideals covering $A$ and $B$ respectively, then $I \cup J$ is an ideal whose members cover $A \cup B$; hence there is an ideal $U$ covering $A \cup B$ with $U \subset I \cup J$, and therefore $N_P(A \cup B) \leq N_P(A) + N_P(B)$.

Definition 4.4.5. With the notation above, supposing $Y = [n]$, for $x \in \mathbb{F}_q^n$ the extended poset $P$-weight is defined to be the smallest cardinality of an ideal covering the support of $x$:

$w_P(x) = N_P(supp(x))$.

Proposition 4.4.6. Extended poset weights preserve support.

Proof. If $w_P$ is an extended poset weight and $x, y \in \mathbb{F}_q^n$ are vectors such that $supp(x) \subset supp(y)$, then every ideal covering $supp(y)$ also covers $supp(x)$; hence $w_P(x) \leq w_P(y)$.

The weight function $w_P$ is clearly non-negative; furthermore, $w_P(x) = 0$ if, and only if, $x = 0$, since $[n] \subset \cup_{A \in X} A$. Given $x, y \in \mathbb{F}_q^n$, because $supp(x + y) \subset supp(x) \cup supp(y)$, it follows from Propositions 4.4.4 and 4.4.6 that $w_P(x + y) \leq w_P(x) + w_P(y)$. Therefore, the extended poset weight $w_P$ is a norm function, and metrics can be defined according to these norms.

Definition 4.4.7. Given an extended poset weight $w_P$, the extended metric (or extended $P$-metric) is the metric induced by $w_P$, i.e.,

$d_P(x, y) = w_P(x - y)$

for every $x, y \in \mathbb{F}_q^n$.
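A brute-force sketch of the extended weight (Python; exponential in $|X|$, so only for toy examples, with an illustrative encoding of $X$ as a list of frozensets over $Y = \{0, \ldots, n-1\}$) can be written directly from the definition.

```python
from itertools import combinations

def is_ideal(family, X):
    """family is down-closed in (X, inclusion): with each member A it
    contains every B in X with B a subset of A."""
    return all(B in family for A in family for B in X if B <= A)

def extended_weight(x, X):
    """w_P(x): size of a smallest ideal of (X, inclusion) covering supp(x)."""
    supp = {i for i, v in enumerate(x) if v}
    for size in range(len(X) + 1):
        for fam in combinations(X, size):
            cover = set().union(*fam) if fam else set()
            if supp <= cover and is_ideal(set(fam), X):
                return size

X = [frozenset({0}), frozenset({1}), frozenset({0, 1, 2})]
print(extended_weight([1, 0, 1], X))   # 3: any ideal containing {0,1,2}
                                       # must also contain {0} and {1}
```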

Although the definition of these metrics is quite broad, they possess a nice structure: indeed, they are invariant by translations, since they are defined by norms, and these norms preserve support, as stated in Proposition 4.4.6. In [11], it was shown that, up to a decoding equivalence, any metric space may be embedded into a hypercube with the Hamming metric. Therefore, for decoding purposes, the Hamming metric can always be used, but the decoding complexity may increase, since the embedding immerses the code into a higher-dimensional space. A natural question arises when working at this level of generality: among the metrics preserving support, does the family of extended poset metrics characterize, up to decoding equivalence, every minimum distance decoder? We do not have the answer to this question, but it seems to be a fruitful line of research.

Better Hierarchical Bounds

The set $\mathcal{P}_n$ of all posets over $[n]$ is itself a poset, as we saw in Chapter 2. If we consider the graph that has $\mathcal{P}_n$ as its set of vertices, with an edge connecting $P$ to $Q$ if, and only if, $P$ covers $Q$ or $Q$ covers $P$, then we have a natural distance between two elements $P, Q \in \mathcal{P}_n$: $d(P, Q)$ is the minimal length of a path in the graph connecting $P$ to $Q$. We remark that, given the upper and lower neighbors $P^-$ and $P^+$, in general there may be a hierarchical poset $Q$ with $d(P, Q) \leq d(P, P^-)$ and $d(P, Q) \leq d(P, P^+)$. It is possible that this finer notion of proximity between posets allows us to determine better bounds than the ones obtained from the upper and lower neighbors $P^+$ and $P^-$.

Bibliography

[1] Marcelo Muniz S Alves. “A standard form for generator matrices with respect to the Niederreiter-Rosenbloom-Tsfasman metric”. In: Information Theory Workshop (ITW). IEEE. 2011, pp. 486–489.

[2] Marcelo Muniz S Alves, Luciano Panek, and Marcelo Firer. “Error-block codes and poset metrics”. In: Advances in Mathematics of Communications 2.1 (2008), pp. 95–111.

[3] D J Baylis. Error Correcting Codes: A Mathematical Introduction. Vol. 15. CRC Press, 1997.

[4] Shashi Borade, Bariş Nakiboğlu, and Lizhong Zheng. “Unequal error protection: An information-theoretic perspective”. In: Information Theory, IEEE Transactions on 55.12 (2009), pp. 5511–5539.

[5] Thomas Britz and Peter Cameron. “Partially ordered sets”. In: J. of Formalized Mathematics 1 (2002).

[6] Richard A Brualdi, Janine Smolin Graves, and K Mark Lawrence. “Codes with a poset metric”. In: Discrete Mathematics 147.1 (1995), pp. 57–72.

[7] Antonio Campello et al. “Perfect codes in the ℓ푝 metric”. In: European Journal of Combinatorics 53 (2016), pp. 72–85.

[8] Sung Hee Cho and Dae San Kim. “Automorphism group of the crown-weight space”. In: European Journal of Combinatorics 27.1 (2006), pp. 90–100.

[9] Serban D Constantin and TRN Rao. “On the theory of binary asymmetric error correcting codes”. In: Information and Control 40.1 (1979), pp. 20–36.

[10] Michel Marie Deza and Elena Deza. Encyclopedia of distances. Springer, Berlin, 2009.

[11] Rafael GL D’Oliveira and Marcelo Firer. “Channel Metrization”. In: arXiv preprint arXiv:1510.03104 (2015).

[12] Rafael Gregorio Lucas D’Oliveira and Marcelo Firer. “The packing radius of a code and partitioning problems: The case for poset metrics on finite vector spaces”. In: Discrete Mathematics 338.12 (2015), pp. 2143–2167.

[13] Larry A Dunning and Woodrow E Robbins. “Optimal encodings of linear block codes for unequal error protection”. In: Information and Control 37.2 (1978), pp. 150–177.

[14] Ben Dushnik and Edwin W Miller. “Partially ordered sets”. In: American Journal of Mathematics (1941), pp. 600–610.

[15] Luciano Viana Felix and Marcelo Firer. “Canonical-systematic form for codes in hierarchical poset metrics”. In: Advances in Mathematics of Communications 6.3 (2012), pp. 315–328.

[16] Marcelo Firer, Luciano Panek, and Jerry Anderson Pinheiro. “Coding and Decoding Schemes for MSE and Image Transmission”. In: arXiv preprint arXiv:1411.1139 (2014).

[17] Marcelo Firer, Luciano Panek, and Laura Rifo. “Coding in the presence of semantic value of information: Unequal error protection using poset decoders”. In: arXiv preprint arXiv:1108.3832 (2011).

[18] Marcelo Firer, Laura L Ramos Rifo, and Luciano Panek. “Coding and decoding schemes tailor made for image transmission”. In: Information Theory and Applications Workshop (ITA), 2013. IEEE. 2013, pp. 1–8.

[19] Marcelo Firer and Judy L Walker. “Matched Metrics and Channels”. In: arXiv preprint arXiv:1506.03782 (2015).

[20] Ernst Gabidulin. “A brief survey of metrics in coding theory”. In: Mathematics of Distances and Applications (2012), pp. 66–84.

[21] Ernst Gabidulin. “Combinatorial metrics in coding theory”. In: 2nd International Symposium on Information Theory. Akadémiai Kiadó. 1973.

[22] S Golomb. “A general formulation of error metrics (Corresp.)” In: Information Theory, IEEE Transactions on 15.3 (1969), pp. 425–426.

[23] Marcus Greferath et al. “MacWilliams’ Extension Theorem for Bi-Invariant Weights Over Finite Principal Ideal Rings”. In: arXiv preprint arXiv:1309.3292 (2013).

[24] Jonathan I Hall. Notes on coding theory. FreeTechBooks.com, 2003.

[25] Richard W. Hamming. “Error detecting and error correcting codes”. In: Bell System Technical Journal 29.2 (1950), pp. 147–160.

[26] Abramo Hefez and Maria Lúcia T Villela. Códigos corretores de erros. Instituto de Matemática Pura e Aplicada, 2008.

[27] W Cary Huffman and Vera Pless. Fundamentals of error-correcting codes. Cambridge University Press, 2003.

[28] C. Y. Lee. “Some properties of nonbinary error-correcting codes”. In: IRE Transac- tions on Information Theory 4.2 (1958), pp. 77–82.

[29] Kwankyu Lee. “The automorphism group of a linear space with the Rosenbloom–Tsfasman metric”. In: European Journal of Combinatorics 24.6 (2003), pp. 607–612.

[30] Roberto Assis Machado, Jerry Anderson Pinheiro, and Marcelo Firer. “Characterization of metrics induced by hierarchical posets”. In: arXiv preprint arXiv:1508.00914 (2015).

[31] Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of error correcting codes. Elsevier, 1977.

[32] Burt Masnick and Jack Wolf. “On linear unequal error protection codes”. In: Information Theory, IEEE Transactions on 13.4 (1967), pp. 600–607.

[33] James L. Massey. Notes on coding theory, class notes for course 6.575 (spring). M.I.T., Cambridge, MA, 1967.

[34] Joseph Neggers and Hee Sik Kim. Basic posets. World Scientific, 1998.

[35] Harald Niederreiter. “A combinatorial problem for vector spaces over finite fields”. In: Discrete Mathematics 96.3 (1991), pp. 221–228.

[36] Harald Niederreiter. “Orthogonal arrays and other combinatorial aspects in the theory of uniform point distributions in unit cubes”. In: Discrete Mathematics 106 (1992), pp. 361–367.

[37] Harald Niederreiter. “Point sets and sequences with small discrepancy”. In: Monatshefte für Mathematik 104.4 (1987), pp. 273–337.

[38] Luciano Panek, Marcelo Firer, and Marcelo Muniz Silva Alves. “Classification of Niederreiter–Rosenbloom–Tsfasman block codes”. In: Information Theory, IEEE Transactions on 56.10 (2010), pp. 5207–5216.

[39] Luciano Panek et al. “Groups of linear isometries on poset structures”. In: Discrete Mathematics 308.18 (2008), pp. 4116–4123.

[40] Woomyoung Park and Alexander Barg. “Linear ordered codes, shape enumerators and parallel channels”. In: Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on. IEEE. 2010, pp. 361–367.

[41] Claudio Qureshi and Sueli IR Costa. “On perfect q-ary codes in the maximum metric”. In: Information Theory and Applications (2016).

[42] Gérald Séguin. “On metrics matched to the discrete memoryless channel”. In: Journal of the Franklin Institute 309.3 (1980), pp. 179–189.

[43] Danilo Silva and Frank R Kschischang. “On metrics for error correction in network coding”. In: Information Theory, IEEE Transactions on 55.12 (2009), pp. 5479–5490.

[44] Werner Ulrich. “Non-Binary Error Correction Codes”. In: Bell System Technical Journal 36.6 (1957), pp. 1341–1388.

[45] G Viswanath and B Sundar Rajan. “Matrix characterization of linear codes with arbitrary Hamming weight hierarchy”. In: Linear Algebra and its Applications 412.2 (2006), pp. 396–407.

Index

푃-canonical form, 76
푃-decomposition
    maximal, 61
    profile, 62
    refinement, 60
푃-distance, 45
푃-weight, 44
Bayes coding-decoding scheme, 89
Bayes decoder, 89
Bayes encoder, 89
Bayes scheme, 86
Binary Erasure Channel, 16
Binary Symmetric Channel, 16
canonical-systematic form, 56
channel
    block channel, 16
    memoryless, 16
code, 17
    푃-decomposition, 59
    decomposition, 58
    dual, 36
    equivalent, 39
    linear, 34
    rate, 36
    self-dual, 36
Combinatorial Metrics, 26
complexity of 푃-decomposition, 63
decoder, 18
    complete, 21
    deterministic, 23
    full decoder, 18
    Maximum a Posteriori Probability, 21, 23
    Maximum Likelihood, 23
    Minimum Distance, 24
    optimal, 19
decomposition
    푃-irreducible, 59
discrete channel, 16
error probability, 19
error value function, 85, 86
expected loss, 86
generalized Plotkin's construction, 99
generalized reduced row echelon form, 74
generator matrix, 35
Hamming Code, 35
Hamming Distance, 24
Hasse diagram, 41
hierarchical poset, 42
information set, 17
isomorphism of posets, 43
level enumerator, 42
Leveled Syndrome Decoding, 81
lexicographic encoder, 95
linear isometry, 38
matched, 31
maximal decomposition, 58
metric, 14
    invariant by translation, 25
minimum distance, 35
natural labelling, 44
norm, 26
order
    anti-chain, 41
    chain, 41
    ideal, 41
    total, 41
order preserving map, 43
ordinal sum, 48
packing radius, 36, 47
parity check matrix, 28, 35
partial order relation, 40
partition, 56
    aggregate, 57
    refinement, 57
    split, 57
pointed partition, 56
primary 푃-decomposition, 63
reduced row echelon form, 72
refusal probability, 19
semimetric, 27
standard form, 40
Stochastic Map, 14
subposet, 43
support, 26
syndrome, 28
Syndrome Decoding, 28
weight, 25
    invariant, 28
    preserve support, 26