AN OVERVIEW OF PUBLIC KEY INFRASTRUCTURE

A Thesis

Presented to the

Faculty of San Diego State University

In Partial Fulfillment of the Requirements for the Degree

Master of Science in Applied Mathematics

with a Concentration in

Mathematical Theory of Communication Systems

by

Pavan Kandepet

Fall 2013


Copyright © 2013 by Purushothaman Pavan Kandepet

DEDICATION

I would like to dedicate this thesis to my chair, Dr. Carmelo Interlando, for helping me extensively through the course of my study at San Diego State University. His valuable advice, insight and knowledge have helped me tremendously. I thank him for all the help he has provided.

ABSTRACT OF THE THESIS

An Overview of Public Key Infrastructure
by
Pavan Kandepet
Master of Science in Applied Mathematics with a Concentration in Mathematical Theory of Communication Systems
San Diego State University, 2013

Security for electronic commerce has become increasingly demanding in recent years owing to its widespread adoption across geographically distributed systems. Public Key Infrastructure (PKI) is a relatively new technology, founded on mathematics, that provides the security features necessary for digital commerce. The main goal of this work is to provide an introduction to PKI and to how it can be used across geographically distributed systems. We start with an introduction to electronic commerce and its security concerns. This is followed by an introduction to cryptography, which sets the stage for the main chapter on PKI; that chapter introduces several components of the system in detail, along with what is expected of it. Next, certificates and certificate management, which are key components of electronic security, are discussed. The work concludes with real world applications of PKI, its restrictions and problems, and its future.

TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
ACKNOWLEDGMENTS
CHAPTER
1 INTRODUCTION
    1.1 Motivation
    1.2 Outline of Thesis
2 BACKGROUND ON CRYPTOGRAPHY
    2.1 The Basics
        2.1.1 Symmetric or Secret Key Ciphers
        2.1.2 Asymmetric (Public Key) Ciphers
    2.2 Digital Signatures
        2.2.1 Hash Functions
3 SECURITY INFRASTRUCTURE
    3.1 Introduction
        3.1.1 Secure Single Sign-on
        3.1.2 End user transparency
        3.1.3 Comprehensive security
4 PUBLIC KEY INFRASTRUCTURE
    4.1 Basics
    4.2 Core PKI Services
        4.2.1 Authentication
        4.2.2 Integrity
        4.2.3 Confidentiality
    4.3 Services Offered by PKI
    4.4 Certificates
        4.4.1 Certificate Initialization
        4.4.2 Certificate Issuance
        4.4.3 Certificate Cancellation
        4.4.4 Certificate Distribution
        4.4.5 Certificate Trust Models
5 PRACTICAL PUBLIC KEY INFRASTRUCTURE SYSTEMS
    5.1 Essential Components
    5.2 Roles and Responsibilities
6 CHALLENGES AND THE FUTURE OF PUBLIC KEY INFRASTRUCTURE
BIBLIOGRAPHY

LIST OF FIGURES

Figure 2.1. Cryptographic System.
Figure 2.2. Symmetric Cipher System.
Figure 2.3. Public and private key encryption/decryption system.
Figure 2.4. Digital signature.
Figure 2.5. Hashing messages.
Figure 3.1. Single sign-on system.
Figure 4.1. Remote Authentication.
Figure 4.2. MAC process.
Figure 4.3. X.509 Version 3 Certificate.
Figure 4.4. Strict Hierarchical trust model.
Figure 4.5. Distributed Hierarchical trust model.

ACKNOWLEDGMENTS

I would like to thank Dr. William Root and Dr. Peter Blomgren for taking the time out of their busy schedules to be on my committee and for their valuable comments.

CHAPTER 1 INTRODUCTION

Over 90% of website use today is to generate revenue through electronic commerce, so web security is critical to businesses and end users alike. Electronic commerce is a major driving force of the economy today and has improved our lives significantly. Imagine having to walk to the bank every day to check your account balance; checking it online is far more convenient. Think what would happen to Amazon or Apple if there were no electronic commerce.

Even though electronic commerce has improved our economy significantly, it also comes with some significant issues. What happens if electronic commerce transactions are compromised? How are these transactions secured? When a typical user performs an online transaction such as a purchase, the transaction usually passes through multiple hops (computer nodes). This personal and sensitive information is likely to travel a complex path across different countries and over various networks. Because of the nature of connected networks, the end user who initiated the transaction has no control over the communication path. How secure are these connections? How can we trust that the website at the other end is what it claims to be? If an impostor in the middle presents a fake website claiming to be the real one, how can we detect that?

1.1 MOTIVATION

One of the major factors holding back the success of electronic commerce, for businesses and customers alike, is trust. Trust is paramount when money is involved. Many aspects of this trust can be satisfied using cryptographic techniques. If a business organization has branches in multiple countries, a medium of trust is necessary for secure communication among its servers, clients, etc. Using cryptographically strong techniques and Public Key Infrastructure (PKI), this trust can be established.

Electronic commerce is basically of two types: business to business transactions and consumer to business transactions [10]. Banks were the early pioneers of business electronic commerce because of the inherent nature of their work: managing huge transactions, both domestic and international, requires secure communication techniques. This adoption was driven by the need of such businesses to cut down on communication costs and to facilitate ordering, negotiations, pricing, invoicing and payment processes.

The basic electronic commerce model is built on the observation that the most elemental building block is a transaction. A transaction always involves two parties, a customer and a seller. The customer begins by requesting information from the seller. The customer needs to be absolutely sure that the nature or content of the information the seller provides has not been compromised. The seller must also make sure that the offer remains confidential to the customer. What fundamentally distinguishes electronic commerce from traditional commerce is the absence of human interaction to establish trust. The machines have no reliable way of knowing who is really on the other end of the line to confirm the identity. This is where the role of authenticity comes into play [10]. Electronic commerce needs confidentiality, integrity, availability and authenticity to be successful. PKI provides both availability and authenticity while cryptography provides confidentiality and integrity.

1.2 OUTLINE OF THESIS

This expository work introduces the concept of Public Key Infrastructure (PKI) as a necessary component of modern electronic commerce and explains how it provides security and infrastructure services. Public key infrastructure is a relatively new technology that builds on strong cryptographic foundations to provide security and infrastructure services to businesses. This thesis introduces and discusses these ideas along with the core concepts of security, cryptography, certificates, key management and trust models.

Chapter 2 provides the foundations of cryptography by introducing security models and the challenges that cryptography overcomes. We then introduce symmetric and asymmetric ciphers. This is followed by digital signatures and hash functions, all of which are used extensively in public key infrastructure.

Chapter 3 lays the groundwork for public key infrastructure. It introduces the expectations placed on public key infrastructure and discusses security infrastructure models.

Chapter 4 discusses the core services that public key infrastructure provides, including authentication, data integrity and confidentiality. It also discusses what certificates are, what problems they try to solve, how to distribute them and what certificate revocation entails.

Finally, the last two chapters discuss the challenges and future of public key infrastructure.

CHAPTER 2 BACKGROUND ON CRYPTOGRAPHY

This chapter discusses the fundamentals of cryptography and touches on those concepts relevant to the remaining chapters.

2.1 THE BASICS

Cryptography is the study of the mathematical techniques related to aspects of information security such as confidentiality, data integrity, entity authentication and data origin authentication. The basic idea of cryptography is to transmit information over an insecure channel while ensuring that no entity in the middle can understand what is being transmitted. The history of cryptography goes back to the Egyptians about 4000 years ago. It was also used extensively during the world wars [8].

As an illustration, consider two people, Alice and Bob, who wish to transmit information back and forth in such a way that a third person, Oscar, an opponent who can listen to the transmission, cannot decipher the information being transmitted. The information Alice intends to send is called the plaintext since it is not yet encrypted. Alice encrypts the plaintext with a predetermined key called the cipherkey and transmits the resulting ciphertext over the insecure channel to Bob. Bob, knowing the cipherkey, can decipher the ciphertext to obtain the plaintext. Oscar, who does not have the cipherkey, cannot decode the ciphertext. Figure 2.1 illustrates this.

Figure 2.1. Cryptographic System.

Definition 2.1. A cryptosystem is a five-tuple $(P, C, K, E, D)$, where the following conditions are satisfied:

• $P$ is the finite set of possible plaintexts.
• $C$ is the finite set of possible ciphertexts.
• $K$, the keyspace, is the finite set of possible keys.
• For each $k \in K$, there is an encryption rule $e_k \in E$ and a corresponding decryption rule $d_k \in D$. Each $e_k : P \to C$ and $d_k : C \to P$ are functions such that $d_k(e_k(x)) = x$ for every plaintext element $x \in P$.

There are two mechanisms for transforming plaintext to ciphertext and vice versa: symmetric (secret key) and asymmetric (public key) [2]. We provide a brief description of each technique below.

2.1.1 Symmetric or Secret Key Ciphers

In this technique, the sender and receiver share some secret information that determines how the information is transformed from plaintext to ciphertext and vice versa. A typical example would be to replace every character by the character five places behind it in the English alphabet [2]. If we reach the beginning of the alphabet while shifting, we simply wrap around to the end by as many places as required. This action transforms the plaintext into ciphertext. To decrypt the ciphertext at the recipient's end, we replace every character by the character five places ahead of it in the English alphabet. This action transforms the ciphertext back into plaintext. Alice and Bob can secretly choose the key $k$, which gives rise to both the key used for encryption, $e_k$, and the key used for decryption, $d_k$. The encryption and decryption keys are related and one can be easily derived from the other [2]. In many cases $e_k = d_k$. Figure 2.2 illustrates this.
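The shift cipher just described can be written down as a small instance of Definition 2.1. The following Python sketch is illustrative only: the function names and the restriction to the 26 uppercase letters are assumptions made for the example, and a shift cipher offers no real security.

```python
import string

ALPHABET = string.ascii_uppercase  # plaintext and ciphertext symbols
SHIFT = 5                          # the shared secret key k


def encrypt(plaintext: str, k: int = SHIFT) -> str:
    """Encryption rule: replace each letter by the one k places behind it."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - k) % 26] if ch in ALPHABET else ch
        for ch in plaintext.upper()
    )


def decrypt(ciphertext: str, k: int = SHIFT) -> str:
    """Decryption rule: replace each letter by the one k places ahead of it."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + k) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )


message = "MEET AT NOON"
ciphertext = encrypt(message)      # "HZZO VO IJJI"
recovered = decrypt(ciphertext)    # decryption undoes encryption
```

The modulo operation implements the wrap-around at the beginning and end of the alphabet, and both rules are derived from the same secret shift, which is what makes the cipher symmetric.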

Figure 2.2. Symmetric Cipher System.

Symmetric ciphers may seem easy and straightforward to use, but they have their own problems. These include the need for a secret key exchange, the difficulty of initiating secure communication between previously unknown entities, and difficulties of scale: in a network of $n$ users, Alice needs to maintain approximately $n - 1$ keys, one for each of the other users [2]. How can secret keys be exchanged between two entities in different geographic locations? How is a medium of trust established when the intended recipient might be previously unknown to the sender? Scaling also becomes a problem because the encryption keys must be kept secret. Imagine that the same key is used for encryption across all recipients and is somehow discovered by an eavesdropper: every one of those conversations is then exposed [12].
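To make the scaling problem concrete, the short sketch below counts keys. It assumes that every pair of users shares its own secret key, so each user holds $n - 1$ keys and the network as a whole holds $n(n - 1)/2$; the function names are hypothetical.

```python
def keys_held_by_each_user(n: int) -> int:
    """Each user shares one secret key with every other user."""
    return n - 1


def keys_in_whole_network(n: int) -> int:
    """One key per unordered pair of users: n(n - 1)/2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    print(n, keys_held_by_each_user(n), keys_in_whole_network(n))
```

The total grows quadratically with the number of users, which is why pairwise secret keys become hard to manage in large networks.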

2.1.2 Asymmetric (Public Key) Ciphers

As discussed in the previous section, private key cryptography leads to some issues that cannot be solved easily. One solution is to use a public key cryptographic system, in which it is computationally infeasible to determine the decryption key $d_k$ given the encryption key $e_k$ [15]. The advantage of a public key system is that Bob is free to publish the encryption key without concern that someone can use it to find the decryption key. Alice can use the encryption key $e_k$ to encrypt any plaintext message, and only Bob, who has $d_k$, can decrypt and read the message. Alice does not even need to know $d_k$ since, unlike in private key cryptosystems, it cannot feasibly be derived from $e_k$. Here $e_k$ is called the public key and $d_k$ is called the private key, and together they are called a key pair. Public key cryptosystems were introduced to the world through a groundbreaking paper in 1976 by Diffie and Hellman [4].

Finding $d_k$ from $e_k$ requires solving a computational number-theory problem such as factoring a three-hundred-digit integer or calculating discrete logarithms in large groups [5]. Theoretically, it might be possible to compute $d_k$ from $e_k$, but the time and memory required make doing so prohibitively expensive [2].

One of the main drawbacks of private key cryptography is the difficulty of enabling secure communication between the sender and the recipient. In public key cryptography, $e_k$ can be distributed freely, much like publishing a telephone number in a public directory. So even if Alice does not know Bob, she can look up Bob's $e_k$ in a directory, encrypt the information and send it without concern that an eavesdropper will decipher the information being sent.
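As a concrete illustration of a key pair, the following sketch uses textbook RSA, a scheme whose security rests on the difficulty of factoring. RSA is not named in the discussion above and is used here only as an example. The two primes are small, well-known Mersenne primes picked so that the example is reproducible, no padding is applied, and the result is far too weak for real use.

```python
# Toy textbook RSA (illustration only, insecure parameters, no padding).
P = 2**61 - 1                          # secret prime
Q = 2**89 - 1                          # secret prime
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent; needs P and Q

PUBLIC_KEY = (N, E)    # published by Bob, e.g. in a directory
PRIVATE_KEY = (N, D)   # known only to Bob


def encrypt(message: int, public_key: tuple) -> int:
    """Anyone holding the public key can encrypt."""
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("message must be smaller than the modulus")
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key: tuple) -> int:
    """Only the holder of the private key can decrypt."""
    n, d = private_key
    return pow(ciphertext, d, n)


plaintext = int.from_bytes(b"hello Bob", "big")
ciphertext = encrypt(plaintext, PUBLIC_KEY)
recovered = decrypt(ciphertext, PRIVATE_KEY).to_bytes(9, "big")
```

Computing `D` requires the factors `P` and `Q`, whereas the published pair `(N, E)` reveals only their product; with primes of realistic size, recovering the factors is the hard number-theoretic problem mentioned above.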

The main caveat in public key cryptography is that Alice must trust the key $e_k$ that she obtains from a repository. How can she trust that the key she retrieves to encrypt the information for Bob is genuine? She must find a way to verify that Bob's key $e_k$ is what it claims to be. The key itself must be independently verifiable. This can be achieved through what is called a public key certificate. Public key infrastructure provides the techniques for certificate management, repository management, etc., and is the subject of the remaining chapters.

Even though it is computationally infeasible to obtain $d_k$ from $e_k$, encrypting all of the plaintext with $e_k$ takes a significant amount of computation. Hence, a better and more common approach is to encrypt the plaintext with a symmetric key and then encrypt the symmetric key with the recipient's public key. When the recipient receives the information, he decrypts the symmetric key using his private key and then decrypts the actual data using the symmetric key.
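The hybrid approach can be sketched as follows. This is a minimal illustration under stated assumptions, not a secure implementation: the public key part is toy textbook RSA with two small Mersenne primes, and `keystream_xor` is a hypothetical stand-in for a real symmetric cipher, built from SHA-256 only to keep the example self-contained.

```python
import hashlib
import secrets

# Bob's toy RSA key pair (illustration only, insecure parameters).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                    # public key
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in symmetric cipher: XOR the data with a SHA-256 keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], block))
    return bytes(out)


def hybrid_encrypt(plaintext: bytes) -> tuple:
    session_key = secrets.token_bytes(16)                   # fresh symmetric key
    wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)  # public key step
    return wrapped_key, keystream_xor(session_key, plaintext)    # bulk data step


def hybrid_decrypt(wrapped_key: int, ciphertext: bytes) -> bytes:
    session_key = pow(wrapped_key, D, N).to_bytes(16, "big")  # private key step
    return keystream_xor(session_key, ciphertext)
```

Only the 16-byte symmetric key goes through the expensive public key operation; the message itself, however long, is handled by the cheaper symmetric step.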

Figure 2.3. Public and private key encryption/decryption system.

Symmetric key cryptography and public key cryptography each have advantages and disadvantages. Symmetric key ciphers can usually be designed for very high data throughput using hardware. Their keys are relatively short and can be used to produce strong ciphers. They also have some noticeable shortcomings. The keys must be known to both parties in a two-way communication system. Large networks need many key pairs, all of which have to be maintained. Keys might have to be changed frequently. Digital signature techniques using symmetric keys require rather large keys for public verification [8].

Public key cryptography has some noticeable advantages. Only the private key needs to be kept secret. Depending on the mode of usage, the public and private key pair can remain unchanged for long periods of time, and public key schemes lead to very efficient digital signature mechanisms. Drawbacks include much lower throughput than symmetric key cryptography and considerably longer keys [8].

Modern cryptographic systems use a combination of both symmetric and public key cryptography. Public key systems are used for digital signatures and key management, while symmetric key systems are used for efficient encryption and decryption.

2.2 DIGITAL SIGNATURES

A digital signature lets an end user verify the authenticity of the origin of information and ensure that the information was not altered in transit. Digital signatures are thus used to provide authenticity and integrity. The signer uses the private key to create the signature, and anyone can use the corresponding public key to verify it. If the signature verifies under the public key, then the information must have originated from the holder of the private key. We can think of a digital signature as a private operation on data that results in a signature. Since only Alice knows the private key used in the signing process, only she can produce that signature. Any entity can verify the digital signature by looking up Alice's public key [2].

Figure 2.4. Digital signature.

The private key signing process is mathematical and always produces a fixed size output. It is computationally infeasible to find two different data sets that produce the same signature. This process also uses a cryptographic hash function: signing applies a hash function to the data to produce a fixed size value, which is then used in the private key operation. The verification process reverses this: the verifier recomputes the hash of the received data, applies the public key operation to the signature, and compares the two values. If the signature matches the key and the hash value, the operation is considered successful; otherwise the message is rejected [2].
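The hash-then-sign process can be sketched with the same kind of toy textbook RSA key pair. Everything specific here is an assumption for illustration: the Mersenne primes are insecure, SHA-256 is one possible hash function, the digest is reduced modulo the toy modulus so that it fits, and real signature schemes add padding that is omitted below.

```python
import hashlib

# Alice's toy RSA key pair (illustration only, insecure parameters).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                    # public verification key
D = pow(E, -1, (P - 1) * (Q - 1))      # private signing exponent


def digest_as_int(data: bytes) -> int:
    """Hash the data to a fixed size value, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Private key operation applied to the hash of the data."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Public key operation on the signature, compared with a fresh hash."""
    return pow(signature, E, N) == digest_as_int(data)


signature = sign(b"pay Bob 10 dollars")
accepted = verify(b"pay Bob 10 dollars", signature)     # True
rejected = verify(b"pay Bob 1000 dollars", signature)   # False: data was altered
```

Verification uses only public values, so any party can check the signature, while producing it requires the private exponent.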

2.2.1 Hash Functions

The digital signature generation process can be quite time consuming and can produce an enormous amount of data, because the data is not compressed before it is signed. This scheme can be improved by using a hash function, which produces a fixed length output even for very large input data. A hash function is a computationally efficient technique for mapping binary strings of arbitrary length to binary strings of fixed length [8]. A desirable property of any good hash function is that the probability that a randomly chosen string maps to a particular $n$-bit hash value is $2^{-n}$. For a hash function $h$, it should be computationally infeasible to find two distinct inputs $x$ and $y$ such that $h(x) = h(y)$, and, given a specific hash value $y$, it should be computationally infeasible to find an input $x$ such that $h(x) = y$ [8]. The output of a hash function $h$ is known as a message digest. Even if a single bit of the original information is changed, the resulting message digest is, with overwhelming probability, different.

Figure 2.5. Hashing messages.
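The fixed length output and the sensitivity to single-bit changes can be observed directly. The sketch below uses SHA-256 from Python's standard `hashlib` module as a stand-in for a generic hash function; that choice and the helper names are assumptions made for the example.

```python
import hashlib


def digest(message: bytes) -> bytes:
    """Message digest of arbitrary-length input (SHA-256, 32 bytes)."""
    return hashlib.sha256(message).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


small = digest(b"a")
large = digest(b"a" * 1_000_000)       # a million bytes, same digest length

original = b"Transfer 100 dollars to Bob"
flipped = bytes([original[0] ^ 0x01]) + original[1:]   # flip a single bit

changed = differing_bits(digest(original), digest(flipped))
print(len(small), len(large), changed)
```

Both digests are 32 bytes long regardless of input size, and flipping one input bit typically changes a large fraction of the 256 output bits.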

For digital signatures, a message digest of the original message is first computed and then signed and transmitted along with the message. The party receiving the message computes the message digest of the message that arrived and compares it with the original message digest it received. If they differ, the original message has been tampered with during transmission.

We can use the message digest and the private key to create a signature. This is the scheme used in systems such as Pretty Good Privacy (PGP) [6]. If a secure hash function is used, it is computationally infeasible to get the same signature from another source or from a function other than the original one being used. Commonly used hash functions are public and involve no secret keys. Common examples of hash functions include MD4, MD5, SHA-1 and RIPEMD-160. These are examples of what are called Modification Detection Codes (MDC). The other category of functions, used for data origin authentication as well as data integrity, involves the use of secret keys; these are called Message Authentication Codes (MAC) [7].

CHAPTER 3 SECURITY INFRASTRUCTURE

This chapter lays the groundwork for Public Key Infrastructure (PKI). It discusses the expectations from PKI and introduces the relevant ideas and concepts used in the remaining chapters.

3.1 INTRODUCTION

Consider the definition of an infrastructure: it is a basic building block necessary for a larger system to operate. In other words, it is a facilitator of necessary services to an organization. An example in computing is a Local Area Network (LAN). A LAN provides a consistent physical network, with routers, switches, hubs, gateways, connectivity ports, etc., to facilitate easy transfer of information to and from an external network. Typically a user can connect to the LAN via a wireless network or a provided Ethernet cable and access local (intranet) or external (Internet) information. Here a LAN can be considered an infrastructure because it provides services. An infrastructure is sensible because it avoids non-interoperable solutions, thereby preventing expandability headaches for new users. For example, if every company had its own protocols for data exchange, end users would face interoperability problems.

A security infrastructure provides the security underpinning for the entire organization and must be accessible by all applications and objects in the organization that need security [2]. The infrastructure is meant to function as an application enabler [2]. A standard or generic interface makes the system more accessible. In the case of a security infrastructure, it enables applications to add security to their own data [2]. One of the main requirements of an infrastructure is that the device or system using it can be agnostic of how the infrastructure provides its resources. A security infrastructure, as the name states, provides security through the infrastructure. Three important services expected from a security infrastructure are secure single sign-on, end user transparency, and comprehensive security. These are explained next.

3.1.1 Secure Single Sign-on

A user typically accesses a system through a login page where they enter a login name and a password. Another option is a smart card reader, with the user's credentials stored on the card. This approach has its pros and cons. A user has to enter this information each time they need to access a machine, or may need a different credential for each new machine. There are also numerous problems with passwords that expire, weak passwords that can be easily cracked, etc. A secure sign-on method, in which the user needs to enter this information only once and the system transparently authenticates and uses it everywhere else, is immensely useful. Cryptographically strong methods can be used to ensure that the authenticated information travels securely along a network.

A single successful sign-on event can be extended to multiple applications or systems using the security infrastructure. This eliminates the need for multiple logins or passwords and makes the life of an end user easier. Imagine the difficulty a user has when they must remember multiple passwords for different systems. Single sign-on is an integral concept of a security infrastructure. A single sign-on system has built-in knowledge of the person's identity so that it can compare an attempted sign-on with what it expects from a legitimate user [2]. Multiple applications in an organization can all use the same single sign-on service, making things more convenient for both the applications and the user. Figure 3.1 illustrates this.

Figure 3.1. Single sign-on system.

3.1.2 End user transparency

When a user logs on to a system over a LAN, for example, they should not have to be concerned about how the information is passed through the network. They should not need to care about TCP/IP protocols, header information, etc. All they care about is logging into the system and accessing the information that they need. Similarly, an infrastructure should be as transparent as possible; the great majority of people do not really care about its internals. Likewise, the security of the infrastructure should be as hidden as possible from the end user. Security should not be a burden to the user; rather, it should be a facilitator that enables them to get their job done. This is the expectation from a security infrastructure. Other than the single sign-on event, the infrastructure should perform all security related tasks in a way that is completely transparent to the user [2].

3.1.3 Comprehensive security

Comprehensive security means that a uniform security solution based on strong cryptography is available throughout the infrastructure. This security solution is transparent and seamless. It should not present different security levels to an end user. With a single sign-on, the same high level of security is available throughout the system, and the different applications in the system all have the same security available. Unification of security is the important concern here. One way of achieving this is to use keys that are processed in a consistent manner throughout the system [2]. This provides comprehensive uniformity everywhere in the system. It ensures that a single trusted technology such as public key infrastructure is available throughout the system and not only at one location or setting. Every component of the security system works seamlessly. Applications such as databases, file systems and firewalls, and security and network services such as file transfer protocols, remote logins, etc., can all make use of this seamless feature. It simplifies matters for both the end users and the administrators managing the system.

An infrastructure can make a business more efficient. A single sign-on solution throughout the system can reduce costs while delivering visible benefits. A single solution makes adding new users easier. Interoperability between multiple networks is much easier.

CHAPTER 4 PUBLIC KEY INFRASTRUCTURE

Having gained an understanding of the security requirements of an infrastructure in the previous chapter, this chapter delves into the details of the Public Key Infrastructure (PKI) system. We first begin with a discussion of what PKI is and what services are offered by PKI. This is followed by a comprehensive discussion of certificates which are the backbone of PKI, which in turn is followed by the life cycle of certificates, and trust models in PKI.

4.1 BASICS

In its most basic form, Public Key Infrastructure is a comprehensive system for generating, managing and distributing the keys used in public key cryptography. We can think of it as a supporting technology that enables secure electronic commerce. It offers fundamental value to an organization, including substantial cost savings. It forms the underlying technology that supports common security features such as authentication, integrity, non-repudiation and confidentiality. These features are enabled using a combination of symmetric and asymmetric cryptography. The whole idea of a comprehensive Public Key Infrastructure system is to provide a single, easily managed infrastructure instead of multiple security solutions. A well designed PKI system provides the following benefits:

• Reduced maintenance overhead: a single point of administration for PKI instead of multiple security solutions.
• Reduced security complexity for end users: a single password or passphrase works transparently across all applications in the infrastructure, in place of multiple distinct and error-prone security solutions.
• Optimized work flow and productivity.
• Reduced end user security requirements.

Typical organizations that might find PKI useful include financial institutions such as banks and investment companies. Numerous vendors offer PKI services. Examples include Entrust, RSA Security and Verisign.

4.2 CORE PKI SERVICES

Public Key Infrastructure (PKI) is a fundamental component of a pervasive security infrastructure. The functionality of the infrastructure is achieved by use of public key cryptography and its related components. A comprehensive PKI system offers three core services, namely authentication, integrity and confidentiality, which we discuss next.

4.2.1 Authentication

Authentication is related to trust. One entity authenticates another, and this authentication is used both for entity identification and for data origin authentication [2]. As the name suggests, entity identification is used to authenticate an identity: someone or something that claims to be a particular entity. This is a crucial first step in the whole security model. The identity itself, once authenticated, may be associated with a set of privileges on an Access Control List used for making other decisions in the system [2]. A person logging into a system is an example of entity identification.

Entity identification can be performed either remotely or locally. A person logging into a machine from a remote work site is an example of remote authentication. Examples of local authentication include PIN entry or a thumbprint scan. Remote authentication includes such things as Virtual Private Networks (VPN) and secure shell logins. Local authentication is preferable to other authentication methods because of the sensitivity of the login credentials being passed. In a secure environment, credentials cannot be passed over insecure lines because of concerns over snooping. Once an entity is identified, its credentials can be used transparently throughout the secure infrastructure system. Like a user entering a password at a login screen, authentication is the first of many steps in a secure system.

There are numerous authentication techniques, each with its own strengths and weaknesses. Entity identification can use either single factor or multifactor authentication. In single factor authentication, the user uses only one type of authentication on the system; typical examples are a PIN or a password. In multifactor authentication there may be two or more stages in which users authenticate themselves to the system. Usually a PIN or password is followed by a fingerprint, a retinal scan or a passphrase. Multifactor authentication is much harder for an attacker to defeat, at the expense of some user frustration.

Data origin authentication, as the name suggests, is used to verify the authenticity of the origin of a specific piece of data. For example, data that claims to be from a specific location must be authenticated. It ties origin information to the data in such a way that the binding cannot be revoked. For example, if a backup of a system is made, we can add a signature such that any change after the fact will yield a different verification signature.

Local authentication does not use the services of PKI; however, remote authentication can use PKI by transmitting proof of local authentication. Common techniques for remote authentication involve challenge-response protocols and signed messages. As an example, Alice can send a simple challenge message to Bob. Since only Bob knows his private key, he can sign the challenge message and send it back to Alice. Alice can then check the signed response using Bob's public key and get the original challenge message back, which confirms that the response came from the holder of Bob's private key. In this way a form of remote authentication is established. This operation is illustrated in Figure 4.1.

Figure 4.1. Remote Authentication.
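The challenge-response exchange can be sketched as follows. This is a simplified illustration under stated assumptions: Bob's key pair is toy textbook RSA built from two small Mersenne primes, the challenge is signed directly without hashing or padding, and the function names are hypothetical.

```python
import secrets

# Bob's toy RSA key pair (illustration only, insecure parameters).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                    # Bob's public key, known to Alice
D = pow(E, -1, (P - 1) * (Q - 1))      # Bob's private exponent


def alice_make_challenge() -> int:
    """Alice picks a fresh random challenge so old responses cannot be replayed."""
    return int.from_bytes(secrets.token_bytes(16), "big")


def bob_respond(challenge: int) -> int:
    """Bob signs the challenge with his private key."""
    return pow(challenge, D, N)


def alice_check(challenge: int, response: int) -> bool:
    """Alice applies Bob's public key and expects her challenge back."""
    return pow(response, E, N) == challenge


challenge = alice_make_challenge()
response = bob_respond(challenge)
authenticated = alice_check(challenge, response)            # True
forged = alice_check(challenge, (response + 1) % N)         # False
```

Because the challenge is freshly generated for each run, a response captured by an eavesdropper is of no use for a later authentication attempt.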

A possible use of PKI in authentication involves the user signing on locally and gaining access to their private keys. These private keys can then be used to authenticate the user automatically and transparently to other entities in the network when a connection is necessary. This is transparent to the user and only involves a single sign on the system initially [2]. The main idea of authentication as a PKI service is to understand that once someone/something is authenticated, that security information for that user or data is transmitted transparently throughout the system. The user or data does not need to continuously authenticate themselves over and over again. The cryptographically secure information is transparently used in the secure network. This is the expectation of a PKI enabled system. It is important to observe that no data that belongs to a person, such as the password or pin used can be used to generate the public/private key pair used to identify the user to that system. 16 4.2.2 Integrity Integrity here refers to digital data integrity. This is to ensure that any data received has not been altered during transmission. Typical integrity measures in the digital world uses Cyclic Redundancy Checks (CRC) as one technique. CRC’s are not cryptographic in their approach. Common operations that disrupt data integrity include insertion of bit changing the original data length, deletion, re-ordering, inversion, noise introduction, substitution etc.. PKI by default provides data origin authentication which by default includes data integrity checking. A second technique that can be used for integrity checks is Message Authentication Code (MAC). Since MAC’s are commonplace in PKI a definition is useful: The purpose of MAC is to facilitate without the use of any additional mechanisms assurances regarding the integrity of the message from a source. MAC’s are an example of keyed hash functions. 
They are a family of functions $h_k$ parameterized by a secret key $k$ with the following properties [8]:

• Ease of computation: for a known function $h_k$, given a value $k$ and an input $x$, $h_k(x)$ is easy to compute. The result is called the MAC of the input $x$.

• Compression: $h_k$ maps an input $x$ of arbitrary finite bit length to an output $h_k(x)$ of fixed bit length $n$.

• Computation resistance: given zero or more text-MAC pairs $(x_i, h_k(x_i))$, it is computationally infeasible to compute any text-MAC pair $(x, h_k(x))$ for any new input $x \neq x_i$.
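The properties above can be observed with HMAC-SHA256 from the Python standard library, used here as one concrete keyed hash function (the choice of HMAC and the key value are assumptions made for illustration; the text does not prescribe a particular MAC). The sketch shows the fixed output length and the way a receiver holding the same key recomputes and compares the MAC.

```python
import hashlib
import hmac


def mac(key: bytes, message: bytes) -> bytes:
    """Compute h_k(x), here instantiated with HMAC-SHA256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the MAC over the message and compare in constant time."""
    return hmac.compare_digest(mac(key, message), tag)


key = b"shared-secret-key"            # hypothetical shared MAC key k

# Compression: inputs of very different lengths give tags of equal length.
short_tag = mac(key, b"x")
long_tag = mac(key, b"x" * 10_000)

# The receiver accepts the original message and rejects a modified one.
accepted = verify(key, b"x", short_tag)
rejected = verify(key, b"y", short_tag)
```

Computation resistance cannot be demonstrated by running code; it is a security assumption about the underlying function.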

The process by which a MAC is generated is illustrated in Figure 4.2. The source of the message computes the MAC on the message using a secret MAC key that is shared with the destination. The source sends both the input message and the computed MAC. The recipient determines the source identity, separates the received MAC, independently computes the MAC over the received message and compares the two values. A mismatch indicates that the message was modified and should not be trusted. Many types of attacks are possible on MACs, including attacks on the key space and on the MAC bit size. A good MAC should be resistant to these types of attacks.

Other, more commonly used techniques include hash algorithms and digital signatures. Even a change in a single bit of the data causes the signature to be altered significantly and results in a failed integrity check. PKI can be used to verify the integrity of data; a typical use is as follows, where Alice wishes to send information to Bob [2]:

• Alice generates a previously unused symmetric key.

• Alice uses this symmetric key to generate a Message Authentication Code for the data.

• Alice encrypts the symmetric key using Bob's public key.

• Alice sends the data, together with the MAC and the encrypted key, to Bob, who uses his private key to recover the symmetric key and verify the MAC.

Figure 4.2. MAC process.
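The steps Alice and Bob follow can be sketched as below. The sketch makes several assumptions that are not in the text: HMAC-SHA256 serves as the MAC, the symmetric key is 16 bytes long, and Bob's key pair is a toy unpadded RSA key built from two small known primes. The function names `alice_send` and `bob_receive` are hypothetical. It is meant to show the flow of the protocol, not to be a secure implementation.

```python
import hashlib
import hmac
import secrets

# Toy RSA key pair for Bob (assumption: textbook RSA without padding).
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537                  # Bob's public key
D = pow(E, -1, (P - 1) * (Q - 1))    # Bob's private exponent (Python 3.8+)
KEY_BYTES = 16                       # 128-bit key, smaller than the modulus


def alice_send(data: bytes):
    """Return the data, its MAC, and the MAC key encrypted for Bob."""
    key = secrets.token_bytes(KEY_BYTES)               # fresh symmetric key
    tag = hmac.new(key, data, hashlib.sha256).digest()
    wrapped_key = pow(int.from_bytes(key, "big"), E, N)
    return data, tag, wrapped_key


def bob_receive(data: bytes, tag: bytes, wrapped_key: int) -> bool:
    """Recover the MAC key with the private key, then verify the MAC."""
    key = pow(wrapped_key, D, N).to_bytes(KEY_BYTES, "big")
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


data, tag, wrapped_key = alice_send(b"transfer 100 to account 42")
intact = bob_receive(data, tag, wrapped_key)
tampered = bob_receive(b"transfer 900 to account 42", tag, wrapped_key)
```

Only the short symmetric key is processed with the slower public key operation; the bulk data is protected with the symmetric MAC.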

Data origin authentication also forms a core part of integrity. It ensures that the data comes from the claimed source. Some common techniques for data origin authentication are digital signatures and appending secret authentication codes to the message being sent.

4.2.3 Confidentiality

Confidentiality is used to assure privacy of data [2]. Privacy of data is ensured by encrypting the data and securing the encryption keys. This is the topic of key establishment and key management. Key establishment provides shared secrets between two or more parties, and key management involves the distribution of those keys. Key management is the main topic of this section.

Key establishment involves an authentication protocol, a key establishment protocol and an authenticated key establishment protocol. The authentication protocol, as the name suggests, is used by two parties to identify each other. The key establishment protocol and the authenticated key establishment protocol are used for establishing the secret keys once identity has been established.

Another important topic is session keys. As the name suggests, these are used for a single session and discarded once the session is complete. They improve security because a different key is used for each session, which limits the amount of ciphertext exposed under any one key [8]. There exist many key transport protocols. Some of the more common ones based on symmetric encryption are [11]:

• Point to point key updates.

• Shamir's no-key protocol.

• Kerberos (from MIT).

• Needham-Schroeder shared key.

• Otway-Rees.

Some key establishment protocols rely on a central server, while others avoid a central server for security reasons. A very common asymmetric key agreement protocol is the Diffie-Hellman key agreement protocol [14]. It provides unauthenticated key agreement.
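A minimal sketch of Diffie-Hellman key agreement follows. The modulus and base are assumptions chosen for readability (a 127-bit Mersenne prime and the base 3, which is not checked to be a generator); real systems use standardized groups of much larger size. Because the exchange is unauthenticated, neither party learns from it alone who is at the other end.

```python
import secrets

# Toy public parameters (assumptions for illustration only).
P = 2**127 - 1    # prime modulus
G = 3             # public base


def make_keypair():
    """Pick a random private exponent and derive the public value G^a mod P."""
    private = secrets.randbelow(P - 3) + 2    # in the range [2, P - 2]
    public = pow(G, private, P)
    return private, public


# Each party keeps its private value and sends only the public value.
alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Each side combines its own private value with the other's public value.
alice_secret = pow(bob_public, alice_private, P)
bob_secret = pow(alice_public, bob_private, P)
```

Both sides arrive at the same value because $(g^b)^a = (g^a)^b \pmod p$; that value can then be used to derive a session key.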

4.3 SERVICES OFFERED BY PKI

Numerous security services in a comprehensive infrastructure can be enabled by using PKI. These include secure communication, secure timestamping, notarization, non-repudiation and privilege management [2]. They are discussed in turn below.

Since PKI uses many of the core features of cryptography, secure communication is one of the main features it offers. Secure communication is the transmission of important data from one entity to another under adversarial conditions. This is where the core PKI services of authentication, integrity and confidentiality come into the picture. Secure communication is a fundamental component of any secure infrastructure in which network services such as email, browsing and virtual private networks are offered [9].

Secure timestamps provide timeliness for a piece of data being transmitted over a network, with uniqueness down to the microsecond or millisecond. They can be used to check that the data has not been subjected to unusual delays in transit, which might indicate eavesdropping or other interference. An example of a timestamp verification algorithm is as follows:

• An initial timestamp is obtained from the clock of the host machine (to whatever accuracy is necessary).

• This timestamp is attached to the outgoing message.

• The receiving entity, upon getting this message, calculates the difference between its own time and the time at which the message was originally sent.

• If the difference falls outside a preset acceptable threshold, the message is invalidated.
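The verification steps above can be sketched as follows. The threshold value and the names `stamp` and `is_timely` are hypothetical, and the sketch reads "outside the threshold" as "negative, or larger than the allowed delay". It covers only the freshness check; in a PKI setting the timestamp itself would also be protected, for example by a signature, which is not shown here.

```python
import time

MAX_DELAY_SECONDS = 2.0    # hypothetical acceptable transit delay


def stamp(message: bytes, now: float = None) -> dict:
    """Attach the sender's clock reading to an outgoing message."""
    sent_at = time.time() if now is None else now
    return {"body": message, "sent_at": sent_at}


def is_timely(stamped: dict, now: float = None,
              max_delay: float = MAX_DELAY_SECONDS) -> bool:
    """Accept the message only if its transit time is within the threshold."""
    received_at = time.time() if now is None else now
    delay = received_at - stamped["sent_at"]
    # A negative delay means the clocks disagree; a large one means the
    # message was held up somewhere. Both cases invalidate the message.
    return 0.0 <= delay <= max_delay


msg = stamp(b"payment order", now=100.0)
on_time = is_timely(msg, now=101.0)
too_late = is_timely(msg, now=110.0)
from_future = is_timely(msg, now=99.0)
```

The `now` parameters exist only so that the example can be run with fixed clock values.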

Clearly, the secure timestamp service requires that all clocks be synchronized to the same time using a GPS clock or another precise time source.

Notarization is similar to what a notary does in a legal framework, acting as a legal authority to certify documents. In a PKI system, the notary certifies that specific data is correct. The data being certified could be, for example, a digital signature over some hashed value. The notary service offered by a PKI system is trusted by other entities to verify the correctness of data. This service relies on the PKI services of authentication and secure timestamping.

Non-repudiation is the term used in PKI for ensuring that entities cannot deny having performed certain actions. Examples include receipt of a message, delivery of a message and origination of a message from a particular entity. As a simple example, if Bob sends Alice a digitally signed message acknowledging that he received a message from her, he cannot later deny the receipt without admitting either that he knowingly gave his private signing key to a third party in order to allow repudiation, or that his private key was compromised [2]. It must be understood that in any public key cryptographic system the private keys are kept secure and private, while the public keys are freely available.

Privilege management, or rights management, is a loosely used term for what rights a user has with respect to a secure system. This is similar to a user accessing a computer system: a user who is not an administrator has limited access to certain sections of that system. A guest user cannot be expected to make changes to the administrator sections of the system. PKI can be used to offer services according to the access restrictions placed on a user. There exists a subtle difference between authorization and authentication. Authentication is associated with the identity of an entity; it is used to establish who a person is. Authorization is associated with what that person can and cannot do. Both are related in a PKI system. Public key authentication and single sign-on can be leveraged on a system where PKI is available. With this introduction to the core PKI services, we now turn to certificates.

4.4 CERTIFICATES

Consider a passport number or a social security number: it is a form of identity issued by the government to identify an individual. In the digital world, something similar can be accomplished for an individual or an organization by using a certificate. Certificates are a core component of the PKI system and are used to publish information about an entity. Essentially, they publish public key values. One of the ideal features of a PKI system is that the public information about an entity or person is publicly available. However, making this information public has its perils: it must be protected from tampering so that its integrity is guaranteed. These two core features, data integrity and ownership, are guaranteed by a certificate.

A certificate binds a public key to an entity's name. The binding is made by a certificate issued by a trusted authority, so Alice can send a message to Bob using his public key as certified by that authority. An authority that is trusted by a majority of the population is called a Certificate Authority (CA) [2]. A certificate, also known as a Public Key Certificate, is a data structure that records the identity of an individual and the corresponding public key [2]. The certificate authority performs the operation of binding a public key to an identity. An online repository in which large numbers of freely available public key certificates can be stored is called a Certificate Repository. It should be easy to find for those who need it, and it is maintained by the certificate authority. Typical information included in a certificate:

• Information about the user.

• Information about the certificate authority which issued this certificate.

• Public key associated with the user's public/private key pair.

• Encryption algorithms used in the certificate.

• Validity, revocation status, etc.
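As a rough illustration, the items listed above can be grouped into a simple record. The field names and sample values below are hypothetical and do not follow the actual X.509 encoding; the sketch only shows what kind of information travels together in a certificate.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Certificate:
    """Simplified, illustrative certificate record (not real X.509)."""
    subject: str            # information about the user
    issuer: str             # the certificate authority that issued it
    public_key: bytes       # public half of the subject's key pair
    algorithm: str          # algorithm used in the certificate
    not_before: datetime    # start of the validity period
    not_after: datetime     # end of the validity period
    revoked: bool = False   # revocation status

    def is_usable(self, at: datetime) -> bool:
        """A certificate is usable if not revoked and within its validity."""
        return (not self.revoked) and self.not_before <= at <= self.not_after


bob_cert = Certificate(
    subject="Bob",
    issuer="Example CA",
    public_key=b"\x01\x02\x03",          # placeholder bytes
    algorithm="example-algorithm",
    not_before=datetime(2013, 1, 1),
    not_after=datetime(2014, 1, 1),
)
usable_mid_2013 = bob_cert.is_usable(datetime(2013, 6, 1))
usable_in_2015 = bob_cert.is_usable(datetime(2015, 1, 1))
```

In a real certificate the record is additionally signed by the issuing authority, which is what protects the binding between the name and the key.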

Whenever information needs to be exchanged using certificates, there needs to be a central repository that hosts this information, so that it is easy to obtain for anybody who wishes to use it. This is the role of the Certificate Repository described above. Just as a passport or a driver's license expires, a certificate has a limited lifetime. Thus, certificates need to be renewed when they expire. A user or an organization must be informed of these changes so that appropriate action can be taken. Certificate Revocation is the process of invalidating a certificate, which may be necessary before its scheduled expiry [3].

There cannot be a single global identity or certificate for a user. A user usually has multiple certificates issued by different organizations or entities. Over time these certificates need to be interconnected to identify an individual; this is achieved by Cross Certification [13].

One of the most common uses of a certificate is to guarantee some action by a user. Such actions typically cannot be reversed and are always tied to the user's identity. For example, someone might sign a document and send it to another individual. The signing cannot be reversed, so the originator cannot later claim that the document did not originate from them. Non-repudiation is the property of PKI that prevents denial of such an action, and a PKI should provide support for it [2].

Certificates can be broadly classified as either identity certificates or credential certificates. An identity certificate identifies an entity, called the certificate subject, and lists the public key associated with that entity. A credential certificate describes other information such as permissions or credentials. A number of different types of certificates exist [2]:

• X.509 public key certificates.

• Simple Public Key Infrastructure (SPKI) certificates.

• Pretty Good Privacy (PGP) certificates.

• Attribute certificates.

Figure 4.3 describes what a typical X.509 version three certificate looks like [1].

Figure 4.3. X.509 Version 3 Certificate.

Most of the fields are self-explanatory, while some require further discussion.

• Serial Number is a unique identifier for every certificate issued by a given certificate authority.

• Signature is the identifier of the algorithm used to calculate the digital signature on the certificate.

• Issuer is the Distinguished Name of the certificate authority that issued the certificate.

• Subject is the owner of the certificate.

• Extensions are optional and private.

Just as a license or a passport expires after its period of validity, a certificate also expires after a while. This information must be made available to any entity that requests the certificate in order to send secure information. The process of invalidating a certificate is called Certificate Revocation. Whenever a certificate has new information that needs to be added to it, a process called key update or certificate update is performed [2]. Since it would be cumbersome to do this by hand each time a certificate is used, it is performed automatically by the PKI system: when the validity period is about to expire, the system checks for a new or updated certificate and, if one exists, replaces the old one.

Asymmetric cryptography is based on the use of public/private key pairs, and the public keys are distributed in the form of certificates. The private key is never shared and should always be stored in a secure location. The primary information contained in a certificate is the public key of the entity. This information can expire just as a passport or an ID expires, so a lifecycle for certificates is necessary. The certificate lifecycle is a core requirement of any security system that offers PKI services. An ideal PKI environment is one in which the entire certificate lifecycle, like everything else, is as transparent and user-friendly as possible to the end user. Broadly, the lifecycle covers creation, distribution, revocation and cancellation; the following sections discuss initialization, issuance, cancellation and distribution.

4.4.1 Certificate Initialization

As with any security service, or any service in general, some setup is necessary before further operations can begin. The first of the steps in the certificate lifecycle is the registration process. Here the entity wishing to use PKI services must first register itself with an authority. The authority is another entity that offers services such as certificate authorization or registration authorization, and it should be a public entity. The registration process usually involves sharing a secret, such as a password or a PIN, between the authority and the end entity. This shared secret can be used to verify the authenticity of the entity at a later point in time.

The second step involves generation of a public/private key pair. For security purposes, different key pairs are usually used for the different services that a comprehensive PKI system offers. The location at which the key pair is generated is also important. For non-security purposes the key pair can be generated at the client site, while for repudiation purposes it must be generated at a trusted source such as the certificate authority or the registration authority. The private key is the most sensitive part of a key pair, and generating it at the client location has both advantages and disadvantages.

Once the public key is generated, it is the responsibility of the certificate authority to add that key to a certificate. The role of a certificate authority is to house these public keys and deliver them to requesting parties so that they can verify the authenticity of an end user. Other factors to consider include how the certificates are transferred securely to other parties, across different networks and so on. The delivery of certificates to interested parties is also time sensitive: when browsing or sending email, the public keys used must be readily available to protect the information.

4.4.2 Certificate Issuance

Once the certificates have been issued, the next step is retrieval and validation. Usually there are multiple repositories hosting the same certificates, as a backup mechanism in case retrieval from a primary location fails; robust housekeeping usually involves keeping such copies. Retrieval, as the name suggests, is the process of requesting a certificate from an authority. Validation begins with an integrity check, such as verifying the digital signature that the issuing authority places on the certificate. Once the integrity is validated, the certificate can be considered genuine. Validation also involves checking other aspects of a certificate, such as the validity period and whether the certificate has been revoked.
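The retrieval and validation steps can be sketched as follows. Everything here is a simplified assumption: repositories are modeled as in-memory dictionaries, certificates as plain dictionaries with hypothetical keys, and the integrity check is omitted so that only the fallback between repositories and the validity and revocation checks are shown.

```python
from datetime import datetime


def retrieve(subject: str, repositories: list):
    """Try each repository in order and return the first certificate found."""
    for repository in repositories:
        certificate = repository.get(subject)
        if certificate is not None:
            return certificate
    return None


def is_valid(certificate: dict, at: datetime, revoked_serials: set) -> bool:
    """Check the validity period and the revocation status."""
    in_period = certificate["not_before"] <= at <= certificate["not_after"]
    return in_period and certificate["serial"] not in revoked_serials


bob = {"serial": 1001, "subject": "Bob",
       "not_before": datetime(2013, 1, 1), "not_after": datetime(2014, 1, 1)}
primary = {}                  # the primary repository has no copy
backup = {"Bob": bob}         # the backup repository still has one

found = retrieve("Bob", [primary, backup])
valid_now = is_valid(found, datetime(2013, 6, 1), revoked_serials=set())
valid_if_revoked = is_valid(found, datetime(2013, 6, 1), revoked_serials={1001})
```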

4.4.3 Certificate Cancellation

Just as IDs and other forms of personal identification have a validity period, so do certificates. Certificate cancellation is the process of canceling a certificate. The natural case is when the validity period expires. A certificate may also be canceled when its integrity or security is compromised and a new one needs to be issued.

4.4.4 Certificate Distribution

Before certificates can be used to establish trust between entities, they must first be distributed across networks. This section describes a few techniques to achieve this. Some common techniques are [8]:

• Point to point delivery.

• Direct access using an online server.

• Using systems which implicitly guarantee authenticity of public information.

Point to point delivery, as the name suggests, is like hand-to-hand delivery of items. It is not very reliable and only works for very small or limited systems. It is usually performed over a reliable and secure channel. Since the integrity of the data is provided by using a hash, the data can be considered secure once it is verified. Email distribution can also be considered an example of a point to point delivery mechanism. Some common examples modeled after this idea are Pretty Good Privacy (PGP) and OpenPGP. Scalability, lack of automation and timeliness are big concerns here.

Direct access using an online server is a technique in which a central server provides the certificates. Upon a request with the necessary credentials, the public key of the requested entity is delivered if available. This can be done over a secure or an insecure channel, since the certificate itself carries the means to check the integrity of its data. The advantages of this system are security, reliability and availability. The drawbacks are that the server must always be online and that bandwidth can become an issue if numerous requests are made simultaneously. This is the most common approach used by large enterprises and businesses. There are numerous techniques for accessing this information; common ones include the Lightweight Directory Access Protocol (LDAP) [16] and the File Transfer Protocol (FTP).

Using systems which implicitly guarantee authenticity: here most of the information available is already cryptographically secured, so any modification is detected.

4.4.5 Certificate Trust Models

To understand how certificates are used in PKI, consider an example where Alice wishes to communicate with Bob securely. In order for Alice to send information to Bob, she must have Bob's public encryption key. This is achieved using a certificate authority (CA), as discussed earlier. The certificate authority issues an identity certificate which Alice trusts to be correct. This identity certificate contains Bob's public key, which Alice can use to send secure information.

Trust models address questions such as which certificates can be trusted, how that trust is established and under what circumstances this trust can be limited or controlled [2]. The definition of trust can be debated, but according to the X.509 specification, an entity can be said to trust a second entity when the first entity makes the assumption that the second entity will behave exactly as the first entity expects. This is something that cannot be quantified or measured, and there is always risk involved.

The concept of a trust model is crucial to a PKI system since it shows where trust starts and describes how it flows. Such a model can be used to identify areas of concern or improvement in the structure. Returning to the previous example, Alice can trust Bob's public key when she is convinced that it corresponds to a private key that belongs to one and only one entity with the name Bob [2]. Certificate authorities can be configured in multiple ways. Some common configurations are:

• Strict hierarchical configuration.

• Policy based hierarchies.

• Distributed hierarchies.

• Mesh configuration.

• Hub and spoke configuration.

• Web hierarchies.

Figure 4.4. Strict Hierarchical trust model.

A strict hierarchical configuration is laid out like a file system on a computer. There is always a root entity (much like the root '/' directory in UNIX), and this marks the start of the trust model. The public key of the root certificate authority is held by all intermediate nodes as well as the end entities. We can view this model as an inverted tree. The root entity certifies zero or more entities directly below it, and each of those in turn certifies the entities below it, down to the leaf nodes. The trust relationship in this model flows downwards. If Alice wants to obtain the public key of the entity at a leaf node of the tree, she starts with the root node and successively obtains the public key of every node along that branch until she reaches the final node. The root certificate authority certifies only its immediate subordinates. In a strict hierarchy certification always flows downwards, while in a loose hierarchy it can flow both ways, from a lower node to its parent or vice versa. The root node always has full authority over how its trust is assigned.

In the hierarchical model there is usually one root, whose policy is enforced by all its successive nodes. However, it is possible to have multiple roots in order to enforce multiple policies. This is the idea behind policy based hierarchies, in which a certificate authority may belong to more than one hierarchy.
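The walk from the root down to a leaf in a strict hierarchy can be sketched as follows. The entity names and the `ISSUER_OF` table are hypothetical, and the sketch only builds the chain of names; in practice each step would also verify the certificate signature with the public key obtained at the previous step.

```python
# Hypothetical hierarchy: each entity maps to the authority that certified it.
ISSUER_OF = {
    "Alice": "CA-West",
    "Bob": "CA-East",
    "CA-West": "Root",
    "CA-East": "Root",
}


def path_from_root(entity: str, issuer_of: dict, root: str = "Root"):
    """Return the chain root -> ... -> entity, or None if no chain exists."""
    path = [entity]
    while path[-1] != root:
        parent = issuer_of.get(path[-1])
        if parent is None or parent in path:
            return None    # unknown issuer or a loop: trust cannot be traced
        path.append(parent)
    return list(reversed(path))


chain_to_bob = path_from_root("Bob", ISSUER_OF)
chain_to_stranger = path_from_root("Mallory", ISSUER_OF)
```

Here `chain_to_bob` is the list `["Root", "CA-East", "Bob"]`, which is the order in which Alice would obtain and check the public keys, while an entity with no issuer in the table yields `None`.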

Figure 4.5. Distributed Hierarchical trust model.

In a distributed hierarchical model, trust is distributed between multiple certificate authorities at the same level. This model is also called the cross certification model [13]. In the mesh configuration, certificate authorities cross-certify each other directly, while in a hub and spoke configuration each certificate authority cross-certifies with a single central certificate authority whose primary purpose is to facilitate such relationships [2]. In the web model, a number of certificates are pre-installed in modern web browsers such as Firefox and Chrome, and these act as the initial trusted certificate authorities. Any other trust model can be built from there.

CHAPTER 5 PRACTICAL PUBLIC KEY INFRASTRUCTURE SYSTEMS

This chapter discusses the components necessary to build a practical PKI system. We also discuss the operational considerations necessary to obtain a robust system.

5.1 ESSENTIAL COMPONENTS

A robust infrastructure system involves many subsystems or components. At the very least it should have a client and a server. The client can be thought of as an end user who wishes to use the security features of a PKI system, and the server as an entity which provides this service. There are of course other things to consider, such as the combination of hardware and software, manpower, automation and administration. As discussed earlier, PKI offers numerous services such as certificates, notarization, non-repudiation, and certificate issuance and revocation. A comprehensive system should offer all these services transparently, without much user intervention.

To facilitate these services, good software at both the client and the server is necessary. This software is a combination of operating systems, user libraries, client graphical user interfaces, network protocols and so on. A scalable architecture is one in which the different subsystems are distributed across the network. The client side software is only one component of the entire system. Day-to-day software changes on the user side should not break the PKI system through inconsistencies. The browser, for example, which uses the certificate and other components to verify authenticity, may be a different one on each client. Because browsers and other components change, they should employ common libraries that remain consistent across platforms and systems.

Delegation is also an important component of a PKI system. When the client needs to offload the processing of a certificate, it delegates that work to a trusted entity in the network and awaits the results, so that it can get other work done in the meantime [2].

Another aspect of PKI to consider is offline operation. We cannot expect a client to be online 24 hours a day; when traveling, for example, there may be no network access. Should PKI services be expected in this scenario? This is debatable, since information cannot be sent or received anyway. But certificate information from previous transactions can be cached, i.e. stored locally if it is used multiple times, so that it can be used again later for as long as the certificate remains valid. In essence, if some information that depends on a certificate needs to be sent, it can be prepared and stored locally and then sent at a later time. Depending on the usage scenario, this is a possibility. Operations that require real-time access, such as secure timestamping and notarization, will most likely not be possible without access to online servers.

Many hardware components can make life easier for an end user of PKI. Common operations such as logins can be accomplished using biometric IDs, for example. Hardware such as smart cards, biometric readers and even iris scanners is commonplace these days. As with any security measure, there is always a compromise between security and ease of use.

5.2 ROLES AND RESPONSIBILITIES

There are numerous roles and responsibilities for the different components of a PKI system. A certificate authority (CA), for example, is at the core of certificate distribution and revocation. It must demonstrate reliability and trustworthiness in providing certificates to end users or entities. Security is of paramount importance, with immediate revocation notification if something does not seem right. The CA must ensure that the date and time at which certificates are issued can be determined precisely. Administration of the necessary software and hardware must be performed promptly, and minimal downtime must be guaranteed. Most importantly, a certificate authority must be trustworthy and disclose its practices and procedures with the utmost clarity and transparency.

Just as a certificate authority has certain responsibilities, so do the end users. They are obligated to make truthful representations when applying for certificates, to control their certificates and keep them safe and secure, and to promptly revoke them when compromised [2].

PKI is an authentication technology, used to identify entities for the purpose of trust. The underlying technology is public key cryptography. PKI issues certificates that identify entities, and other entities in turn can extend trust based on these certificates. PKI only provides authentication, i.e. who an entity is, not what that entity is allowed to do. It must be made clear what PKI does not provide [2]:

• Authorization.

• Trust in other end entities.

• Unique names for entities.

• Creation of security for software and hardware.

CHAPTER 6 CHALLENGES AND THE FUTURE OF PUBLIC KEY INFRASTRUCTURE

Adoption of any new technology meets with reservations, and PKI is no exception. PKI is a relatively new technology compared to other security technologies, and it is no surprise that its adoption to the fullest extent is still years away. There are several reasons why PKI has not been deployed as widely as expected [2].

First, there is no single recognized authority for certificate maintenance. Verisign is one of the players with a central role as a certificate authority. But unless all entities trust this central resource, adoption will remain incomplete. The internet is mostly based on trust relationships, and unless there is a central, global trust repository, adoption is difficult. Within a small company we can have a local certificate authority, but on the open network this is not feasible.

Second, the infrastructure that PKI requires in an organization, with the right combination of hardware and software, is often not available. Administrators are not always familiar with the different components of a PKI system. Even where the infrastructure is available, it is usually incomplete or not fully understood. Different software and hardware need some form of intercommunication for a reliable PKI to work, so vendors need to interoperate through a common standard such as the X.509 certificate system. For an unfamiliar administrator, getting PKI going is not straightforward. PKI is only an underlying substrate that many applications or systems can build upon or plug into, but setting it up also needs an infrastructure [2].

Third, there is little motivation among users to use PKI. Consider the case of a financial institution such as a bank. Banking is based on a series of peer to peer systems that form a chain. The bank is only interested in an identifier such as a name in order to get to the account number or card number. A strong authentication of the name is useless to the intermediate players between the account holder and the bank; they only need to know the associated number, or another identifier, to approve the next step. When users are used to an older system, there is usually not much incentive to move to a new, unfamiliar and unproven system. Also, the users are not always the only ones who use PKI; often it is some mid-tier system that handles certificate movement. For the most part, the main motivation for PKI was the single sign-on feature [2], but in reality most users have their websites or interests dispersed among different infrastructures where single sign-on might not be feasible. In a small infrastructure single sign-on is very easy to achieve since entities are local, but in the open internet, where security is paramount, single sign-on is not feasible.

Given the above, what is the motivation for using PKI in an organization? PKI provides authentication. The binding between a key pair and a name can be used transparently throughout the organization wherever it is needed. It can be used to provide digital signatures, confidentiality, non-repudiation and so on, both offline and online. PKI can preserve security properties that are difficult to achieve using symmetric key techniques because of the sheer number of different keys those techniques require. The longevity of public keys alone provides strong motivation to use PKI. PKI provides single sign-on, where a single login to the infrastructure enables users to use that authentication token or service transparently through the rest of the infrastructure. This is a very beneficial aspect of PKI that makes remembering multiple passwords or login credentials unnecessary. User productivity increases because less administrative work is involved. The number of keys that need to be maintained is also considerably smaller: only the private certificate-signing keys and the CRL-signing (certificate revocation list) keys of the certificate authority need to be kept secret by the infrastructure. Each user's private key is that user's own responsibility to safeguard, and the compromise of a user's private key does not compromise the security of the infrastructure. This alone reduces the administrative burden, not to mention the cost.
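The remark about the number of keys required by symmetric techniques can be made concrete with a small calculation. It assumes that, without PKI, every pair of users shares its own secret key, so that $n$ users need $n(n-1)/2$ keys, whereas with public key techniques each user needs a single key pair. The function names are hypothetical.

```python
def symmetric_keys_needed(users: int) -> int:
    """Pairwise secret keys: one key for every pair of users."""
    return users * (users - 1) // 2


def key_pairs_needed(users: int) -> int:
    """Public key setting: one public/private key pair per user."""
    return users


# Compare the two approaches for a few organization sizes.
comparison = {
    n: (symmetric_keys_needed(n), key_pairs_needed(n))
    for n in (10, 100, 1000)
}
```

For 1000 users this gives 499500 pairwise symmetric keys against 1000 key pairs, which illustrates why key management grows unwieldy under the pairwise assumption.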

BIBLIOGRAPHY

[1] C. ADAMS AND S. FARRELL, Internet X.509 public key infrastructure: Certificate management protocols, (1999).

[2] C. ADAMS AND S. LLOYD, Understanding PKI: Concepts, standards and deployment considerations, (2003).

[3] D.A. COOPER, A model of certificate revocation, (1998), p. 11.

[4] DIFFIE AND HELLMAN, New directions in cryptography, (1976).

[5] ELGAMAL, A public key cryptosystem and a signature scheme based on discrete logarithms, (1985), p. 4.

[6] S. GARFINKEL, PGP: Pretty good privacy, (1995).

[7] B. R. GOVAERTS AND J. VANDEWALLE, Information authentication: Hash functions and digital signatures, (1993).

[8] A. J. MENEZES, P. C. VAN OORSCHOT, AND S. A. VANSTONE, Handbook of applied cryptography, (1996).

[9] A. NASH, W. DUANE, C. JOSEPH, AND D. BRINK, PKI: Implementing and managing e-security, (2001).

[10] S. K. KATSIKAS, The role of public key infrastructure in electronic commerce, (2001).

[11] J. KOHL AND C. NEUMAN, The Kerberos network authentication service (v5), (1993).

[12] A. LENSTRA, Ron was wrong, Whit is right, (2011), p. 17.

[13] S. LLOYD, CA-CA interoperability, (2001), p. 19.

[14] E. RESCORLA, Diffie-Hellman key agreement method, (1999).

[15] D. R. STINSON, Cryptography theory and practice, (2006).

[16] T. HOWES AND M. SMITH, LDAP: Programming directory-enabled applications with lightweight directory access protocol, (1997).