TRUST FRAMEWORK FOR THE CLOUD USING A MODEL OF TRUST, DATA MOVEMENT POLICIES, AND DECENTRALIZATION USING BLOCKCHAIN TECHNOLOGY

By STEPHEN SEÁN KIRKMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2019

© 2019 Stephen Seán Kirkman

To my mom, Norma, who passed in early 2013 before I started my PhD, but knew I was going for it.

ACKNOWLEDGMENTS

I first and foremost thank my wife and son, Eva and William, for their patience and support during this awesome opportunity. The path to computer science began with my papa, Robin, who passed in 2010 and who was the first computer programmer in the Kirkman family; he worked with punch cards and COBOL. I remember looking at the greenbar printouts he brought home to debug when I was a boy. I thank my chair and advisor, Dr. Richard Newman, for his patience and friendship. I wish to thank Dr. Manuel Bermudez (who is also on my committee) for his friendship, stories about his travels, and thoughts about a career in academia. I thank my PhD committee members, Dr. Daniela Oliveira and Dr. Swarup Bhunia, for their advice and support.

I would like to thank a close family friend, Dr. Sandy Miarecki, who has guided and mentored me for the last 20 years. Sandy was one of my wife's bridesmaids and a close military friend. Since I have known her, she has been an invaluable resource in life and in pursuing higher education as I decided to get my PhD. I would like to thank Dr. Sumi Helal for encouraging me to apply to the University of Florida. From the University of Illinois at Springfield, I would like to thank Dr. Kamyar Dezhgosha for supporting me in publishing my first paper and the department chair, Ted Mimms, for his support.

I would lastly like to thank the following individuals at the University of Florida for their assistance in conducting my survey: Florida Institute for Cyber Security (FICS) Research Coordinator Lesly Galiana for forwarding my survey to both faculty and students, Dr. Curtis Taylor from the UF College of Engineering for including my survey in the bi-monthly newsletter to the UF College of Engineering undergraduates, and finally Mr. Brian Roberts, who coordinated sending my survey to the Warrington College of Business.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Efforts to Improve Trust
  1.2 Dissertation Statement
  1.3 Organization

2 LITERATURE REVIEW - ACHIEVING TRUST IN TECHNOLOGY
  2.1 Trusted Hardware
    2.1.1 Trusted Platform Module and vTPM (2009)
    2.1.2 Intel Software Guard Extensions (2015)
    2.1.3 Trusted System on Chip Research (2017)
  2.2 Social Networks and Trusted Third Parties
    2.2.1 Social Networks to Improve Trust
    2.2.2 Cloud Security Alliance - STAR Registry
    2.2.3 Trusted Computing Group
  2.3 Data Focused - Data Provenance and Trust
  2.4 Encryption
    2.4.1 Homomorphic Encryption
    2.4.2 Blockchains to Enhance Trust
  2.5 Summary

3 BLOCKCHAINS
  3.1 Distributed Consensus
    3.1.1 Byzantine Agreement Problem
    3.1.2 FLP Impossibility Result
  3.2 Blockchains - Bitcoin
    3.2.1 Blockchain Fundamentals
    3.2.2 Proof of Work - Mining
    3.2.3 Storage - Merkle Trees
    3.2.4 Forks
    3.2.5 51% Attack
  3.3 Ethereum Blockchain
    3.3.1 Smart Contracts

    3.3.2 Paying for Space and Computation
      3.3.2.1 Example 1
      3.3.2.2 Example 2
    3.3.3 Transactions
    3.3.4 Data on the Ethereum Blockchain
    3.3.5
  3.4 Multichain
  3.5 Systems Based on Blockchains
    3.5.1 Storj, 2014
    3.5.2 MedRec, 2016
    3.5.3 Blockstack, 2017
    3.5.4 Systems Summary
  3.6 Chapter Summary

4 SYSTEMS REVIEW
  4.1 Trust Systems
    4.1.1 Excalibur: Policy Sealed Data, 2012
    4.1.2 CloudMonatt, 2015
    4.1.3 Verifiable Confidential Cloud Computing, 2015
    4.1.4 Trustworthy Multi-Cloud Services Communities, 2015
    4.1.5 Cloud Trust Protocol, 2015
    4.1.6 Cloud Armor, 2016
    4.1.7 Tenant Attested Trusted Cloud, 2016
    4.1.8 Trust Systems Summary
  4.2 Data Movement Systems
    4.2.1 CloudFence, 2013
    4.2.2 S2 Logger, 2013
    4.2.3 Data Location Control Model, 2014
    4.2.4 Stratus Project, 2015
    4.2.5 VeriMetrix Framework, 2015
    4.2.6 Data Movement Systems Summary
  4.3 Chapter Summary

5 GOAL 1 - CLOUD TRUST MODEL AND VALIDATION
  5.1 Quantitative Trust Models
  5.2 Probabilistic Models
  5.3 A Trust Model for the Cloud
    5.3.1 Five Degrees of Recommendation
    5.3.2 Cloud Spiral of Trust
  5.4 Trust Model Effectiveness and Validation
    5.4.1 Industry Surveys and Academic Survey Research
    5.4.2 Power Analysis for Surveys
      5.4.2.1 Effect size
      5.4.2.2 Test that a proportion is .50 effect index g

      5.4.2.3 Difference between proportion effect index h
      5.4.2.4 Our statistical power
  5.5 Survey Mechanics and Distribution
    5.5.1 UF Institutional Review Board and Distribution Information
    5.5.2 Survey Respondents and Demographics
    5.5.3 Selected Result Charts
    5.5.4 Survey Summary
  5.6 Hypotheses for Cloud Trust Model
    5.6.1 Hypothesis Test Plan
    5.6.2 Hypothesis 1
    5.6.3 Hypothesis 2
    5.6.4 Hypothesis 3
    5.6.5 Hypothesis 4
  5.7 Summary

6 GOAL 2 - ORCON CONSUMER POLICY MODEL FOR DATA MOVEMENT
  6.1 ORCON Policy Model Overview
  6.2 Model
    6.2.1 Elements of State: Clouds, Consumers, Datasets, Policies, and Tags
    6.2.2 Functions
      6.2.2.1 Tag function of a dataset
      6.2.2.2 Location function of a dataset
      6.2.2.3 Owner function of a dataset
      6.2.2.4 Mapping of consumer to metadata
      6.2.2.5 Mapping of consumer i to policy i
      6.2.2.6 Policy Function
    6.2.3 State
    6.2.4 Actions
      6.2.4.1 Add cloud C
      6.2.4.2 Add consumer E
      6.2.4.3 Consumer Ei add data set D with tag TAG at cloud C
      6.2.4.4 Consumer Ei modify metadata from µ to µ′
      6.2.4.5 Consumer E modify policy to σ′
      6.2.4.6 Move dataset Dij from C to C′
    6.2.5 Valid State
    6.2.6 Model Summary
  6.3 Specific Policies
  6.4 Authorizations, Attestations, and Audit
  6.5 Summary

7 GOAL 3 - CYCLOPS DECENTRALIZED APPLICATION WITH WHITELIST, DATA TRACKING, AND ATTESTATION
  7.1 Overview - How Does the System Work?
  7.2 Pilot Decentralized Application

    7.2.1 Decentralized Application GUI Design
    7.2.2 Design
      7.2.2.1 Clouds and consumers
      7.2.2.2 Policies
      7.2.2.3 Consumer meta-data location
    7.2.3 Data Structures
      7.2.3.1 Mappings
      7.2.3.2 Arrays
      7.2.3.3 Mapping of Struct
    7.2.4 Smart Contracts/Policies
      7.2.4.1 Smart contract technology decisions
      7.2.4.2 Main contract
      7.2.4.3 Whitelist policy contract
  7.3 DApp Upgrade
    7.3.1 Upgrade 1 - Attestations on Demand
    7.3.2 Upgrade 2 - Tags to Identify a Virtual Machine
      7.3.2.1 Virtual machine security
      7.3.2.2 Open virtualization format
      7.3.2.3 VMware OVF tool
      7.3.2.4 Virtual machine tags
    7.3.3 Upgrade 3 - Whitelist Policy for Data Tags
      7.3.3.1 Add cloud to tag whitelist
      7.3.3.2 Delete cloud from tag whitelist
      7.3.3.3 Check consumer data tag
      7.3.3.4 List my tags
      7.3.3.5 List my clouds in my tag
  7.4 Use Cases
    7.4.1 Using Metamask to Send Transactions
    7.4.2 User and Cloud Sign Up
    7.4.3 Add/Register Cloud with CYCLOPS
    7.4.4 Procedure for User to Add a Cloud to their Whitelist
    7.4.5 Delete Cloud from Whitelist
    7.4.6 List Approved Cloud on Whitelist
    7.4.7 Delete Consumer from Whitelist
    7.4.8 List Consumer Policies
    7.4.9 User Wants to See Attestations
    7.4.10 Cloud Check Consumer Whitelist
    7.4.11 Whitelisttag Policy Actions
    7.4.12 Cloud Provider Performs Check
  7.5 Implementation and Testing
    7.5.1 Local Test Setup
    7.5.2 Technology Decision - Mocha Test Framework Not Used
    7.5.3 Rinkeby Testing and Results
    7.5.4 Gas Cost Analysis and Scaling Costs
    7.5.5 Smart Contract Security and Best Practices

  7.6 Discussion
    7.6.1 What Are CYCLOPS' Weaknesses?
      7.6.1.1 Inaccurate data
      7.6.1.2 Privacy
    7.6.2 Why Not Store the Real Cloud Data Off-Chain with Hashes on Chain?
    7.6.3 Ethereum Is Known to Support only 4-8 Transactions Per Second with a Hard Cap of 15 TPS, How Will That Scale or Be Realistic?
    7.6.4 What Good Is the System if It Cannot Tell Me Which Cloud Is Trustworthy?
    7.6.5 How Will We Know if the Cloud Does Not Follow Our Policies?
    7.6.6 Why Would the Consumer Pay for Transactions?
    7.6.7 How Do You Quantify Your Gains in Trust?
    7.6.8 How Does the Cloud/Consumers Know What Smart Contract Policies Are Available?
    7.6.9 How Can We Be Sure the Contracts Are Without Bugs?
    7.6.10 How Can a Consumer Trust the Cloud To Evaluate Her External Policies?
    7.6.11 Can a Smart Contract Be Updated? What if It Needs to Be Improved?
    7.6.12 If the Ethereum Blockchain Is Immutable, How Can You Delete from the Whitelist?
    7.6.13 Ether Is Required, How Will an Average Consumer Obtain It?
  7.7 Summary

8 CONCLUSION AND FUTURE WORK
  8.1 Conclusion
  8.2 Future Work
  8.3 Features Out of Scope for Dissertation
    8.3.1 Oracles
    8.3.2 Confidentiality
    8.3.3 Exploration of the Recommendation Function F
    8.3.4 Data Tagging Other than a Virtual Machine
    8.3.5 Prevent Fake News Insertion into Blockchain
    8.3.6 Ethereum Main-Net Testing and Deployment
    8.3.7 Negative Verifications
    8.3.8 DApp Mechanism that Allows Obtaining Ether
    8.3.9 Use of Swarm Ethereum Storage
  8.4 Final Remarks

APPENDIX

A CLOUD TRUST QUESTIONNAIRE
B RAW SURVEY RESULTS - 35 TOTAL RESPONSES
C FREE FORM SURVEY RESULTS

D DAPP CONFIGURATION AND STARTUP INSTRUCTIONS
  D.1 Installing the Software
  D.2 Starting up DApp
  D.3 DApp Processes
  D.4 Adding a Contract to the Local Test Environment
  D.5 Adding a Contract to the Ethereum Mainnet
  D.6 Metamask Usage Guide

E VIRTUAL MACHINE TAGGING STEPS
  E.1 Tagging a Virtual Machine
  E.2 Convert OVA to OVF for Package Signing

F PUBLICATIONS SUPPORTING DISSERTATION
  F.1 Publications Since Proposal
  F.2 Publications Before Proposal

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Ether denominations
3-2 Blockchains public/private compared
5-1 Variables in Castelfranchi degree of trust
5-2 ρ Attributes
5-3 Definitions for cloud trust functions
5-4 Q19. Your age range (no subject under 18 should complete this survey, thank you)
5-5 Q20. Technical competence
5-6 Q8. I care if my cloud data are accessed by a third-party without my knowledge
5-7 Q14. This question concerns referred trust. When choosing a cloud, I would use a recommendation from
5-8 Q15. I trust more sensitive data to some clouds, but not others
5-9 Q16. I have read service level agreements
7-1 Main contract methods
7-2 Whitelist contract methods
7-3 Check for authorization in O(1) time
7-4 Data whitelist contract methods
7-5 Software and use
7-6 CYCLOPS deployment costs
7-7 CYCLOPSv1 average cost to deploy policies
7-8 CYCLOPSv1 average cost to execute functions
7-9 Deploy gas estimate comparison
7-10 WhiteList functions gas analysis. Gas analysis is for one cloud (deleteconsumer for the Rinkeby network had two clouds). Calls are free, but still subject to gas limits (i.e., it is possible to run out of gas locally for high EVM use)
7-11 WhiteListTag functions gas analysis. Calls are free, but still subject to gas limits (i.e., it is possible to run out of gas locally for high EVM use)

7-12 Main functions gas analysis. Calls are free, but still subject to gas limits (i.e., it is possible to run out of gas locally for high EVM use)
B-1 Q1. I trust the cloud
B-2 Q2. I trust one cloud company more than others
B-3 Q3. Rank these cloud concerns (1 being most important)
B-4 Q4. Do you order from Amazon?
B-5 Q5. If you don't order from Amazon, why not?
B-6 Q6. Do you use Facebook?
B-7 Q7. If you use Facebook, how often do you use it?
B-8 Q8. I care if my cloud data are accessed by a third-party without my knowledge
B-9 Q9. Please rank the types of data below from most sensitive to least sensitive. How likely are you to store the data in the cloud?
B-10 Q10. I use virtual machines?
B-11 Q11. At what detail would you like to know where your data resides?
B-12 Q12. I care if my cloud data is moved without my knowledge (choose one)
B-13 Q13. Rank these potential sources of knowledge (for a recommendation) in the order that you would use it
B-14 Q14. This question concerns referred trust. When choosing a cloud, I would use a recommendation from
B-15 Q15. I trust more sensitive data to some clouds, but not others
B-16 Q16. I have read service level agreements
B-17 Q17. I would pay as a one-time charge to use a personalized cloud policy, if all clouds followed it
B-18 Q19. Your age range (no subject under 18 should complete this survey, thank you)
B-19 Q20. Technical competence
C-1 Q18. Assuming the cloud followed your policies, what kind of cloud policies would you like to see available for use?

LIST OF FIGURES

2-1 TPM architecture
2-2 SGX application split
3-1 Bitcoin chain of blocks
3-2 Hashing of hello world with nonce
3-3 Blocks with merkle tree
3-4 Code and storage references
3-5 Storj sharding
3-6 Medrec architecture
3-7 Medrec process flow
3-8 Blockstack architecture
4-1 Excalibur architecture
4-2 CloudMonatt architecture
4-3 Debt framework
4-4 Social network graph: vertices are services, edges are interactions
4-5 CTP example use case
4-6 Cloud Armor architecture
4-7 Tenant-attested framework diagrams
4-8 CloudFence interactions
4-9 CloudFence architecture
4-10 S2Logger architecture
4-11 DAG event graph
4-12 ECSM architecture
4-13 ECSM architecture with two clouds
4-14 VeriMetrix architecture
5-1 Abilities from different opinions versus beliefs strength [41]

5-2 Probabilities for two different abilities of Y [41]
5-3 Taking recommendations
5-4 Cloud spiral of trust
5-5 Trust cycle
5-6 Q1. I trust the cloud
5-7 Q3. Rank these cloud concerns (1 being most important)
5-8 Q8. I care if my cloud data are accessed by a third-party without my knowledge
5-10 Q12. I care if my cloud data is moved without my knowledge
5-9 Q11. At what detail would you like to know where your data resides
5-11 Q13. Rank these potential sources of knowledge (for a recommendation) in the order that you would use it
5-12 Q14. When choosing a cloud, I would use a recommendation from
5-13 Q17. I would pay as a one-time charge to use a personalized cloud policy, if all clouds followed it
7-1 CYCLOPS DApp menu
7-2 Blockchain, user, and cloud interaction
7-3 Smart contract policy framework
7-4 Code and storage references
7-5 Struct pattern for lists in (strings and addresses)
7-6 Whitelist design structure
7-7 Attestations from Rinkeby testing
7-8 Consumer tag structure
7-9 Policy sign-up
7-10 Add cloud to system
7-11 Cloud is added and accessible to consumers
7-12 Cloud making a consumer policy query
7-13 Policy sign-up

7-14 Cost to add and delete cloud providers from a whitelist (i.e., add Amazon, then , etc...)
B-1 Q3. Rank these cloud concerns (1 being most important)
B-2 Q9. Please rank the types of data below from most sensitive to least sensitive. How likely are you to store the data in the cloud?
B-3 Q13. Rank these potential sources of knowledge (for a recommendation) in the order that you would use it
E-1 Using Oracle Virtualbox to export VM to OVA

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TRUST FRAMEWORK FOR THE CLOUD USING A MODEL OF TRUST, DATA MOVEMENT POLICIES, AND DECENTRALIZATION USING BLOCKCHAIN TECHNOLOGY

By

Stephen Seán Kirkman

August 2019

Chair: Richard E. Newman
Major: Computer Engineering

Current research into cloud trust focuses on ensuring a trustworthy platform and attestation of the security of that platform via a trusted third party or trusted server. This has typically been within the confines of a single cloud. We look at the trust problem from the perspective of data movement between different clouds. When consumers put data in a cloud, the cloud has been moving data to other clouds and allowing data access to third parties or other cloud providers. Service level agreements describe the level of service the cloud provides to the consumer (e.g., speed, availability, cost), but rarely do we see negotiation. We use smart contracts so that any cloud can evaluate consumer policies. We use the Ethereum blockchain technology and testnet to test the smart contract policies. We model trust in cloud computing to illustrate how consumers make decisions to trust the cloud. We validate our trust model using an a priori statistical power analysis and a survey of cloud attitudes and usage. The survey also enabled us to seek guidance on the most useful data movement policies. Finally, we created a prototype decentralized application to demonstrate the viability of smart contract policies and their low cost to the cloud and the consumer.

CHAPTER 1
INTRODUCTION

"Amazon is responsible for the security of the cloud...customers are responsible for security in the cloud." — Summary of Amazon Shared Responsibility Model [1]

When it was in its infancy, the cloud was believed to provide better security of data than even the consumer could provide, because the cloud had access to more resources than a typical consumer: the cloud had its security employees and its cutting-edge technology. This notion is now outmoded, as evidenced by Amazon's Shared Responsibility Model. We can no longer trust cloud companies to provide all the security; better security is no longer a driving force to use the cloud. The motivation for this research is to study and improve consumer trust in the cloud, using a mechanism and a model that help the cloud and consumers bridge the security gap created by the increased complexity of the cloud and the new cloud security paradigm.

Cloud industry trust continues to be a problem [47, 48, 50, 53, 60, 64, 85, 99, 108, 116]. Issues with service level agreements (SLAs), ownership, policies, and data location are all still concerns [19, 21, 57]. The SLA model focuses only on what services the cloud provides, and it is a one-to-many device (one cloud to many consumers). We want a device that works one consumer to many clouds. There are no mechanisms that we know of that address this weakness of SLAs and allow the consumer to specify how and when a cloud may pass consumer data to third parties; that is, a mechanism that allows the consumer to articulate their data handling policies to any cloud. How can consumers' desires regarding third-party access and data movement be articulated to any cloud they use without repeating the same policy over and over in a language specific to each cloud?

Furthermore, there is not enough transparency for data movement and data location in the cloud. All the current policy-based approaches to trust that we found in our literature review try to improve transparency by storing the policies and attestations on systems within and managed by the cloud [54, 101, 125] and use a trusted third party (TTP) as the final vouching authority. However, a TTP also has the burden to prove its own trustworthiness. Therefore,

there needs to be a new trust mechanism that does not rely solely on TTPs. Moreover, attestations provide vital evidence (proof) of security or of an action. If the attestation data are stored in the cloud, the cloud's administrators could manipulate those data. What is needed is high-integrity storage that cannot be manipulated; to wit, we propose blockchains as part of a solution.

There are valid use cases for duplicating data for availability, but this has typically been opaque to the user. There is a trend towards inter-cloud virtual machine (VM) migration [109, 114]. This elicits concerns over ownership [57] and data location [57, 62]. Furthermore, cloud companies have data centers in different countries and are now part of cloud federations, each with potentially different policies. Cloud providers now have associations with many third parties; there is currently no way of knowing who these third parties are in order to explicitly authorize them to have access to our data. In 2017, a partner of Verizon allowed a misconfigured server to go undetected, which led to personal data being publicly accessible. Equifax had two incidents in 2017. After the first major incident, a second incident occurred involving a third-party link that appeared on the Equifax website directing consumers to a fraudulent website. In 2018, a third-party app developer of Facebook collected personal data of Facebook users and subsequently passed that data to a data analytics firm against Facebook policy [22]. Public perception remains that the cloud is not trustworthy [53, 64, 78].

1.1 Efforts to Improve Trust

Existing cloud trust models are suitable within a single cloud, but are awkward at best for multi-cloud environments. IEEE created a project called InterCloud to improve inter-cloud interoperability; we used this name briefly for our research prototype. Their goal is to create a vendor-neutral testing environment from which to develop cloud standards. There have been no updates on this effort since 2015.1

1 http://www.intercloudtestbed.org/

Most existing policies address access to data, rather than its movement, storage, and processing. Efforts across many disciplines have contributed research to make the cloud more trustworthy [3, 31, 54, 68, 69, 79, 87, 89, 93, 96, 101, 104, 105, 110, 115], but there is little work being done on data movement and third parties. General approaches include reputation systems, trust management systems, hardware-based integrity approaches, location-based approaches, policy-based approaches, social trust, or a combination of the above. Many solutions have consumer data policies located within the cloud infrastructure. This is a problem because it creates an organizational conflict of interest (i.e., the organization is policing itself).

1.2 Dissertation Statement

The central theme of this dissertation is to give the consumer the tools and the opportunity to express their desires to the cloud using a consumer-based policy approach that leverages blockchain technology to help reduce dependence on TTPs. We want to shift the cloud paradigm from an implicit, unintentional transitivity of trust (i.e., trust the cloud with respect to data location) to an explicit, transparent trust (i.e., users specify to the cloud whom they trust). The transitive nature of trust we now face has caused a trust problem across the cloud industry and beyond. This shift can be described by the table below from the perspective of policies, trust, and the effect.

    Paradigm    Old                 New
    Policy      Provider Centric    User Centric
    Trust       Implicit            Explicit
    Effect      Opaque              Transparent

We have three primary goals:

1. Develop a trust model to support and explain consumer trust in the cloud, focusing on data value. Conduct a survey and statistical power analysis to validate our trust model;

2. Develop a policy model which focuses on consumer-based policy design;

3. Develop a prototype decentralized application (DApp) using smart contracts as policies and blockchain technology, thereby reducing dependence on TTPs, including on-demand attestations for consumers.

1.3 Organization

The remainder of this dissertation is organized as follows: Chapter 2 reviews current trust literature. Chapter 3 provides a background in blockchain technology and Ethereum. Chapter 4 reviews systems that have been built to support trusted computing. Chapter 5 contains the results of our first goal: our trust model and validation work. Chapter 6 describes our second goal: our policy model that guides us in the development of smart contract policies. Chapter 7 presents the results of our final goal: our first DApp prototype with test results. Chapter 8 contains our conclusion and possibilities for future work. The appendices contain more detailed information on our survey, DApp startup procedures, and important DApp processes.

CHAPTER 2
LITERATURE REVIEW - ACHIEVING TRUST IN TECHNOLOGY

Confidentiality, integrity, and availability (CIA) are the basic goals of information security. We want our data to remain confidential. We do not want unauthorized parties to be able to view or manipulate our data without our consent. We want our data to have integrity; we want it to reflect what we expect it to reflect. We want our data to be available. These security goals are fulfilled through network security, system security, and personnel security. Networks are guarded at the perimeter by firewalls and intrusion detection systems. Employees use accounts and badges. Files and account access are audited via system controls. With the advent of the Internet, then the World Wide Web, encryption became commonplace to protect data in transit and at rest.

Transparency is having knowledge of what is being done to consumers' data, how it is being done, and why it is being done. The amount of transparency that is demanded depends on the value individuals place on their data. If the data are not valuable, a consumer might not care where they get stored or who sees them (e.g., friends on social media). If the data are very valuable, consumers will want more transparency and potentially stronger guarantees. In addition, trust in technology is now becoming interdisciplinary because technology touches every aspect of our lives. We review the following techniques that strive to obtain transparency and ultimately trust: encryption, trusted hardware, trusted third parties, social trust, blockchains, and data provenance.

2.1 Trusted Hardware

The trusted platform module (TPM), developed by the Trusted Computing Group (TCG) [17], was one of the first hardware-based solutions that provided hardware-rooted attestations on all the software at boot time. Following the lead of the TCG, several other projects have ventured into this space: ARM TrustZone [2] (security extensions for the ARMv6 processors), Intel's Trusted Execution Technology (TXT) [11] (provides secure enclaves), the Aegis secure processor [111] (protects against physical attacks as well as software attacks),

Figure 2-1. TPM architecture

Sanctum [46] (its threat model includes cache timing and memory access pattern attacks), and Phantom [81] (memory access traces are computationally indistinguishable). We will cover the two most popular in more depth: the Trusted Computing Group's TPM and Intel's SGX.

2.1.1 Trusted Platform Module and vTPM (2009)

The Trusted Platform Module (TPM) is supported by the Trusted Computing Group (discussed later) and was standardized in 2009. It provides for a security identifier, rooted in hardware, in addition to secure storage for attestations. The TPM also provides for [33]:

• Secure identification of devices
• Secure generation of keys
• Secure storage of keys
• Non-Volatile Random Access Memory (NVRAM) storage
• Device health attestation

Device health attestation gets most of the attention from a trust perspective. Within the TPM are platform configuration registers (PCRs); see Figure 2-1 [49]. The PCRs store hashes of measurements taken during the boot process. A hash is a one-way cryptographic operation that takes any string of characters as input. The output is a

fixed-length value (typically 160 bits) that is effectively unique for that input. Once hashed, the input cannot be recovered. The measurements use a hash extend procedure to take measurements of the software stack [33]:

1. Start with an existing value, A (the value to be extended)

2. Concatenate another value B to A (extending it) creating the message, A ∥ B

3. Hash the resulting message to create a new value, A′

4. This new hash value A′ replaces the original value, A

The process can be summarized mathematically as:

A′ = hash(A ∥ B)   (2–1)

There are proposed architectures [28, 95, 100, 101, 103] that leverage the TPM to support cloud trust. However, since the TPM works on a per-system basis, these architectures are also per system. Some solutions use a centralized authorization server within a single cloud or data center. While the TPM is an integral part of an overall security solution, the PCRs are only designed to report state, not to pass judgment on the state [33]. Attestations via the TPM should complement other security solutions. The Virtual Trusted Platform Module (vTPM) is a software-simulated version of the TPM. Because there are typically multiple virtual machines (VMs) per host, there is a requirement to manage the vTPMs. Therefore, a vTPM manager is charged with creating vTPMs and responding to migration requests.
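To make the extend operation concrete, the following minimal Python sketch models how a PCR accumulates boot-time measurements. The SHA-256 digest size and the example component names are assumptions chosen for illustration only; the actual TPM specification defines its own PCR banks and supported hash algorithms.

    import hashlib

    def pcr_extend(pcr_value: bytes, measurement: bytes) -> bytes:
        """Model of the extend operation: A' = hash(A || B)."""
        return hashlib.sha256(pcr_value + measurement).digest()

    # PCRs start at a known value (all zeros in this sketch).
    pcr = bytes(32)

    # Hypothetical boot-chain measurements (hashes of firmware, bootloader, kernel).
    for component in [b"firmware-image", b"bootloader-image", b"kernel-image"]:
        measurement = hashlib.sha256(component).digest()
        pcr = pcr_extend(pcr, measurement)

    print(pcr.hex())  # final PCR value depends on every measurement and their order

Because each extend folds the previous value into the next hash, the final PCR value changes if any measured component, or the order of measurement, changes.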

2.1.2 Intel Software Guard Extensions (2015)

Intel has added extensions to the Intel Instruction Set Architecture (ISA). They use the CPU to protect and isolate areas of the user code's process memory space; these areas are called enclaves. The aim of Intel Software Guard Extensions (SGX) is both to reduce the footprint of the trusted computing base (TCB) and to provide confidentiality and integrity guarantees. Applications must be split into untrusted and trusted components. SGX can then protect the secrets inside the protected enclave, even if an attacker has full control of the platform. Figure 2-2 [44] shows how an application operates using SGX. The steps are as follows:

Figure 2-2. SGX application split

1. App is built with trusted and untrusted parts

2. App runs and creates the enclave, which is placed in trusted memory

3. Trusted function is called, execution is transitioned to the enclave

4. Enclave sees all the process data in clear; external access to enclave data is denied

5. Trusted function returns from the call; enclave data remains in trusted memory

6. Application continues normal execution

The enclave is in memory and is encrypted and partitioned. Special CPU instructions (i.e., extensions to the Intel architecture) allow for creation, protection, and access to the enclave. These instructions carve out a partitioned memory space. There is a controlled access point into the enclave memory, but code within the enclave can access anything outside. Only privileged commands may access the data within. This enclave can be used to perform secure computations [71].

2.1.3 Trusted System on Chip Research (2017)

The chain of trust problem we identify in the cloud is analogous to the one being researched in the system on chip (SoC) space. Due to time-to-market demands and chip complexity, microprocessors are now designed as systems with components originating from different vendors. The vendors and integration houses do not always have the time to fully evaluate and identify vulnerabilities caused by malicious insiders at outside vendors. These components are termed Intellectual Property (IP). Modern system on chip design involves hundreds of pre-designed and sometimes pre-verified hardware IPs. There are multiple players in this ecosystem. How do you ensure the integrity of the whole chip if the chip is assembled from components spanning multiple different vendors, some of which might be untrusted? Most research is on detection and mitigation for each component prior to integration on the chip, thus attempting to expose the vulnerabilities of each component without considering the impacts integration may have. Basak et al. [35] take a different approach. Rather than static verification of each component IP, they propose an architecture-level solution that uses a centralized controller for security policies and wrappers for IPs. They want to detect potentially untrustworthy behavior that might originate inside an IP and impact the whole SoC.

2.2 Social Networks and Trusted Third Parties

Adding the human dimension from social networks might enhance a host's ability to analyze trust. In addition, trusted third parties come in many forms. We briefly review research into social network trust and popular trusted third parties in cloud computing.

2.2.1 Social Networks to Improve Trust

Modeling trust based on social networks is another method to improve trust. One approach [88] is to allow social network users to assign trust values to data, files, and URLs. These data could be stored and later consumed and distinguished by a socially aware architecture. An extension to existing platforms would be required to integrate social networks into a socially aware operating system for consumption. In addition, [88]

models and infers trust by employing a transitive trust process across multiple entities. While we do not entirely subscribe to the transitive trust model, research in the socially aware computing paradigm can make security policy easier to understand and implement. A model such as this brings users more into the process. We follow this same approach by allowing consumers to specify their own cloud policies.

2.2.2 Cloud Security Alliance - STAR Registry

The Security, Trust and Assurance Registry (STAR) is sponsored through the Cloud Security Alliance [4]. The STAR Registry is an effort to rate the trustworthiness of cloud providers. The STAR provides a rating stamp of approval, with multiple levels of assurance spanning a certification range. At the lowest end, the registry accepts and posts self-assessment questionnaires from cloud providers. The higher ends require rigorous independent third-party assessments based on ISO/IEC 27001, a security standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

2.2.3 Trusted Computing Group

The Trusted Computing Group (TCG) is made up of member companies from across the world and from wide-ranging disciplines, including networking, hardware, security, software development, and infrastructure companies [17]. They also support the development of the TPM and trusted computing standards. The TCG has defined trusted computing to mean that computing integrity is guaranteed and anchored in hardware.

2.3 Data Focused - Data Provenance and Trust

Data provenance is another space of research that attempts to capture the history of data, thus achieving transparency. Because data can be easily copied, tracking its history helps with trust issues. Research in data provenance focuses on actions performed on data from its creation and onward. Data provenance can support cloud trust, however the metadata that is generated from data movement and data copy must be stored safely, efficiently, and associated with the original data. Storing and manipulation of provenance data has gained research

attention because provenance data can help provide evidence to support data confidentiality and adherence to SLA requirements [36, 110].

2.4 Encryption

To use the cloud securely, research into different forms of encryption has opened new possibilities. Encryption provides the fundamental capacity to keep data confidential both at rest (i.e., in storage) and while being transported (i.e., moved to or from the cloud). Where encryption falls short in cloud computing is where some computation or manipulation of the data is required. While a thorough review of all types of encryption is outside the scope of this dissertation, we discuss homomorphic encryption because it provides confidentiality during data manipulation.

2.4.1 Homomorphic Encryption

Homomorphic encryption allows the entity holding a ciphertext to run operations on it that mirror the same operations on the plaintext, without revealing the original data. This means a cloud provider could manipulate the data without even seeing it! Simple homomorphic encryption involves just one operation and is provided by plain RSA. As an example, suppose you have two ciphertexts c1, c2 and two plaintexts m1, m2, where

c1 = m1^k mod n and c2 = m2^k mod n for some modulus n   (2–2)

The simple homomorphic property on one operation (multiplication, in our example) says that multiplying two numbers in the clear is equivalent to multiplying their encrypted counterparts using RSA. The ciphertexts are built using RSA as the encryption algorithm Enc. To verify this, however, one must decrypt the result using the secret key. This property is illustrated in Equation 2–3.

c1 c2 = (m1^k m2^k) mod n = (m1 m2)^k mod n = Enc(m1 m2)   (2–3)

Plain RSA is a seminal public key encryption algorithm that has this simple homomorphic property, but it is not used in practice because it is both insecure and only works in a minimally restricted case. Nonetheless, research has continued in homomorphic encryption.
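The multiplicative property in Equations 2–2 and 2–3 can be checked directly with a short Python sketch. The tiny parameters below are assumed purely for illustration; textbook RSA with numbers this small (and without padding) is completely insecure.

    # Toy textbook-RSA parameters (assumed for illustration only; not secure).
    p, q = 61, 53
    n = p * q            # modulus
    k = 17               # public exponent (the 'k' in Equations 2-2 and 2-3)

    def enc(m: int) -> int:
        """Textbook RSA encryption: c = m^k mod n."""
        return pow(m, k, n)

    m1, m2 = 7, 11
    c1, c2 = enc(m1), enc(m2)

    # Multiplying the ciphertexts (mod n) equals encrypting the product of the plaintexts.
    assert (c1 * c2) % n == enc((m1 * m2) % n)
    print("homomorphic property holds:", (c1 * c2) % n, "==", enc(m1 * m2 % n))

Only the holder of the private key can decrypt the resulting ciphertext to recover m1 * m2, which is what makes the property useful for outsourced computation.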

As such, there are levels of homomorphism being explored in research to support cloud computing. Somewhat homomorphic encryption provides for a partial number of homomorphic operations. Fully homomorphic encryption (FHE) allows one to compute arbitrary functions over encrypted data without the decryption key. Craig Gentry's seminal work in this field provided the first roadmap to a fully homomorphic implementation. Advances have been made, but its processing speed is still not practical for general computing applications [58]. There is continual research into optimizing FHE schemes. Recent research out of Portland, Oregon, is advancing the state of the art. The Rapid MAchine-learning Processing Applications and Reconfigurable Targeting of Security (RAMPARTS) initiative is making FHE faster, but programming for it remains roughly as complex as programming in assembly language was 30 years ago [94].

2.4.2 Blockchains to Enhance Trust

Blockchains are also built using techniques in encryption. Blockchains can be considered immutable databases in which the records (blocks) are connected using hashing techniques. Blockchains require no central trust mechanism and hence have no central point of failure. Even though they were designed with digital currencies in mind, in a distributed cloud environment we see a use case for adjudication of inter-cloud operations, using the blockchain as the mechanism that provides the authoritative source for cloud policies and attestations. There are many blockchains, including [12], Bitshares [80], and smaller ones. We will cover Bitcoin [84], Ethereum [7], and Multichain [61] in more depth in Chapter 3.

2.5 Summary

Due to the comparatively slow advancement of homomorphic encryption, research has continued in other areas of trusted computing including trusted hardware, decentralization techniques, and data provenance. The Trusted Platform Module provides hardware-based trust at the system level. An application on the system must use PCR values in an authorization policy, or a remote party

asks for a signed attestation (quote) of the values and judges their trustworthiness [33]. This means there must be external records of the expected software measurements to determine if a deviation from the baseline exists. Depending on the amount of software on the system, this can become quite complex.

Intel's SGX made advances in secure enclaves. Its primary weakness is that using the Intel SGX functionality requires: 1) an Intel SGX-enabled CPU, 2) Intel-supported BIOS functionality, and 3) a license to use the Intel SGX platform software. Additionally, debugging tools for SGX require professional development editions (which come at a cost) [10, 45].

We discussed the Cloud Security Alliance STAR Registry. It should be noted that the highest cloud rating, which nobody has, is currently under development and represents continuous monitoring. Most companies, including Amazon AWS, have the self-assessment rating, while Azure so far has the highest rating awarded [4].

Storing and manipulation of provenance data has gained research attention because provenance data can help provide evidence to support data confidentiality and SLA requirements [36, 110]. Blockchains provide a decentralized mechanism for high-integrity storage. All industries are starting to recognize the many benefits of blockchain-based operations. There are new digital currencies with their own blockchains being created almost every day. Bitcoin and Ethereum are the most popular. Our goal is not to minimize any of this research, but to augment it. We believe that entities such as the CSA could benefit from a blockchain-based solution to help attestations in the cloud environment and to further assist consumers. We want to ensure that if data cross any management boundaries (e.g., data center, organizational, or geo-political), then they are tracked and there is no way to manipulate any logs once stored. Blockchains can be an alternative to using a trusted third party for this.

CHAPTER 3
BLOCKCHAINS

A blockchain is a public ledger that is computationally infeasible to alter once set. It is essentially a database where each record is termed a 'block' and each block is connected to new blocks using the cryptographic hash primitive. Blockchains use a protocol to reach consensus on the accuracy and order of the transactions stored on it. We will cover blockchain fundamentals and some systems built on blockchains. We will also review the following blockchains.

• Bitcoin: The first public blockchain, based on the breakthrough paper by Satoshi Nakamoto in 2008 [84].
• Ethereum: A next-generation public blockchain technology whose aim is to build a decentralized 'world computer' that runs programs called smart contracts [7].
• Multichain: A private blockchain whose goal is to improve upon the limitations of Bitcoin in mainstream finance [61].

Before discussing how blockchains are constructed and operate, we will provide a brief overview of the problem of distributed consensus to frame our discussion of the blockchain solution to this problem.

3.1 Distributed Consensus

A fundamental problem in distributed computing is consensus: how to get a set of processes to agree on a value. Distributed consensus has been studied for years. Solving distributed consensus depends on the assumptions that are applied for a particular situation. The most prominent assumptions that are modeled cover timing and failure. The timing models are synchronous versus asynchronous; the failure models include fail/stop versus Byzantine. In a synchronous model, every process is in lockstep and messages are certain to be delivered within a specific time frame. In an asynchronous model, there are no guarantees about how long a process will take to respond if it responds at all or how long it takes a message to be delivered. In a partially synchronous model, it is possible to tell if a process has died via timeouts and thus a partially synchronous model is the middle ground between the two extremes.

The other model to consider is the failure model. In fail/stop, a failed process simply stops and does not send any more messages or execute any more steps. This is the most benign model. The other extreme is Byzantine failure, in which any behavior is possible for a failed process. Byzantine failure is so named based on the historical accounts of the notoriously devious generals of the Byzantine army and the related Byzantine agreement problem. Byzantine failures are worst-case failures where even failed nodes may collude [42]. Researchers focus on Byzantine failures to model worst-case real-world behavior.

3.1.1 Byzantine Agreement Problem

In the Byzantine agreement problem, all generals communicate by messenger to get their orders to attack or retreat. Due to the presence of traitorous generals, orders to attack might be ignored or messengers might be ordered to send the wrong message. If all the orders are the same, all honest generals must agree on this value, and they must agree to attack or retreat simultaneously based on a message they received. They will only succeed if all the honest generals do the same thing. The honest and loyal generals must still come to consensus on whether to attack or retreat. To agree, the generals run a distributed consensus protocol. In some cases, consensus is solvable [42]. The details of the solution to the problem are not germane to this discussion; however, the implications are. Foremost, the solution requires a synchronous environment. In an asynchronous model, even with the strongest assumptions about failure (i.e., fail/stop), you cannot solve the problem because you cannot tell the difference between a slow process and a failed process. For the weakest assumption, the Byzantine model, processes can do just about anything to impact the transmission of messages and can collude, yet in a synchronous environment the problem can be solved. The Byzantine General (BG) protocol described in [42] will find consensus only if M > 3k, where M is the number of processors and k is the maximum number of traitors. The BG protocol requires O(M^k) messages. Big O notation provides a mathematical summary of the cost-limiting behavior of a function and reflects which variables are the most significant limiting factors over the long term. The BG protocol has exponential growth of messages in k. Since

Byzantine failures can behave like fail/stop failures, many messages are required to overcome this kind of failure. If the environment is one of partial synchrony, we can detect failures via heartbeats and other mechanisms. To summarize, consensus (in a synchronous environment) is hard, especially when you have Byzantine-type failures, but consensus is possible under certain assumptions. However, in an asynchronous environment, it is not only hard, but impossible. This was proven by Fischer, Lynch, and Paterson (FLP) [55].

3.1.2 FLP Impossibility Result

In 1985, the FLP result [55] proved that it is impossible in an asynchronous setting to guarantee consensus if even one process may fail, even under the most benign failure model (fail/stop). The research was a breakthrough in distributed computing. The proof is extensive and beyond the scope of this dissertation. It is considered a negative result, meaning that if consensus in a general asynchronous environment is not possible, then it is certainly not possible in a "real-world" asynchronous environment. The FLP result is based on a general model, but with some very specific assumptions and rules of operation, the most prominent being asynchronous timing. The existence of the FLP result can make it easy to dismiss research in the distributed asynchronous environment. In reality, research continues, so there is a disconnect between the impossibility result and reality. This could be explained by not adequately understanding the result. In addition, there are several assumptions that, if any one were relaxed, could lead to different results. Chow [42] suggested that one way might be to change the definition of distributed agreement. This is exactly what blockchain technology does to solve the distributed consensus problem. Blockchains overcome the challenges of the asynchronous setting by changing, or more accurately relaxing, the simultaneous agreement time requirement and the permanence of the "final state". Blockchains only require transactions to be distributed to nearby peers; it will most likely be the case that not all nodes in the network get them immediately. Partial agreement occurs whenever someone "mines" a block. Full agreement is achieved over time as

the newly mined block is buried deeply in the chain; the deeper a block is buried, the less likely it is to change. In the blockchain model, agreement becomes probabilistic over time; the more time that has progressed, the greater the probability that the "final states" (transactions) in the longest blockchain are accurate.

3.2 Blockchains - Bitcoin

We rely on centralized banks to provide the function of a trusted third party to hold and manage our accounts. Bitcoin gained widespread prominence with the breakthrough paper by Satoshi Nakamoto [84], which solved and implemented a relaxed version of distributed consensus using a combination of digital signatures, a decentralized peer-to-peer network, and proof-of-work (PoW). These, combined with a public ledger, eliminate the need for a trusted third party. However, it is not a perfect solution. As a currency, the value of bitcoin is extremely volatile and has become speculative. Moreover, PoW computations have, at times, expended power at the rate of a small nation. In addition, the size of the blockchain itself grows forever, which is a long-term problem. However, it is an elegant solution to decentralized trust. As an aside, Satoshi Nakamoto has never been identified and could even be multiple individuals behind a single name. Several individuals are suspected of being behind the paper.

3.2.1 Blockchain Fundamentals

A blockchain is a ledger of transactions stored on a peer-to-peer network. The blockchain is created via a protocol that all participating peers constantly run. The whole point of the blockchain is storage and consistency of transactions. As transactions are created by consumers, they are distributed throughout the network for nodes to collect into bundles called "blocks". When a group of transactions has reached an appropriate size (as determined by the protocol), the transactions become a block and are validated using the rules of the protocol. The rules are different for each blockchain. For Bitcoin, the rules involve verifying a user's unspent transaction outputs (UTXOs). Once a UTXO is spent, it cannot be spent again unless the block containing it is re-mined (i.e., spent again) after it has been validated in a

block. Checking to ensure UTXOs have not been previously spent in an "older" validated block is part of Bitcoin's validation and mining process. Once the block gets "mined", the successful peer who mined the block earns digital currency and her block is added to the chain. The block is then distributed throughout the network and linked to the blockchain using cryptographic techniques. It should be noted that mining is completely voluntary. The blockchain grows as new blocks are added to it. Once added, a block can no longer be deleted without re-mining it and all the blocks that have since been added. For those peers that are mining, there are many new blocks simultaneously being distributed throughout the network. As a result, it is possible (and is a normal condition of the blockchain) to have several "good" blocks at the same link in the chain. Which block does the miner choose to continue the build? The answer is: all of them. Since two miners can mine the same next block simultaneously, it is possible to have forks in the chain. The rule in the protocol is that the longest chain "eventually" wins. Forks are discussed later in more depth. A blockchain only moves forward; it is not feasible to go backwards. Once a block has been added to the main chain and accepted by the whole network, if a malicious peer wanted to alter or delete a block, the hash of that block would necessarily change. The change in hash value would necessarily cause a domino effect forward. The chain structure is illustrated in Figure 3-1 [84].

Figure 3-1. Bitcoin chain of blocks
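As a minimal illustration of this chaining, the Python sketch below links a few toy blocks by storing each predecessor's hash and shows the "domino effect" when an old block is altered. The block fields and transactions are simplified assumptions; a real Bitcoin block header also carries a Merkle root, timestamp, difficulty target, and nonce.

    import hashlib
    import json

    def block_hash(block: dict) -> str:
        """Hash the block's contents, including the previous block's hash."""
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    # Build a tiny chain: each block commits to the hash of its predecessor.
    chain = []
    prev = "0" * 64                                  # placeholder predecessor for the first block
    for txs in (["A pays B 1"], ["B pays C 2"], ["C pays A 3"]):
        block = {"prev_hash": prev, "transactions": txs}
        prev = block_hash(block)
        chain.append(block)

    # Tampering with an old block breaks every later link.
    chain[0]["transactions"] = ["A pays B 100"]
    print(block_hash(chain[0]) == chain[1]["prev_hash"])   # False: the chain no longer verifies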

Figure 3-2. Hashing of hello world with nonce

3.2.2 Proof of Work - Mining

In a blockchain, each connected block contains information about transactions, a reference to the preceding block, and an answer to a complex mathematical puzzle. This puzzle is referred to as proof of work (PoW). In Bitcoin's case, PoW involves finding a nonce that gives a pre-image for a hash that has a certain number of leading zero bits. A nonce is a one-time arbitrary number appended to an existing value. Figure 3-2 [23] illustrates this as repeated hashing of text with a changing nonce (the nonce follows "Hello World!"). PoW is the driving force behind the creation of the blockchain because it incentivizes people (i.e., miners) to solve complex problems in exchange for digital money. PoW creates value through the amount of effort that is expended by mining a new block that incorporates the header of the previous block mined. The complex problem in Bitcoin is to repeatedly calculate hashes of numbers or strings to find a result that starts with a certain number of zeros. The Bitcoin protocol automatically adjusts the complexity so that a new block is mined approximately every 10 minutes. Because peers (a.k.a. miners) enter and leave the peer-to-peer network all the time, sometimes there are more miners and solutions are found faster. To maintain the 10-minute average, Bitcoin changes the complexity of the problem, making it harder or easier. This is done, for example, by increasing x, where x is the required number of leading zeros of the hash value. The first miner to solve the problem is rewarded with digital currency. The miner then forwards the new (solved) block throughout the network so the other miners will incorporate it onto the chain and build on top of it [122].
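A minimal sketch of this nonce search, in the spirit of Figure 3-2, is shown below. For speed, the difficulty here is counted in leading hexadecimal zeros of a single SHA-256 digest; this is an assumption made for illustration, since Bitcoin actually compares a double SHA-256 of the block header against an encoded target.

    import hashlib

    def mine(data: str, difficulty: int) -> tuple[int, str]:
        """Find a nonce so that sha256(data + nonce) starts with `difficulty` hex zeros."""
        nonce = 0
        target = "0" * difficulty
        while True:
            digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
            if digest.startswith(target):
                return nonce, digest
            nonce += 1

    nonce, digest = mine("Hello World!", difficulty=4)
    print(nonce, digest)

Raising the difficulty by one hex digit makes the search roughly 16 times longer on average, which is the knob the sketch shares with Bitcoin's automatic difficulty adjustment.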

Some people choose to mine as a business. The "business of mining" is out of scope of this dissertation, but it is worth mentioning that, as a business (i.e., for mining to be profitable), miners have to consider all the costs. Mining costs equipment, cooling, energy, and any other typical costs of an ordinary business. But mining (indeed, general computing) emits carbon dioxide; mining is not at all green and costs "real" money. The opposing argument could be made that running banks costs the same energy to perform nearly the same function. Recognizing that it is a waste of energy, PoW now has many detractors. Mining for Bitcoin has always required significant computational effort, initially through the use of high-power graphics processing units (GPUs). However, running these continually generates heat and consumes energy, and thus costs money. Bitcoin mining has now advanced to require application specific integrated circuits (ASICs). ASICs are even more expensive. Many of these mining pools consume as much power as several small countries combined. In summary, PoW is elegant; however, it wastes energy, it is vulnerable to centralization through ASIC computation pools, and it lacks 'finality'. There are different kinds of finality, but in general terms, finality guarantees that history will never be changed. PoW has probabilistic finality, but it is now vulnerable to ASICs. There is also 'absolute' finality, where a block is considered immediately finalized. Bitcoin has approximately 1-hour finality, Ethereum has 6-minute finality, and Tendermint has 1-second finality.1

3.2.3 Storage - Merkle Trees

A Merkle tree, as used in distributed ledgers, is a binary tree in which the leaves represent transactions and each intermediate node is the hash of the two nodes immediately below it. The root hash represents the effective hash of the whole tree. Any change to any leaf will flow up and change the root hash. This provides detection of changes for integrity guarantees. As the blockchain grows, so does the space it consumes. Much of this cannot be avoided. A Merkle tree is used to preserve the integrity of the state using hashes of transaction values.

Figure 3-3. Blocks with merkle tree

If space is required, removing or archiving old transactions from the blockchain will not break any of the hashes that make up the blockchain, because nodes can store just the block headers. To reduce storage at a peer, old transactions within a block (below a pre-determined block depth) may be pruned. The hashes above the pruned nodes remain consistent, so a particular transaction can be verified for integrity using only the transaction itself and the hash values along the path from it to the root. The root hash in the block header can be used for this “quick reference” verification, as shown in Figure 3-3 [84]. By quick reference, we mean that if someone wants to verify a certain transaction within a block, it is only necessary to provide the target transaction plus the sibling hash values on the path up to the root; another peer can then confirm that the transaction is contained within the block. Anybody who has the block header can take a specific transaction and its chain of hash values, hash them on their own, and come to the same conclusion about the existence and validity of that transaction. Lastly, storing only the Merkle root hashes and headers rather than whole blocks of transactions makes start-up faster for new peers downloading a fresh copy of the blockchain.
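A minimal Python sketch of the idea follows: it computes a Merkle root over a list of transactions and then verifies one transaction using only that transaction, its sibling hashes on the path to the root, and the root itself. The construction is simplified (single SHA-256, last element duplicated on odd levels) rather than Bitcoin’s exact double-SHA-256 tree.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(txs, index):
    # Return (root, proof) where proof is the list of sibling hashes for txs[index].
    level = [h(tx) for tx in txs]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate the last node on odd levels
            level.append(level[-1])
        proof.append(level[index ^ 1])   # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(tx, index, proof, root):
    node = h(tx)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]
root, proof = merkle_root_and_proof(txs, index=2)
print(verify(b"tx-c", 2, proof, root))   # True
print(verify(b"tx-x", 2, proof, root))   # False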

3.2.4 Forks

A fork occurs when there is a difference in state; that is, a block has two or more blocks that immediately follow it on the chain. This happens frequently in the normal processing of the Bitcoin protocol due to race conditions. It can also be coordinated on a large scale as a result of a change in the underlying protocol; Ethereum, discussed later, has been undergoing major software upgrades in 2019 and has routinely created large-scale forks that are decided on by the organization that votes on protocol upgrades. A block is added to the blockchain after a majority of computers on the network reach consensus regarding the validity of the block. Sometimes multiple blocks are received nearly simultaneously that might be out of order, yet valid. This is possible because blocks take time to propagate across the worldwide peer-to-peer network. If two blocks are mined at nearly the same time and are vast propagation distances apart in the network, many of the peers will accept one block and many will accept the other. The blockchain now has multiple states. How is this resolved? Multiple forks are kept by nodes; the forks continue to grow in length until one chain breaks the tie by becoming the longer fork. It is natural to assume that the correct and honest blocks will grow faster because more computation is naturally devoted to the correct ledger. As a matter of fact, multiple forks can be correct and honest, but the longer chain by default has the most verified transactions. If a malicious node works to create an alternate chain in her favor (e.g., attempting to spend the same money twice), she will not succeed unless she has most of the processing power; this is known as the 51% attack.

3.2.5 51% Attack

The default behavior of the protocol is to extend the longest chain. To successfully change a previously validated transaction, a malicious peer would have to re-mine the block containing that transaction and all blocks that come after it. If the attacker can extend her chain quickly enough to outpace the growth of the main chain, the network will follow her chain and she would be able to re-spend her money, which should not be possible. This is known as the 51% attack. To accomplish it, she must control over half the computation power among the participating nodes in the network and start a new fork (branch). The 51% attack is easier to achieve on shorter blockchains; creating a malicious fork gets harder as the blockchain grows. It becomes computationally infeasible to change history unless the whole network decides to do the same thing at once.

3.3 Ethereum Blockchain

We use the Ethereum2 blockchain because it was the first mover in the smart contract space and has the most robust smart contract programming language, Solidity. Developers can write programs that implement essentially any application or computation using modern programming languages [7]. Payment in the form of ‘gas’ is required for transactions that change the blockchain. A smart contract is a programmed version of a normal contract between two or more entities. Smart contracts facilitate the performance of a contract without using a trusted third party as an intermediary. In Ethereum, there are costs to store data, change data, and deploy smart contracts. The digital currency of Ethereum is called ‘ether’ or ‘Eth’. Ether is the name of the largest denomination of currency in the Ethereum blockchain. In practice, a much smaller denomination, or fraction, called ‘wei’ (pronounced “we” or “way”) or ‘gwei’ (giga-wei) is used. Smart contracts require fuel (i.e., gas) to run on each client. The fuel requirement exists both to fund the operation of the network and to prevent runaway smart contracts with infinite loops. Before a smart contract runs, it must be compiled into bytecode using a compiler for the smart contract language. Bytecode consists of individual opcodes that look similar to assembly language code. Smart contract code is converted into these low-level opcodes by the compiler and then executed on the Ethereum Virtual Machine (EVM). Each opcode requires a certain amount of gas to execute on the public blockchain. The total cost of a single transaction is based on two variables: gasUsed and gasPrice. gasUsed is an estimate of the total amount of gas the transaction will require (e.g., 50,000 gas). gasPrice is the offered amount of ether per unit of gas specified as part of each transaction. These are multiplied together to get the cost of the transaction in ether.

2 https://www.ethereum.org

An Ethereum address looks like: 0xf0a6ab2e460236cf5cd6f89fc64d5c22d9645d7e. It is the last 20 bytes of the keccak256 hash of the public key of an Ethereum account. Addresses are often preceded by ‘0x’ to denote that they are hex-encoded [32]. Every full peer on the network has a copy of the entire blockchain3 and can run the protocol to verify transactions (if they choose to mine and earn digital currency).

3.3.1 Smart Contracts

Smart contracts are programs that electronically manage an agreement between two or more parties. They were first proposed in 1996 by Nick Szabo and brought into the mainstream by Ethereum. Ideally, they act as the intermediary between two parties, but without a TTP. A smart contract is an ordinary contract made autonomous and implemented in code. The code is triggered by blockchain transactions and writes to and reads from the blockchain’s distributed database according to the contract code. These transactions can contain additional inputs (e.g., new contract code being deployed). In Ethereum, once deployed, smart contracts have public addresses, just like users.

The amount of ether paid per transaction is determined by both the amount of gas (usually the gas limit provided by the compiler) and the bid price. It costs miners effort to incorporate transactions, and this effort is expressed in terms of gas. Smart contract code is converted into bytecode (low-level opcodes) by the compiler. Each opcode requires a certain amount of gas to execute on the public blockchain.

3 Recently new services have begun to eliminate this requirement including https://infura.io/

Table 3-1. Ether denominations
Unit                  Wei value    Wei
wei                   1 wei        1
Kwei (babbage)        1e3 wei      1,000
Mwei (lovelace)       1e6 wei      1,000,000
Gwei (shannon)        1e9 wei      1,000,000,000
microether (szabo)    1e12 wei     1,000,000,000,000
milliether (finney)   1e15 wei     1,000,000,000,000,000
ether                 1e18 wei     1,000,000,000,000,000,000

Ether is the most well-known denomination; however, the wei is what is used in transactions so that fractional amounts of ether can be expressed as integers. The wei is the base denomination and ether is the largest, with several gradations in between, as shown in Table 3-1 [6].

One gigawei (gwei) is simply 1,000,000,000 wei, and 1 ether = 10^18 wei. We mention briefly here (and again later) that using smart contracts merely to read from the blockchain is termed a ‘call’. It is a free operation that only needs to be performed on the local EVM. We took advantage of this functionality in our architecture using the web3 object. There are costs to insert attestations, deploy smart contracts, or change policy data. Executing a smart contract where no data are manipulated (e.g., checking whether cloud A is trusted by a consumer) costs nothing if the user has access to the full blockchain. ‘Call’ is an overloaded term; a ‘call’ can also refer to one smart contract sending a message to another. Deploying a contract incurs a one-time cost based on the size of the contract code. In addition, all transactions have a base fee of 21,000 gas [121]. The gas amounts sound high, but the translation to digital currency is orders of magnitude less. More detailed gas rates for smart contract execution and data storage are listed in the Ethereum Yellow Paper [121]. The price one pays in Eth is the totalCost, calculated as shown in Equation 3–1.

totalCost = gasUsed ∗ gasPrice    (3–1)

• gasUsed: (fixed per opcode) A measure of computation; a compile-time estimate of the total gas to be consumed by sending a transaction (e.g., deploying a smart contract, or sending a transaction to a function within a smart contract).
• gasPrice: (variable per transaction, specified in gwei) A bid the sender supplies with each transaction.

When the smart contract code is compiled, a gas estimate is provided. This estimate is stored in the gasUsed variable and is an aggregate total of all the opcodes plus the base cost of a transaction. Each computation within the EVM has its own opcode, and each opcode has its own fixed gas cost. However, the total gasUsed by a contract varies with the size of the smart contract and the opcodes it uses. The gasUsed amount is frequently inserted by the wallet application as the gas limit just before submitting a transaction. The gasPrice is a bid placed by the sender of the transaction. The Ethereum philosophy is to let market forces drive the overall economics of running the blockchain. This affects the speed with which transactions are accepted: the higher the bid, the more rapidly the transaction is accepted into the blockchain; the lower the bid, the slower it is accepted. If the bid is too low, the transaction might not be accepted at all. However, there is always a ‘going rate’, the average bid price, which is publicly listed.

3.3.2.1 Example 1

The amount of gas required to store one word (256 bits) of data is 20,000 gas. The price per unit of gas offered in a transaction is the choice of the sender. For simplicity, we consider only the cost to store one word of data, assume the request is accepted by the blockchain, and ignore ancillary costs (e.g., the cost of the transaction itself). For this example, we use an ETH/USD exchange rate of $246 per ether [5], the rate on May 23, 2019. The ultimate fiat-currency cost of transactions rises and falls with the exchange rate.

totalCost = gasUsed ∗ gasPrice    (3–2)
gasUsed = 20,000    (3–3)
gasPrice = 4 gwei = 0.000000004 ether    (3–4)
totalCost = 20,000 ∗ 0.000000004 = 0.00008 eth    (3–5)
≈ $0.02    (3–6)
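The same arithmetic is easy to script. The sketch below is our illustration; the constants are the example’s assumptions (not live network values), and the same function can be reused for any gasUsed/gasPrice pair, including the SHA3 opcode in Example 2.

WEI_PER_ETHER = 10**18
WEI_PER_GWEI = 10**9

def total_cost_eth(gas_used: int, gas_price_gwei: float) -> float:
    # totalCost = gasUsed * gasPrice, expressed in ether.
    return gas_used * gas_price_gwei * WEI_PER_GWEI / WEI_PER_ETHER

ETH_USD = 246.0          # assumed exchange rate from the example (May 23, 2019)
GAS_PER_WORD = 20_000    # gas to store one 256-bit word
GAS_PRICE_GWEI = 4       # the sender's bid

cost_eth = total_cost_eth(GAS_PER_WORD, GAS_PRICE_GWEI)
print(f"{cost_eth:.8f} eth ~= ${cost_eth * ETH_USD:.4f}")   # 0.00008000 eth ~= $0.0197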

3.3.2.2 Example 2

The Ethereum opcode for the SHA3 operation consumes 30 gas. The charge for running that opcode is calculated using the same formula at the same gasPrice of 4 gwei and an exchange rate of $246 per ether.

totalCost = gasUsed ∗ gasPrice    (3–7)
gasUsed = 30    (3–8)
gasPrice = 4 gwei = 0.000000004 ether    (3–9)
totalCost = 30 ∗ 0.000000004 = 0.00000012 eth    (3–10)
≈ $0.00003    (3–11)

3.3.3 Transactions

Transactions are the main vehicle for doing most tasks in Ethereum (e.g., sending ether to another account, deploying a contract, executing a contract, or storing data on the blockchain). The Ethereum documentation explains what a transaction includes [9]:

• Recipient of the message,
• A signature identifying the sender,
• VALUE field: the amount of wei to transfer from the sender to the recipient,
• An optional data field, which can contain the message sent to a contract,
• STARTGAS value: (sometimes called the gas limit) the maximum amount of computation the transaction execution is allowed to take, and
• GASPRICE value: the fee the sender is willing to pay for gas.

We previously mentioned gasUsed and gasPrice as the two factors that determine the cost of a transaction. Within the transaction itself, these are referred to as STARTGAS and GASPRICE. The GASPRICE is a bid the sender is willing to pay for the price of gas to execute their transaction. Miners are not required to accept transactions. In practice it is not possible to bid zero as a GASPRICE; the GASPRICE required for a transaction to be accepted is usually extremely small, but it cannot be so small that no miner will process it. Before a transaction is accepted into the blockchain, a miner must agree to process it. The STARTGAS field corresponds to gasUsed and prevents infinite loops from wasting gas; the STARTGAS is the maximum amount of gas (i.e., computation) per transaction. It is determined and provided by the compiler but can be adjusted as needed [9].
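As an illustration of these fields, the following sketch builds and signs a simple value-transfer transaction with the web3.py library (assumed here; the provider URL, key, and addresses are placeholders, and field or method names vary slightly across web3.py versions). The dictionary keys map onto the fields above: 'to' is the recipient, 'value' the wei to transfer, 'gas' the STARTGAS/gas limit, 'gasPrice' the bid, and 'data' the optional message; the signature identifying the sender is produced when the private key signs the transaction.

from web3 import Web3

# Placeholder endpoint and credentials -- not real values.
w3 = Web3(Web3.HTTPProvider("https://example-node.invalid"))
sender = w3.eth.account.from_key("0x" + "11" * 32)   # hypothetical private key

tx = {
    "to": "0xf0a6ab2e460236cf5cd6f89fc64d5c22d9645d7e",  # recipient of the message
    "value": w3.to_wei(0.01, "ether"),                   # VALUE field, in wei
    "data": b"",                                         # optional data field
    "gas": 21_000,                                       # STARTGAS (gas limit)
    "gasPrice": w3.to_wei(4, "gwei"),                    # GASPRICE bid
    "nonce": w3.eth.get_transaction_count(sender.address),  # account nonce, unrelated to the mining nonce
    "chainId": 1,
}

# Signing identifies the sender; the signed blob is what miners actually process.
signed = sender.sign_transaction(tx)
tx_hash = w3.eth.send_raw_transaction(signed.rawTransaction)
print(tx_hash.hex())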

3.3.4 Data on the Ethereum Blockchain

Once a user creates a smart contract, where on the blockchain are the data stored? The storage process is transparent to blockchain users. In Ethereum, the data are distributed across blocks as the consumer’s policy ages. To visualize how this works across blocks in the blockchain, see Figure 7-4 [8, 121].

Figure 3-4. Code and storage references

Data are stored under the storageRoot (the fourth white rectangle in the gray area of Figure 3-4). The storageRoot is a hash of the root node of a Merkle Patricia trie that contains the storage of the account [121]. Every block header in an Ethereum block contains three Merkle trie roots. The three tries are: 1) the state trie, 2) the transaction trie, and 3) the receipt trie. Ethereum state is a key-value map; the state trie contains a mapping between addresses and account information. The state trie is where the smart contract code and consumer data are stored. The storage root is for consumer policy data, and the black box represents the location of smart contract code. Note the links between older blocks and newer blocks. The links reference previous contract state that has not been changed between blocks; this saves space by not replicating data [8]. The remaining tries in each block header are for additional accounting: the transaction trie for each block contains hashes of all the transactions, and the receipt trie contains hashes of all the receipts of each transaction.

3.3.5 Proof of Stake

PoW is elegant in its simplicity, but its biggest problem is the energy expended in its pursuit. Furthermore, the introduction of application-specific integrated circuits (ASICs) has made typical PoW blockchains more vulnerable to the 51% attack due to centralization of power. Proof of Stake (PoS) uses the economic stake of Ethereum users, known as ‘validators’, to validate blocks. For Ethereum PoS, there will be a special smart contract (called Casper) that accepts deposits from validators. Validators deposit ether into the Casper smart contract; at present, 32 ETH will be required to stake.4 That is a significant amount of ether at today’s exchange rates. However, it is foreseeable that “staking pools” might appear (allowing participation with less ether) as the new protocol expands. Broadly speaking, in the algorithm, the validators take turns proposing and voting on the next block. The weight of a vote depends on the size of the deposit or ‘stake’. The purpose of the deposit is to enable a punishment mechanism for bad behavior: a validator can lose their stake if they act in a manner that subverts the voting process or if the block they voted on is rejected by the majority. The most important change is the punishment mechanism; in PoW you lose the energy and effort spent mining, in PoS you lose ether you currently own. Initially, Ethereum was planning a hybrid approach, an algorithm that would run PoS and PoW simultaneously, but the research team on the PoS transition has decided on a pure PoS algorithm. As a result, both major flavors, “The Friendly Ghost” and “The Friendly Finality Gadget”, are being worked on in parallel [32]. The technical details are beyond the scope of this dissertation; for further information, see [39, 124].

4 https://docs.ethhub.io/ethereum-roadmap/ethereum-2.0/proof-of-stake/

3.4 Multichain

Multichain [61] is a private blockchain; a private blockchain is limited to an organization. We will discuss key differences between public and private blockchains near the end of this chapter. Multichain’s goal is to improve upon the limitations of Bitcoin and make the blockchain more useful to the financial sector by providing security and privacy. Multichain targets enterprises, where some control over mining is desired. Bitcoin has been slow to be adopted for what it was intended - financial instruments. According to Multichain’s founder, the reasons include, but are not limited to, the difficulty in purchasing bitcoins, questions over Bitcoin’s legal status, and the lack of mainstream support [61]. Since it is geared towards Bitcoin’s target market, Multichain is built (forked) off the Bitcoin core, which is the reference client used by Bitcoin nodes. Multichain uses most of Bitcoin’s protocol and architecture except for the process by which new peers are admitted. Because Multichain is a private blockchain, mining is restricted to identified entities. In many private blockchains, one person could monopolize the mining process; in a public blockchain, this is analogous to the 51% attack. To prevent this, Multichain uses a parameter called “mining diversity”, where 0 ≤ miningdiversity ≤ 1. Multichain uses a round-robin verification scheme (a sketch of this check follows the list): 1. Apply all the permissions changes defined by transactions in the block, in order.

2. Count the number of permitted miners who are defined after applying those changes.

3. Multiply miners by mining diversity variable, rounding up to calculate spacing.

4. If the miner of this block mined one of the previous spacing − 1 blocks, the block is invalid.
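A minimal Python sketch of the round-robin rule described above follows. It is our simplification of Multichain’s published scheme: permission handling is reduced to a list of currently permitted miners, and the spacing calculation mirrors steps 2-4.

import math

def block_is_valid(permitted_miners, recent_miners, this_miner, mining_diversity):
    # Step 2: count permitted miners after applying the block's permission changes.
    miners = len(permitted_miners)
    # Step 3: spacing = ceil(miners * mining_diversity).
    spacing = math.ceil(miners * mining_diversity)
    if spacing <= 1:
        return True
    # Step 4: invalid if this miner produced one of the previous spacing-1 blocks.
    return this_miner not in recent_miners[-(spacing - 1):]

permitted = ["m1", "m2", "m3", "m4"]
history = ["m1", "m2", "m3"]          # most recent block last
print(block_is_valid(permitted, history, "m4", mining_diversity=0.75))  # True
print(block_is_valid(permitted, history, "m2", mining_diversity=0.75))  # False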

The closer the miningdiversity variable is to 1, the more different miners are included in the rotation. If the value is too close to 1, the blockchain can freeze if some miners become inactive. Multichain leaves ‘proof-of-work’ as a configurable item. Normally, transaction fees and block rewards are near zero; the only benefit the miners get is the smooth functioning of the blockchain. However, if desired, Multichain can be configured to use a currency for block rewards. Multichain also supports multiple blockchains. The benefit for the financial sector is that these chains can be managed by internal system administrators, not just developers.

3.5 Systems Based on Blockchains

We investigated current systems in active research within the last few years that are based on different blockchains.

3.5.1 Storj, 2014

Storj offers a trustless solution for cloud storage that obviates the need to trust a single entity [119]. Storj encrypts parts (shards) of data on the individual computers that participate in the Storj peer-to-peer (P2P) network. The solution offers an alternative cloud storage mechanism implementing end-to-end encryption, and it uses a blockchain to store data locations and file metadata. The motivation for this system is the fact that cloud data storage has come to rely almost exclusively on big cloud providers (e.g., Amazon, Google, Microsoft). Consumers must implicitly trust these providers when uploading data. Each client in the P2P network stores an encrypted piece of a file called a shard. The file is separated into shards so that no one peer has the whole file. To provide proof of storage (PoS), clients on this network generate a series of seeds. Seeds are added to the files and hashed, which generates a unique reply to answer future PoS challenges. Sharding can also provide redundancy against loss if the shards are duplicated. Figure 3-5 [119] illustrates the hashing of shards.

Figure 3-5. Storj sharding

The Storj network must be able to achieve consensus on file location and integrity. For this, it uses a blockchain. The blockchain contains the file hash, the locations of shards, and the Merkle roots; however, not the consumer’s files. The Merkle roots provide proof of storage based on the previously discussed Merkle tree. Storj uses its own cryptocurrency. Storj contributes to blockchain technology and provides an alternative to the major market players. Current pricing puts Storj at $0.015 per GB/month, where Amazon and Microsoft are $0.023 and $0.030 respectively [119]. This makes Storj competitive.

3.5.2 MedRec, 2016

MedRec uses blockchains as the basis for a decentralized medical records management system [34]. Smart contracts manage permissions and access rights for patient medical data. MedRec associates one smart contract per patient, and each patient-provider relationship has its own contract. There are three types of contracts: 1) the Registrar Contract (RC) maps names to Ethereum addresses, 2) the Patient Provider Relationship (PPR) contract represents the relationship between nodes, and 3) the Summary Contract (SC) keeps track of Patient Provider contracts. These are illustrated in the architecture in Figure 3-6 [34]. The health care providers manage patient data locally on their own servers. Patient nodes have the same basic components as providers, but patient nodes can be executed on a local PC or mobile device. Each node has a backend library for various system operations, such as a database gatekeeper to manage access to the database.

Figure 3-6. Medrec architecture

The typical use case sees the patient’s record stored in the provider’s existing database. A hashed reference to that data (plus viewing permissions) is posted to the blockchain via the Ethereum client. The patient retrieves the data from the provider’s database after the gatekeeper verifies access and ownership rights with the blockchain (via the smart contract). The overall process is illustrated in Figure 3-7 [34].

Figure 3-7. Medrec process flow

Since transactions require funding, care providers can mine or purchase ether to fund transactions and the posting of information. Medical researchers might be incentivized to participate if they can use anonymized data for research purposes. MedRec does not store patient records per se. The blockchain itself contains metadata: a hashed reference to the data in a provider’s local database. The blockchain is used to confirm ownership rights and provides a means to validate integrity.

3.5.3 Blockstack, 2017

Blockstack is an alternative Internet/web currently in production [29]. Blockstack’s mission is to create a secure alternative to the Internet; most security professionals concur that the Internet was not built with security as a priority. Blockstack is a decentralized Internet that uses blockchains for added security. It does not replace the network hardware or routing currently in place. It provides more secure Internet applications: Domain Name Service (DNS), discovery, and decentralized storage. The traditional Internet has many centralized points of failure (e.g., DNS, PKI, user data on centralized cloud servers). Blockstack: 1) is blockchain agnostic, running on any chain - even multiple chains, 2) keeps control logic and the data plane separate, and 3) does not use a blockchain for storage (to assist scalability). Their primary innovation is what they term the “virtualchain”. Virtualchain allows Blockstack to switch the underlying blockchain in addition to handling forks (blockchain splits). They also introduce a scalable index for global data. Blockstack has three main parts: 1) DNS implemented with virtualchains, 2) a peer network for discovery, and 3) decentralized storage without using trusted parties. Blockstack has three architectural layers: 1) the blockchain layer (control plane), 2) the peer network (data plane), and 3) data storage (data plane). The blockchain layer provides consensus on the order of operations. The architecture is shown in Figure 3-8 [29]. The Blockchain Naming System (BNS) is a decentralized replacement for DNS with no centralized points of failure or control. Atlas is used for discovery and is a P2P network that solves the problem of scalability with the blockchain. The blockchain does not store the data, merely pointers to the data. Atlas stores the zone files for BNS - similar to DNS zone files.

Figure 3-8. Blockstack architecture

Virtualchain enables Blockstack to migrate between blockchains for flexibility and in the event of blockchain failure. Virtualchains are like virtual machines; different types of virtualchains can run on top of the blockchain. BNS was implemented by defining operations in a new virtualchain. Virtualchains are used to create arbitrary state machines on top of blockchains and to encode operations in transactions on the blockchain. Nodes of the blockchain are unaware of this functionality. Gaia is Blockstack’s decentralized storage solution; however, it still uses cloud providers to store encrypted and/or signed data blobs. Identities are concealed through a decentralized blockchain layer so the cloud does not have visibility into the owner of these data blobs. Blockstack contributes to the trust and blockchain spaces by creating a software-only solution to fix security problems with the existing applications that make up the current Internet.

Blockstack’s goal is to give comparable performance to existing cloud providers with additional security.

3.5.4 Systems Summary

We selected a diverse array of blockchain systems to demonstrate the wide applicability of blockchains. Storj encrypts shards of data on the individual computers that participate in the Storj P2P network. The solution provides an alternative to using single large corporations such as Google or Amazon. MedRec uses blockchains as the basis of a decentralized medical records management system. Smart contracts manage permissions and access rights for patient medical data. MedRec associates one smart contract per patient, and each patient-provider relationship has its own contract. Some of the key details specific to Ethereum transactions (e.g., gas and bid price) are not discussed. Blockstack is an alternative Internet/web. Blockstack provides a decentralized Internet using blockchains for added security. Blockstack does not replace existing network hardware or routing currently in place. It is blockchain agnostic (i.e., it can use any blockchain). They assert that a weakness of blockchains is that they might go offline or suffer a failure in the consensus mechanism. We, however, assert that the blockchain is actually less susceptible to failure since it is built on a decentralized P2P network.

3.6 Chapter Summary

A public blockchain is available to anyone on the Internet and is fully decentralized. A private blockchain is limited in size and scope and can limit transactions to only those that are of interest to those participating in the network; in addition, the identities of its participants are known. Even though the Ethereum public blockchain is approximately 50 to 100+ gigabytes in size (depending on what kind of node you are running), there is no single point of failure in the public blockchain. The underlying protocol of the public blockchain is controlled by a large organization that votes on upgrades. In a private blockchain, there is a manager who controls admittance and decides who performs mining; this is a single point of failure. Some projects (e.g., Medicalchain, Datawallet, Kochava) [56] are building their own private blockchains.

Table 3-2. Blockchains public/private compared
Feature                        Private                                    Public
Admission model                Private (transaction censorship easier)   None
Consensus                      Group of identified/assigned miners       Proof of Work necessary
Primary blockchain             None                                       X
Peer to peer                   X                                          X
Fault tolerance                X                                          X
Public key crypto              X                                          X
Transaction constraints        X                                          X
Consensus by chain of blocks   X                                          X

Currently, in our view, a privatized blockchain eliminates most of the gains achieved by being on a public network: the integrity of the data is not as high as on the public chain, the chance for collusion between peers is higher, there is a central point of failure, and a private blockchain does not work between different cloud organizations. We have discussed Bitcoin because it illustrates many of the fundamentals of blockchains. Multichain is a popular open blockchain system that can be used to create private blockchains. Some do not like smart contracts because smart contract code can be error prone. We like blockchains for their high integrity and smart contracts for their ability to functionally evaluate consumer policies.

CHAPTER 4
SYSTEMS REVIEW

Our work falls on the boundary between the cloud trust and data movement research spaces. We identify the pros, the cons, and the gaps that our research fills. Because these systems are approaches to security and trust solutions, it is important to be familiar with basic computer security terms. Confidentiality, integrity, and availability (CIA) are known as the computer security triad. These are generally the first and fundamental precepts in any introduction to computer security. Auditing and accounting (or accountability) are just as important and frequently follow the CIA triad. The trusted computing base (TCB) is an essential term in computer security and is mentioned frequently throughout this dissertation. The TCB is the set of all hardware, software, and firmware that is critical to maintaining the security and security policies of a computer system [63].

4.1 Trust Systems

Many existing cloud trust systems use reputation management schemes and virtual machine security. We review existing systems in the trust research space and note their achievements and limitations.

4.1.1 Excalibur: Policy Sealed Data, 2012

Excalibur is one of the first works to use the TPM for policy sealing [101]. Its design goal is to implement a system that offers policy-sealed data using commodity Trusted Platform Modules. The focus is on the design of the seal/unseal primitives. The cloud platforms must protect the key material used to seal and unseal; to address these additional goals, the system must be hardened. Excalibur assumes the management interface is accessed remotely only; the threat model allows for a remote malicious administrator with no physical access who can reboot any node and install malicious software; side-channel attacks are not considered.

Figure 4-1. Excalibur architecture

Seal can be invoked either on the consumer computer or on a cloud node. It takes the consumer data and a policy and outputs ciphertext. Unseal is only invoked on cloud nodes. It takes the sealed data and decrypts it if and only if the node’s configuration satisfies the policy specified upon seal. Excalibur is designed around a central monitor. The monitor coordinates the enforcement of policy data across the entire cloud infrastructure. Figure 4-1 [101] highlights the overall deployment. The monitor runs on a dedicated cloud node (or a small set of nodes). Each cloud node has a set of human-readable attributes (e.g., vmm, version, location, country). Whenever a cloud node reboots, the monitor runs a remote attestation protocol to obtain identity information, which is then converted to a node configuration by consulting an internal database. The node configuration is then encoded and sent back as credentials to the node. Trust in the monitor relies on certificates. Since anyone can issue certificates, the monitor passes the identity of the certifier on to the consumers. The location of the certifier is not discussed. Two challenges are addressed: 1) scalable cryptographic policy enforcement, and 2) a monitor managed by untrusted cloud administrators.
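Before turning to those two challenges in detail, here is a deliberately simplified Python sketch of the seal/unseal interface. It uses ordinary symmetric encryption (the cryptography package's Fernet) plus an explicit policy check in place of Excalibur's CPABE scheme, so it only illustrates the interface, not the paper's cryptographic enforcement; the key would be protected by the monitor rather than handed out as shown here.

import json
from cryptography.fernet import Fernet  # assumed dependency: pip install cryptography

def seal(data: bytes, policy: dict, key: bytes) -> bytes:
    # Bind data to a policy; only a node whose attributes satisfy the policy may unseal.
    envelope = {"policy": policy, "data": data.decode()}
    return Fernet(key).encrypt(json.dumps(envelope).encode())

def unseal(sealed: bytes, node_attributes: dict, key: bytes) -> bytes:
    envelope = json.loads(Fernet(key).decrypt(sealed))
    # Policy check stands in for CPABE: every policy attribute must match the node.
    if any(node_attributes.get(k) != v for k, v in envelope["policy"].items()):
        raise PermissionError("node configuration does not satisfy the policy")
    return envelope["data"].encode()

key = Fernet.generate_key()  # held by the monitor in Excalibur's design
blob = seal(b"consumer secret", {"vmm": "xen-4.2", "country": "US"}, key)

print(unseal(blob, {"vmm": "xen-4.2", "country": "US", "version": "1.0"}, key))
# unseal(blob, {"vmm": "kvm", "country": "DE"}, key) would raise PermissionError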

1) Scalable cryptographic enforcement. If the monitor handles all the sealing and unsealing, the monitor becomes a bottleneck. If the client handles it, there will be too many keys due to the number of key pairs each node needs to handle. Excalibur uses ciphertext-policy attribute-based encryption (CPABE). The scheme generates keys: policies are attached to the master key, and multiple decryption keys are generated as necessary. In this way the fingerprint of a cloud node is checked, and then a decryption key is sent to the node. The security of the system depends on the security of the CPABE keys. There are a variety of techniques to protect the CPABE keys (e.g., encrypting them before storage and other good security practices). 2) Monitor attestation mechanisms. The monitor is managed by the cloud administrator and is also at risk. The monitor attribute mappings should be vouched for by a certificate to prevent mismanagement. Excalibur has two main innovations. First, policy-sealed data allows consumer data to be encrypted according to consumer policy. Only nodes (i.e., systems) that satisfy the policy can decrypt and retrieve the data. This allows multiple nodes to access data as long as each node follows the policy. Second, the implementation scales better to cloud services by using a centralized monitor to give a single point of contact. However, it only works within the confines of a single cloud organization. The use of CPABE encryption was quite novel for the time.

4.1.2 CloudMonatt, 2015

CloudMonatt is an architecture that allows monitoring of virtual machine (VM) health in the cloud [125]. It provides both monitoring and attestation of a VM’s security properties. Property-based attestation is an idea introduced at the 2004 New Security Paradigms Workshop [98]. Property-based attestation focuses not on the software or hardware configurations, but only on the “properties” that the platform offers. A property is a “behavior of the platform” with respect to a certain requirement. An open problem in property-based attestation is how to transform properties, functions, and behaviors into security measurements; the interpretation of properties can be challenging. The idea is to monitor whether certain security properties are being violated or enforced. In this architecture, the cloud provider (in addition to the architecture components) is assumed to be trusted; the main attack vector is from malicious VMs.

Figure 4-2. CloudMonatt architecture

The CloudMonatt framework allows for: 1) interpretation of collected security-relevant data, 2) run-time attestations (not just at boot and launch), and 3) remediation response strategies. This work focuses on property-based attestation. Shown in Figure 4-2 [125], the CloudMonatt architecture has four players: 1) the cloud consumer, 2) the cloud controller, 3) the attestation server, and 4) the cloud server. The deployment of VMs and the attesting of VMs are kept separate, consistent with the ‘separation of duties’ security philosophy. The cloud controller creates VMs for consumers. The policy module ensures that the nodes selected for the VMs satisfy security requirements; the deployment module executes the VM. For attestation purposes, the cloud controller entrusts the attestation server to both collect and judge security properties associated with VMs. The attestation server has modules that validate measurements, but it depends on an external certificate authority. The consumer may want to know about several security properties. The CloudMonatt attestation server maps a security property P to measurements M. This gives a list of measurements that can indicate the security health with respect to the specified property P. The cloud servers are assumed to run type-1 hypervisors (a bare-metal implementation). Each server has a monitor module and a trust module. The monitor module contains security measurements; it can be hardware-based (e.g., the TPM) or software-based (e.g., performance counters). The trust module accomplishes server attestation using cryptographic measurements. The research demonstrated the ability to detect a covert channel by creating a CPU-based covert channel. The idea is to have the sender VM occupy the CPU for different amounts of time. Similar to cache timing, long CPU usage indicates a “1” while short usage indicates a “0”. The monitor module can be configured to collect the CPU usage information. It might be possible to use machine learning techniques to analyze these types of attacks over the long term.

4.1.3 Verifiable Confidential Cloud Computing, 2015

Verifiable confidential cloud computing (VC3) is designed specifically for Hadoop MapReduce, a “big data” processing software system [104]. The focus of this research is to maintain the confidentiality of computations in distributed MapReduce applications. The system leverages the protected memory enclave provided by SGX, with some additions. The system is partitioned so that the Hadoop framework, the OS, and the hypervisor are all outside the trusted computing base (TCB). Recall that the TCB is considered to be all the hardware and software that fulfills overall security and security policies on a system. VC3 provides security protocols for cloud attestation and for running the jobs in isolated memory regions. Denial of service, side channels, fault injections, and corrupted Hadoop scheduling cannot be prevented. The adversary model includes a malicious insider who controls the hypervisor and OS, but who is unable to access the SGX-enabled processor package. When MapReduce code is ready for production, the developer compiles and encrypts it, resulting in private enclave code E−. This result is bound with public code E+ that implements the VC3 protocols. This is all uploaded to the cloud, in addition to encrypted data files. In the cloud, the enclaves are initialized with enclave code E− and E+. Within VC3, there is a key exchange with the code E+, running in the secure enclave. After the key exchange with the user, E+ is ready to decrypt the private code E− and process the encrypted data. The important MapReduce code is kept inside the TCB (i.e., the SGX secure area).

The interface between E+ and the outside world is narrow, meaning there are only two functions to read and write key-value pairs to and from Hadoop. There is a shared memory region outside the enclave used to pass messages for communication, but there is no other dependency on the OS. Even though the enclave process space has limited access from the host, code in the protected enclave can access the entire address space of its host process. This allows for efficient communication, but it could also potentially broaden the attack surface of the enclave. If enclave code were ever to dereference a corrupted pointer outside the enclave, the effects could be unpredictable. The secure enclave is created to ensure that all execution of enclave code is single threaded. The enclave has its own stack and heap; parallelism is achieved by duplicating this process in separate parallel enclaves. The code in the enclave is assumed to be trustworthy and correct, but there is always the potential for bugs. To address this problem, a Microsoft C++ compiler was modified to provide enclave self-integrity invariants. The additional checks are inserted by the compiler, written in assembly, and verify that memory accessed by the code is within the enclave. By using SGX, they can guarantee the confidentiality and integrity of both code and data while inside the protected enclave. In this research, the definition of the TCB is narrowed to encompass a much smaller secure footprint.

4.1.4 Trustworthy Multi-Cloud Services Communities, 2015

In this research, services are used to rate other services’ trustworthiness, using quality of service as the basis of trust [115]. Organizing web services into communities is an approach to address challenges associated with the proliferation of cloud computing. The game-theoretic model analyzes interactions among players in a game and how the players form coalitions. The objective is to form trusted communities and minimize the number of malicious members. Trust is built between services by collecting judgments from neighboring services. The model of the research is captured in what they term the DEBT (Discovery, Establishment, and Bootstrapping Trust) framework. They assert the framework is resilient to collusion among services, preventing misleading trust evaluations. The DEBT framework is shown in Figure 4-3 [115].

Figure 4-3. Debt framework

The first step in the framework is trust bootstrapping: assessing initial trust for newly deployed services. The bootstrapping process assumes that new services (and their initial trust values) are spawned from existing services. Thus, during bootstrapping, the service performing the bootstrapping uses its own datasets and a decision tree to estimate and assign initial trust values. Each service maintains a trust dataset that records its previous interactions with several services. For example, if a request from service Si is made to bootstrap a new service Sj, the initial trust values for the new service are derived from Si’s datasets. The second step involves discovery of services. The authors propose an algorithm that allows services to discover their neighbors. The algorithm uses a breadth-first search strategy to locate neighbors and leverages the concept of tagging (e.g., as in Facebook or LinkedIn) to identify neighbors. Since services use their own tag structure to store neighbor information, there is a storage requirement: each service has its own tag, and each tag could be the size of the total number of services. The last and most rigorous step is trust establishment; trust is established by collecting and aggregating judgments. Services judge each other to determine which other services are trustworthy. For example, Figure 4-4 [115] shows a snippet of a social network. Each edge between Service i (Si) and Service j (Sj) indicates an interaction between the two services.

Figure 4-4. Social network graph: vertices are services, edges are interactions

The judgment pair (S1 → S2: T; S2 → S1: M) between services S1 and S2 in the figure indicates that S1 rates S2 as trustworthy, whereas S2 rates S1 as malicious. Each edge represents a judgment of one service by another based on a previous interaction. If the interactions are good based on the standards imposed by the governing SLA (and a pre-determined threshold), then Si rates Sj as trustworthy. This step also uses an aggregation technique based on Dempster-Shafer theory (belief theory) to overcome colluding services. Services may be either truthful or collusive while judging each other. Each pair of services (Si, Sj) has a belief-in-credibility value Cr, shown as .6 and .3 in Figure 4-4. These values represent each service’s basic probability assignment, or belief accuracy, in judging the other services. How these values are determined within this framework is not explained. A formal proof is provided to support the claim of resiliency to collusion; however, the proof is not clear.
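The following Python sketch illustrates the flavor of credibility-weighted judgment aggregation. It is our simplification: binary judgments are combined by a weighted vote using the Cr values, rather than the paper's full Dempster-Shafer combination.

def aggregate_trust(judgments):
    # judgments: list of (verdict, credibility) pairs about one target service,
    # where verdict is 'T' (trustworthy) or 'M' (malicious) and credibility is Cr in [0, 1].
    # Returns a score in [0, 1]; above 0.5 the target is considered trustworthy.
    total_weight = sum(cr for _, cr in judgments)
    if total_weight == 0:
        return 0.5  # no credible evidence either way
    positive = sum(cr for verdict, cr in judgments if verdict == "T")
    return positive / total_weight

# S1 (Cr = .6) judges the target trustworthy, S2 (Cr = .3) judges it malicious.
print(aggregate_trust([("T", 0.6), ("M", 0.3)]))   # ~0.67 -> trustworthy
print(aggregate_trust([("M", 0.6), ("T", 0.3)]))   # ~0.33 -> not trustworthy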

Figure 4-5. CTP example use case

4.1.5 Cloud Trust Protocol, 2015

The Cloud Trust Protocol (CTP) is supported by the Cloud Security Alliance and provides an interface to the cloud that consumers can query for information and that allows the cloud to respond in a consistent manner [3]. CTP recognizes that CSPs offer different services using different proprietary protocols; therefore, CTP provides and encourages a common medium for clouds and consumers to communicate. There is a wide range of use cases, and as such CTP provides a generalized model that all parties can interpret. Figure 4-5 [3] shows a typical CTP exchange between a consumer and a cloud. Note that CTP does not monitor each cloud infrastructure, but it offers consistency in the reporting of services among diverse cloud providers so consumers can interpret service levels.

4.1.6 Cloud Armor, 2016

Cloud Armor is a reputation-based trust management service (TMS) framework and project [87]. The focus of the research is the credibility of trust feedback from consumers while preserving consumers’ privacy, and the introduction of a zero-knowledge protocol. The architecture is shown in Figure 4-6 [87] and includes a cloud provider layer, a TMS layer (distributed nodes in multiple cloud environments), an identity management service (IdM), and a consumer layer (where consumers reside). The TMS includes a credibility model with metrics to detect collusion attacks and malicious users, and it is decentralized to provide adequate availability of the TMS. Trust feedback is provided to a consumer by the TMS using calculated quality-of-service feedback from multiple consumers. Each feedback item has a user identity, a cloud identity, a feedback rating (0, 1, or .5, meaning negative, positive, or neutral respectively), and a timestamp.

Figure 4-6. Cloud Armor architecture

Trust is aggregated among all consumers using a credibility weighting. The research targets collusion (collusive malicious feedback) and Sybil attacks (misleading feedback from multiple identities). When using the system, the user must register with a separate IdM and establish their identity before using the TMS. To prevent collusion, a feedback density measure is used, which assumes that the greater the volume of feedback from one user, the greater the chance that the user might be malicious. The research also considers time as a factor and discusses availability of system nodes, including power requirements, data replication, and node availability. Since detecting malicious reputation feedback while preserving privacy is germane to the subject, we look deeper at the proposed IdM, which preserves consumer privacy. The IdM stores a record for each user represented by a tuple:

I = (C, Ca, Ti)    (4–1)

where C is the user’s primary identity (e.g., user name), Ca represents a set of what they term credentials’ attributes (e.g., postal address and IP address), and Ti is the user’s registration time in the TMS. A zero-knowledge credibility proof protocol (ZKC2P) preserves the consumers’ privacy and enables the TMS to prove the credibility of a consumer’s feedback without revealing identity. When the TMS gets the information, the TMS processes credentials without including the sensitive information (e.g., date of birth or address). The credentials are stored in the TMS with sensitive information anonymized using hashing. Specific details of ZKC2P beyond the use of hashing techniques are not included.

4.1.7 Tenant Attested Trusted Cloud, 2016

Tenant-attested trusted cloud service is a system framework for two-phase trust (i.e., tenant attestation and trust management) on remote virtual machines [93]. It provides a consumer the ability to attest to dom0 in the virtual machine monitor. Dom0 is a privileged administrative domain that starts first in order to manage the rest of the guest VMs. The research assumes the cloud servers have TPMs to assist in building the trusted computing base and that the virtual TPM (vTPM) is being used. The vTPM can extend TPM capabilities into the virtual machine and was written for the Xen hypervisor. Bugs in code, information leakage, and hardware side-channel attacks are out of scope for this research. The goal of this research is to allow the tenants to attest their remote execution themselves, as well as to configure and define access control policies. Furthermore, they call attention to the requirement for clouds to be able to trust their tenants. Figure 4-7 [93] shows both the functional components and the architecture. The system is broken down into four functional components (besides the cloud provider): a certificate authority (CA) inside or associated with a TTP, the integrity configuration/attestation service (ICAS), the tenant, and the integrity verification and report service (IVRS) in the cloud. The CA and ICAS are separate entities from the target cloud. The ICAS interacts and communicates directly with the software that is located within the machine hosting the VM. The ICAS can be in the tenant’s private cloud or an independent TTP; the location of the ICAS is implementation dependent.

Figure 4-7. Tenant-attested framework diagrams: A) TA-TCS functional components, B) TA-TCS architecture

In the architecture half of Figure 4-7, a minimal trusted environment (MTE) attests to the run-time state within the VMM. The MTE contains a trusted boot (tboot) and a dynamic root of trust for measurement based on the TPM (neither shown in the figure). The TPM only measures the host at boot time. The static root of trust for measurement (SRTM) that the TPM generates cannot be used when VMs are created long after boot time. The TPM operates only at the host level, while VMs are continuously starting, running, and being shut down; the TPM is ineffective at the granularity of the VM in the IaaS environment. The MTE fulfills this need. The ICAS provides an interface to the consumer. The ICAS acts as a proxy to allow consumers to configure VMs and get attestations. The ICAS manages all the correct measurements from the trusted VMs. The ICAS can be in the tenant’s private cloud or at a TTP. The ICAS can also be used by the cloud administrators to verify specific VMs; the ICAS can act as an auditor on behalf of either the cloud or the consumers. The integrity verification and report service (IVRS) is in the cloud within dom0. It communicates with the MTE, opens connections to the ICAS, and reports whether dom0 is in a trusted state. The design of the IVRS includes virtual machine introspection (VMI), a platform trust service, and the vTPM manager to verify the VM image.

IVRS decrypts the image and recomputes the hash to verify its integrity. This assumes previous hashes have been computed and stored as a verification or “gold” copy. The VMI allows security applications to view a VM’s raw memory state and verify the integrity of services at runtime. Verification of runtime services on VMs is novel and is accomplished using LibVMI and Volatility. LibVMI is an open-source implementation of VMI and provides a method for reading and writing a VM’s memory. Windows and Linux maintain lists of active processes; the facilities are detailed in [93]. Volatility is an open-source memory forensics tool used to dump process code and list all executable entities in a specific VM to compute checksums in Linux. It is a detailed architecture for checking the security of virtual machines in the cloud. Volatility dumps static code in memory, but not dynamic data. The prototype includes a simulated TTP server for the ICAS subsystem and uses LibVMI and Volatility to read and parse the guest VMs’ memory. Measurements can be taken periodically. An algorithm is developed for process measurement and verification that checks static in-memory processes, not dynamic data. Pre-defined whitelists must exist. The combination of LibVMI and Volatility to access the footprints of VM memory is novel.

4.1.8 Trust Systems Summary

The Excalibur system relies solely on the TPM and a monitor inside the cloud. The consumer must ensure the authenticity and integrity of the monitor; this is inadequately explained. While we agree that the integrity of the platform can be confirmed at boot time, we assert that after boot time, vulnerabilities can still be exploited. The control is inside the cloud; we focus on a multi-cloud decentralized trust policy framework. The CloudMonatt system has the shortcoming that most of the components within the architecture are considered inherently trusted (except for the cloud servers themselves). This limits the usefulness of the system: if you assume something is trusted, proving that the thing is ‘trusted’ becomes trivial. The contribution to the field of covert channel attack analysis is compelling. The monitor module performs collection of the necessary CPU utilization information, which is sent to the attestation server for real-time analysis; no detail is provided on the implementation of the monitor module. The system is claimed to be the first attempt at and realization of property-based attestation for VMs (versus binary hash value attestation). CloudMonatt uses a standard TPM emulator to function as the trust module. VC3 is a good example of hardware-based trust. However, a theme we have noted in the trust research space (particularly in VC3) is an attempt to minimize the footprint of the TCB, thereby reducing the attack surface. While this is a good goal, instead of minimizing code, what often happens is that what constitutes the TCB is changed. We assert that redefining the TCB is not sufficient to improve overall security; care must be taken when referring to the TCB. Trustworthy cloud communities address a niche area in cloud trust: how trust is determined among service communities. The work also addresses some of the limitations of previous models. The attack model mentions the potential for collusion attacks and passive attacks; however, the attack vectors appear to be inherent to the model. We are also dubious about services rating themselves and other services for overall trustworthiness ratings. Social network tagging is mentioned in this research, but a practical example is not included. The authors refer to the use of tagging in social networks and use tags as a mechanism to store neighbor information; the terminology is unclear. CloudArmor has some limitations: 1) unless the identity management service is a completely different entity, raw registrations are still vulnerable, 2) malicious users will not volunteer valid identity information, and 3) the zero-knowledge credibility proof is unclear. The area, however, is very interesting as it focuses on malicious feedback and the use of hashing. In addition, the field of non-interactive zero-knowledge proofs focuses on being able to prove something without revealing secrets. This is also an interesting problem and could be a good direction for future research. We assert that having a reputation processing/collection node in the cloud is a conflict of interest. The tenant-attested cloud service assumed that the TCB included the TPM, bootloader, and VMM, but not dom0. The ICAS, which is a TTP, is also used by the administrators of the cloud to check and verify specific VMs from an IaaS perspective. We are cautious about this design aspect as it muddies the issues of attestation, trust, and control. This approach is designed to protect the cloud from malicious tenants; how this works is not explained. To work in both directions, it needs to be completely unbiased. There is no hypervisor mentioned, so the assumption is that this solution is mainly for bare metal, but this is never stated. Most researchers have noted that the TPM will only work on a system-by-system basis during the boot stage. Whitelists of expected software hashes must be pre-defined for comparison. The vTPM extends the TPM into the virtual machine so that it allows the same attestations on a per-VM basis, but it has the same drawbacks as the TPM. Intel SGX provides secure enclave technology that allows for secure computation. All of these trust approaches have contributed to the trust system space.

4.2 Data Movement Systems

We see several opportunities to contribute to improving trust in the cloud. We now discuss systems in the cloud data movement research space. Many of the systems track the movement of data within the cloud. We wish to track the movement of data ‘between’ clouds.

4.2.1 CloudFence, 2013

CloudFence is a data flow tracking system for a single cloud that provides fine grained tracking capabilities using memory tagging [89]. CloudFence provides data tracking ‘as-a- service’ for users to audit their data and for providers to confine data movement. The research uses 32-bit wide tags per byte (allowing for 232 tags). They assert that sometimes there are third parties we implicitly trust when using the cloud. Service providers often use a third cloud provider onto which they host their application. Since users might not always be aware of third parties, users place inherent trust in all the players - while not necessarily being aware of all players. CloudFence wants to change this implicit trust and make it more of a direct relationship using data tracking as a mechanism. The CloudFence interactions are shown in Figure 4-8. CloudFence has three main components

Figure 4-8. CloudFence interactions

Figure 4-9. CloudFence architecture

in its architecture, shown in Figure 4-9: 1) data flow tracking (DFT), 2) an API stub, and 3) audit trail generation [89]. For data flow tracking, CloudFence uses Intel's PIN and shadow memory. PIN is a dynamic binary instrumentation tool set for program analysis [97]. Shadow memory is a technique that maps bytes in main memory to other bytes or bits in another portion of memory (to hold the tags). It can be implemented in a variety of ways. CloudFence uses PIN to analyze all instructions that move or combine data. The API contains C language functions that are used to associate tags with bytes and then manipulate those tags. Audit trails are stored outside the service provider in an append-only fashion. Presumably this is done by hashing, but this is not explained further in the research.
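As a minimal illustration of byte-granularity tagging with 32-bit tags, tag propagation on data movement, and tag combining, we include the following Python sketch. It is conceptual only: CloudFence itself is implemented with Intel PIN in native code, and the class, method, and field names here are our own assumptions, not CloudFence's API.

```python
# Conceptual sketch of byte-level 32-bit tagging (not CloudFence's actual code).
class TagMap:
    """Shadow map from byte address to a 32-bit tag word (0 = untagged)."""

    def __init__(self):
        self.shadow = {}  # sparse stand-in for a reserved shadow-memory region

    def tag(self, addr, length, tag_id):
        # Associate a tag with a range of bytes (mirrors an API-stub call).
        for a in range(addr, addr + length):
            self.shadow[a] = tag_id & 0xFFFFFFFF

    def copy(self, src, dst, length):
        # Data-movement instruction: destination bytes inherit source tags.
        for i in range(length):
            self.shadow[dst + i] = self.shadow.get(src + i, 0)

    def combine(self, src1, src2, dst, length):
        # Binary operation: combine tags (here, a simple bitwise OR).
        for i in range(length):
            t = self.shadow.get(src1 + i, 0) | self.shadow.get(src2 + i, 0)
            self.shadow[dst + i] = t

    def tags_of(self, addr, length):
        # Audit query: which tags have flowed into this buffer?
        return {self.shadow.get(addr + i, 0) for i in range(length)} - {0}


if __name__ == "__main__":
    tm = TagMap()
    tm.tag(0x1000, 16, tag_id=42)   # user data tagged on entry
    tm.copy(0x1000, 0x2000, 16)     # data copied by the application
    print(tm.tags_of(0x2000, 16))   # {42}: the tag follows the data
```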

Previous tagging approaches use 1) bit-size tags (every byte uses a single bit in shadow memory), or 2) byte-size tags (where each byte has a sibling byte in shadow memory). Some systems use tags of arbitrary size at additional overhead. They opted for larger tags (four bytes instead of one) to get 2^32 tags and redesigned shadow memory at the additional overhead of tag combining operations. Using a 64-bit architecture, they split the address space into shadow and process memory; shadow memory is reserved as soon as the process is started. Translating addresses uses a fixed offset as follows. Given a virtual address vaddr and its tag address taddr:

taddr = (vaddr << 2) + offset   (4–2)

CloudFence reserves 16TB of user space for the application and 64TB for shadow memory. Because of this, the memory overhead increases by a factor of 5. They built software to support tag propagation across sockets, pipes, files, and shared memory. CloudFence must maintain a global registry of sockets by domain to maintain consistency and tracking throughout the cloud network. Overall, CloudFence contributes a rigorous memory tagging scheme that operates at the byte level and supports tag combining, tag persistence, and tag analysis.
4.2.2 S2 Logger, 2013

The S2 Logger system provides end-to-end data tracking to support cloud provenance [110, 112]. Block- and file-level data events are captured at the kernel level and kept on the management network. This is a Linux-only system, designed for use within a single cloud. The system is mainly based on provenance tracking capabilities already present in Linux to capture kernel-level events. Figure 4-10 [110] shows the architecture. S2 Logger captures data logs from the kernel spaces of both the virtual and host machines. The main components of the architecture are: 1) information capture (events captured from each machine's kernel space), 2) log transfer/storage (data logs transmitted on the cloud management network; for VMs they are transferred via the hypervisor), and 3) analysis (once in the database, events can be correlated across all hosts; this provides an end-to-end view).

Figure 4-10. S2Logger architecture

For event capture, the first method uses a loadable Linux kernel module and a syscall table hooking method. A kernel system call exists in the system call table for each user operation. S2Logger implements a kernel module that hooks the functions it needs by modifying the syscall table to point to its handlers. However, kernel versions newer than 2.6.24 made the syscall table read-only; getting around this requires a more invasive process such as kernel recompilation. The second method uses the Linux security modules (LSM) API. The LSM provides a structure of pre-defined function pointers whose functions can perform low-level information capture at the file, process, and network level. S2 Logger uses two log transfer mechanisms: 1) synchronous, based on Structured Query Language (SQL), and 2) asynchronous, based on the Advanced Message Queuing Protocol (AMQP), an industry-standard messaging framework. SQL is a reliable and established method to store data. AMQP ensures reliable message exchange, storing the message queues on disk and in memory, and includes message acknowledgments. AMQP also provides for event notification, whereas SQL does not.
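To make the asynchronous transfer path concrete, the following Python sketch publishes one provenance event to a durable AMQP queue using the pika client. The broker address, queue name, and event fields are our own illustrative assumptions; the paper does not specify S2Logger's wire format.

```python
# Illustrative sketch of the asynchronous (AMQP) log-transfer path; the queue
# name, event fields, and broker address are assumptions, not S2Logger's own.
import json
import pika


def publish_provenance_event(event: dict, broker: str = "mgmt-net-broker") -> None:
    """Publish one kernel-level data event to a durable AMQP queue."""
    conn = pika.BlockingConnection(pika.ConnectionParameters(host=broker))
    channel = conn.channel()
    channel.queue_declare(queue="provenance_events", durable=True)
    channel.basic_publish(
        exchange="",
        routing_key="provenance_events",
        body=json.dumps(event),
        properties=pika.BasicProperties(delivery_mode=2),  # persist to disk
    )
    conn.close()


publish_provenance_event({
    "host": "vm-17", "pid": 2231, "syscall": "write",
    "file": "/data/patent.txt", "timestamp": 1554821000,
})
```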

Figure 4-11. DAG event graph

S2 Logger also uses a database pass that sequentially processes each data entry, creating nodes and edges in the process. Nodes can be processes, files, or sockets, and edges are actions. A directed graph is the result (sometimes cyclic due to reading and writing the same file). Once logs are captured in the database, processes, files, and sockets can be distinguished. During this pre-processing, parent-child links are determined until a rough DAG is generated. Graph traversal utilities are used to clean up the final graph on a per-file basis. Figure 4-11 [110] shows a sample DAG. On the left side, the file patent.txt is accessed and tracked. Various events are identified until finally the file is written to a different filename on a different host, as shown on the right side of the figure. By finding the relationships between actions, security managers can piece together a story. The contribution of S2 Logger is a robust provenance system using Linux kernel hooks to gather provenance from guest machines. To capture log data from the virtual machines, the architecture utilizes existing APIs for VMware hypervisors so that guests can communicate with the hosts using a virtual RS-232-like channel and avoid any network activity.
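The graph-building pass described above can be illustrated with a short Python sketch. The event schema and sample values below are hypothetical; S2Logger's actual database layout differs, and we simply orient each edge in the direction the data flows so that a per-file traversal recovers everything a file's contents could have reached.

```python
# Hypothetical sketch of turning flat provenance log entries into a directed graph.
import networkx as nx


def build_provenance_graph(events):
    g = nx.DiGraph()
    for ev in events:
        proc = ("process", ev["process"])
        obj = (ev["object_type"], ev["object"])
        # Orient each edge in the direction the data flows (read: object -> process).
        src, dst = (obj, proc) if ev["action"] == "read" else (proc, obj)
        g.add_edge(src, dst, action=ev["action"], time=ev["timestamp"])
    return g


events = [
    {"process": "pid:2231@host-A", "object_type": "file",
     "object": "/data/patent.txt", "action": "read", "timestamp": 1},
    {"process": "pid:2231@host-A", "object_type": "socket",
     "object": "host-B:22", "action": "write", "timestamp": 2},
]
g = build_provenance_graph(events)
# Everything the contents of patent.txt could have flowed into:
print(nx.descendants(g, ("file", "/data/patent.txt")))
```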

Figure 4-12. ECSM architecture
4.2.3 Data Location Control Model, 2014

A data location control model is proposed that gives users visibility into the location of their data [54]. The system uses XML-based policies. The model assumes three actors: 1) cloud service providers, 2) enterprises that broker cloud services, and 3) end users who own and consume the data. The model requires identifiers for data and policies so that these can be linked in the cloud to determine which action is relevant for which data and for which user. The architecture is based on a centralized enterprise cloud service manager (ECSM). The ECSM manages not only user-level data policies, but also organizational administrative policies. As shown in Figure 4-12 [54], the ECSM maintains several tables. These tables and policies are consulted when data movement or data access is about to occur. There are two primary use cases: 1) intra-cloud service, and 2) inter-cloud service. To use the system for either purpose, the user registers directly with the ECSM to receive a userID. Each user is associated with a dataID, a preferred location policy, and alternative locations. A policyID is returned and linked to the dataID. Each dataID is linked to a region table with location information.

Figure 4-13. ECSM architecture with two clouds

All requests for data access go through the ECSM and the policy tables are checked. Various conflict resolution policies are possible. Data movement, like data access, must be authorized through the ECSM. The intra-cloud services are handled using the mechanisms mentioned. The only difference with the inter-cloud services is that two ECSM managers from different clouds are involved and each ECSM maintains another table to track external clouds. These systems must communicate with each other, so the organizations need mutual agreements. The modified Figure 4-13 [54] illustrates this. A graphical interface is provided to the consumer to express their location preferences. The XML-based policies are generated automatically from the locations the users select in the GUI. Default policies might be available to users. They implemented a prototype using MySQL and PHP (a scripting language for web development), which in turn uses Java as a web service. Validation consisted of correctly translating consumer requests into policy, storing them, performing data authorization and movement by the administrator, and performance measurements.
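To illustrate the translation of GUI selections into an XML location policy, we include a small Python sketch. The element and attribute names (locationPolicy, preferredLocation, and so on) are assumptions for illustration; the schema used in [54] is not published.

```python
# Hypothetical sketch of converting GUI location selections into an XML policy.
import xml.etree.ElementTree as ET


def build_location_policy(user_id, data_id, preferred, alternatives):
    policy = ET.Element("locationPolicy", userID=user_id, dataID=data_id)
    ET.SubElement(policy, "preferredLocation").text = preferred
    alts = ET.SubElement(policy, "alternativeLocations")
    for region in alternatives:
        ET.SubElement(alts, "location").text = region
    return ET.tostring(policy, encoding="unicode")


print(build_location_policy("u-1001", "d-42", "EU-DE", ["EU-NL", "EU-FR"]))
# e.g. <locationPolicy userID="u-1001" dataID="d-42">...</locationPolicy>
```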

This research is one of the few that addresses the possibility of data crossing organizational boundaries, which is an emerging research space. But, since the ECSM resides in the cloud, it is subject to conflict-of-interest issues. However, being in the cloud affords it more direct control over data. Policies are centrally stored within the cloud; therefore, when migrating across clouds, the policy store must be copied to the new organization. The majority of the system focuses on converting consumer preferences into XML-based policies. The ECSM maintains the policy linking tables.
4.2.4 Stratus Project, 2015

Stratus is a cyber security collaboration among researchers in New Zealand to further trust and control in the cloud [79]. Several position papers are connected to Stratus. Position papers simply state and explain ideas for further research. These papers include: "Returning Control of Data to Users with a Personal Information Crunch" [120], "TrustCloud: A Framework for Accountability and Trust in Cloud Computing" [77], and "Tracking of Data Leaving the Cloud" [112]. The position paper "Returning Control of Data to Users with a Personal Information Crunch" highlights the inability of users to control their personal data in the cloud [120]. It proposes a model where personal information is stored on the user's mobile device, requested by vendors, and relayed when needed. It does not address social media, which by definition supports distribution of personal information; it focuses on personal information asked for during account creation. The model proposes a relay service to 1) hide the user's device, 2) cache encrypted responses, 3) authorize vendors, 4) filter unwanted requests, and 5) provide features like anonymous email. Consumer information is relayed in encrypted form, with the cloud necessarily holding the public key. This model assumes trust in the relay service, and it does not stop vendors from caching users' data without the user's permission. The TrustCloud framework proposes layers of accountability: 1) a systems layer (i.e., OS, file system, and network), 2) a data layer (i.e., provenance collection, and consistency such as backup and rollback), 3) a workflow layer (i.e., audit trails, governance, patch management, and

accountability), and 4) a policy and regulations layer [77]. They are using these concepts and this framework as the basis for further research. "Tracking of Data Leaving the Cloud" describes a novel auditing methodology that tracks data leaving the cloud [112]. The data are archived into a self-executing container so that the container can both follow the data and protect them from direct access. When a user tries to copy a file out of the cloud boundary, a 'check-out' of sorts is provided by a GUI. A viewer program is archived along with the data. The viewer program acts as an interface for users to view, modify, or delete data; its second purpose is to perform event logging on that data, acting as a data tracking device. For the container to be effective, there must be an authentication server with credentials, and a user must authenticate with it prior to being given access to the container. During authentication the container can log information about the user such as IP address and host machine. When the user is finished, the log information is sent to a collection server. The Stratus project is also associated with S2Logger [110], which we have reviewed in this section. The project website is stratus.org.nz.
4.2.5 VeriMetrix Framework, 2015

VeriMetrix proposes procedures and a framework for data location metrics collection to ensure that European privacy statutes and lawful locations are honored [68]. The focus is on automatically verifiable privacy metrics and determining them via a 'top down' or 'bottom up' approach. Top down starts from the requirement that drives the need for the metric but does not tell you how to obtain it; indicators must be found to fulfill the requirements. The bottom-up approach requires finding indicators first. For example, if the idea is to determine the location of a virtual machine, this might include identifying the data center (which requires cloud provider support). They suggest alternative methods such as determining virtual coordinates of surrounding network nodes, then measuring round trip time (RTT) to a known VM location.
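One simple way to approximate such an RTT indicator is to time TCP connection setup against landmark hosts whose physical locations are known and report the closest one. The sketch below is our own illustration under that assumption; the landmark hostnames are placeholders, and VeriMetrix deliberately leaves the agent design open.

```python
# Rough sketch of an RTT-based location indicator; landmark hosts are placeholders.
import socket
import time

LANDMARKS = {"landmark-frankfurt.example.org": "Frankfurt",
             "landmark-dublin.example.org": "Dublin"}


def rtt_ms(host, port=443, timeout=2.0):
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass                          # handshake completion approximates one RTT
    return (time.monotonic() - start) * 1000.0


def nearest_landmark():
    timings = {site: rtt_ms(host) for host, site in LANDMARKS.items()}
    return min(timings, key=timings.get), timings

# site, timings = nearest_landmark()  # e.g. ("Frankfurt", {...})
```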

They propose a system architecture, shown in Figure 4-14 [68]. A key part of the architecture is the use of reference or 'measuring' VMs and a central collector.
Figure 4-14. VeriMetrix architecture
The reference VMs have VeriMetrix agents that measure environmental parameters and send them to the central collector. The central collector gathers the data, calculates the indicators, and provides the final metrics. One aim of the system is to provide evidence of privacy and location infringements. How to collect the evidence, and the design of the agents, are left as open problems. They acknowledge that the location of the measuring VMs (once widely publicized) could leave them vulnerable to attack. They suggest the use of transport layer security (TLS) to protect data during exchange.
4.2.6 Data Movement Systems Summary

In CloudFence, the test scenarios provided are for e-commerce and a bookmark manager. While the results appeared to show that the information was stored correctly, there was no discussion of how the user obtains audit data, the testing is inconsistent with the stated goals, and there were mathematical errors. The goal is to support benign providers and to enhance confidence by integrating the system into a single cloud provider's infrastructure. In this case, it is mainly geared towards external attacks and not malicious insiders. The S2 Logger research provided a provenance file tracking example; the files must be identified in advance. This is very appropriate for forensic activity, not necessarily detection. Furthermore, this system is integrated into the cloud. As such, there might be many opportunities for

malicious insiders to interfere with or manipulate the collection system and logs. The assumption is that once integrated, S2 Logger cannot be manipulated by malicious administrators. In the data location control model, the ECSM handles all accesses to data as well as data movement; thus it could become a bottleneck for simple access requests within the same cloud. In the model, a user's data must already be tagged with a dataID. This is a challenging problem that we are also investigating. We are exploring the optional metadata fields within the OVF format [14] (a packaging standard for virtual machines) as a potential holding place for a dataID. In VeriMetrix, the rationale for the decision to use reference virtual machines as collectors instead of dedicated physical machines is unclear. A novel concept of this research is a method to determine the location of virtual machines by recording round trip times to locations of known servers. ArcSight [13] and Splunk [15] are existing commercial systems that perform much of the same kind of infrastructure and network event monitoring they propose. Several of the projects found are position documents, which reflects the youth of the research space. One of the inherent challenges with data tracking and data movement systems is getting close enough to the data to track it, yet not so close that the system can be compromised by malicious insiders.
4.3 Chapter Summary

We have reviewed various trust and data movement systems. Our trust system review began with research that was spawned by the hardware-based TPM. The main shortcomings of existing research in the cloud trust space are: 1) the attestation server location, and 2) the use of trusted third parties. If the attestation server is located within the cloud, we assert this is a conflict-of-interest. Since the attestation server provides proof of action or security, we suggest the attestation entity should be outside the cloud. Frequently, TTPs are called in to perform the role of a broker. Who attests to the TTP? We feel the blockchain achieves the best of both worlds by having the policy storage accessible to the cloud, yet not so close that it is vulnerable to malicious insiders. In this dissertation the cloud is not the authoritative source

of the policies, the blockchain is. We concur with the need for more consumer choice and visibility into data location/migration. Most of the research we noted is designed with a single cloud in mind; very few systems have accounted for data moving across organizational boundaries. There is only one policy system that we know of [54] that adjudicates data across cloud boundaries, but the policies still do not focus on the consumer. The Cloud Trust Protocol is also similar to our proposal in that it provides the cloud and the consumer a shared portal; however, our work is based on blockchain technology that holds consumer policies. Knowing when our data move and who has access to them fosters cloud transparency and trust. Our blockchain-based cloud policies provide the consumer a way to express their desires to the cloud that, to the best of our knowledge, presently does not exist.

CHAPTER 5
GOAL 1 - CLOUD TRUST MODEL AND VALIDATION
Trust has been studied for decades in many works. We pay particular attention to Marsh in 1994 [82] and Castelfranchi in 2010 [41] for their formal models of general trust. We also look at the notion of probabilistic trust (Bayesian techniques discussed later) and consider whether these techniques are suitable, if at all, to describe how consumers trust the cloud. Lastly, we present our cloud trust model, hypotheses for claims that we have made, a statistical power analysis, and finally feedback on our model from a survey.
5.1 Quantitative Trust Models

In [82], Marsh defined three types of trust: basic, general, and situational. Basic trust is the trust inherent to one person. It is a measure of how likely one is to trust anything or anyone and is derived from past life experiences (i.e., some people are more trusting than others). General trust is the trust that one individual has in another without regard to a specific situation. Situational trust adds context or a specific task. This type of trust is the most applicable to the work we are doing. Given agents X and Y and a task α, trust is represented as:

T_X(Y, α) = the trust that X has in Y for a given task or situation α, with T_X(Y, α) ∈ [−1, 1)   (5–1)

The work introduces other variables that are multiplied together to produce a final trust value. These additional variables represent other facets of the human condition that affect trust, such as knowledge (e.g., how well X knows Y), importance (e.g., how important the task is), and utility (e.g., how much usefulness). The interval [−1, 1) means −1 ≤ T_X(Y, α) < 1. A value of 0 means X has no knowledge and is totally ambivalent. A value of −1 means complete distrust. The border conditions are interesting in that −1 (complete distrust) is possible, but 1 (complete trust) is not possible in this definition. Marsh also accounts for the potential skewing of the results caused by multiplying two negative numbers. The value of 1 would represent 'blind' or absolute trust.
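The following Python sketch is only an illustration of this style of model: one common reading of Marsh's situational trust multiplies the utility and importance of the task by an estimate of general trust. The variable names, the pessimistic guard for the double-negative case, and the clamping are our own assumptions, not Marsh's exact formulation.

```python
# Illustrative sketch of a Marsh-style situational trust computation (our reading).
def situational_trust(utility, importance, general_trust):
    """All inputs in [-1, 1); returns T_x(Y, alpha) in [-1, 1)."""
    t = utility * importance * general_trust
    # Marsh notes that multiplying two negatives can skew results upward;
    # a simple guard is to keep the sign pessimistic in that case.
    if utility < 0 and general_trust < 0:
        t = -abs(t)
    return max(-1.0, min(t, 0.999999))  # 1.0 (blind trust) is excluded


print(situational_trust(utility=0.6, importance=0.8, general_trust=0.5))  # 0.24
```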

Marsh's work is one of the earliest attempts at a trust formalism. We found no focus on cloud trust within Marsh's current research. Our attention was mainly focused on the formalism in chapter 4 and how consumers make decisions based on trust. In the 2010 publication "Trust Theory: A Socio-Cognitive and Computational Model" [41], Castelfranchi attempted to quantify belief-based aspects of trust and defined another quantitative model. In chapter 3, the analysis of trust is based on the assertion that the degree of trust is a function of two types of belief:
1. The evaluated beliefs about the attributes (i.e., the attributes under evaluation) of the trustee upon which positive expectation is based, and

2. The subjective certainty (meta-belief) of the evaluated belief

The degree of trust of X in Y about activity τ is represented as DoT_XYτ. The explanation of the resulting equation is lengthy; we summarize it here. Referencing belief item 1, the trustee Y is evaluated on a certain task τ by several different sources or for several different reasons (this is open to interpretation and flexible). Referencing belief item 2, each evaluation has its own subjective certainty applied. These certainties (in our view) act as weights, but they are intended to capture "how sure the source is". These evaluations (abilities) are then paired with their certainties. Example values between 0 and 1 are shown in Figure 5-1 (note that the commas in the figure are typos from the original material; they should be periods). The evaluation is on the horizontal axis and the associated meta-belief is on the vertical axis.

Figure 5-1. Abilities from different opinions versus belief strength [41]

Next, a "degree of credibility" variable is introduced and represented by DoC_X. The degree of credibility uses a normalizing function to summarize all the ability/belief pairs. The

normalizing function is represented as F_X,Y,τ and produces a matrix from the values in Figure 5-1. In the matrix shown in Equation 5–2, each row represents an opinion from a source or other reason. The first column is the quality value from that source and the second column contains the normalized probabilities.

( 0.25   0.7 )
( 0.80   0.3 )   (5–2)

The authors state that F is a selective and normalizing function (i.e., selective in that not all pairs are chosen, and normalizing so that the sum of the probabilities in the right column must equal 1). The matrix in Equation 5–2 can be illustrated in a graph. Figure 5-2 shows the selective nature of the values in the matrix (e.g., .2 and .4 are collapsed into .25, which is not explained in the material).

Figure 5-2. Probabilities for two different abilities of Y [41]
The final degree of trust is defined in the following equation, which takes into account the degree of credibility from multiple dimensions. The variables are defined in Table 5-1. We noted inconsistencies in this published material. The variables τ and α both appear to represent the task and are used in the same equation, which leads to confusion. In addition, the variable p appears to represent a goal, but it has no apparent contribution.

DoT_XYτ = C_opp DoC_X[Opp_Y(α, p)] × C_ab DoC_X[Ability_Y(α)] × C_will DoC_X[WillDo_Y(α, p)]   (5–3)

Table 5-1. Variables in Castelfranchi degree of trust
Variable                          Meaning
DoC_X[SomeQuality_Y] ∈ (0, 1)     Degree of credibility for X about a specific quality of Y
DoC_X[Opp_Y(α, p)] ∈ (0, 1)       Degree of credibility of X's beliefs about Y's opportunity to perform a task α to realize some goal p
DoC_X[Ability_Y(α)] ∈ (0, 1)      Degree of credibility of X's beliefs about Y's ability or competence to perform a task α
DoC_X[WillDo_Y(α, p)] ∈ (0, 1)    Degree of credibility of X's beliefs about Y's actual performance
C_opp, C_ab, C_will               Undefined constant values representing intangible phenomena that cannot be easily captured

In the final analysis, the DoT will more heavily weight the values that are more probable. The normalizing function F is essentially boiled down to a single number that takes into account all the sources, values, and strengths. We find the idea of meta-beliefs to be interesting, but also leading down a slippery slope of endless abstractions (e.g., do we need to consider belief in our beliefs?). The constants (C_opp, C_ab, C_will) that are included by the author take into account intangible phenomena or interference among external influences.
5.2 Probabilistic Models

Probabilistic models are also a popular approach to modeling trust. These models are built to assist in the prediction of a principal's future behavior. The hallmark of Bayesian techniques, named in honor of Rev. Thomas Bayes, is to predict a variable of interest based on the causal effects of other events. The foundation of the approach is captured in the phrase: "updating probabilities in the light of new evidence" [90]. Given two disjoint events A and Ā with P(A) + P(Ā) = 1, then for any new event B, Bayes' rule says the updated probability can be defined as [92]:

P(A|B) = P(B|A) P(A) / (P(A) P(B|A) + P(Ā) P(B|Ā))   (5–4)

In probabilistic approaches, the trust of A in B becomes a probability distribution (which is estimated by A) over the range of interaction outcomes with B. Estimating this distribution is based on the history of interactions with B. The beta model is a probability distribution function often used for reputation systems; it applies Bayesian techniques to the history of interactions to predict future outcomes [40, 70, 113].
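As a minimal sketch of this kind of Bayesian updating, the following Python class accumulates good and bad interaction counts and reports the expected probability of a good next interaction as the mean of the resulting beta posterior under a uniform prior. This is one common (Jøsang-style) formulation; the cited works differ in details, so treat the exact update rule as an illustrative assumption.

```python
# Minimal sketch of a beta-model reputation update (one common formulation).
class BetaReputation:
    def __init__(self):
        self.r = 0.0   # accumulated evidence of good behavior
        self.s = 0.0   # accumulated evidence of bad behavior

    def update(self, good: bool, weight: float = 1.0) -> None:
        if good:
            self.r += weight
        else:
            self.s += weight

    def expected_trust(self) -> float:
        # Mean of the Beta(r + 1, s + 1) posterior under a uniform prior.
        return (self.r + 1.0) / (self.r + self.s + 2.0)


rep = BetaReputation()
for outcome in [True, True, False, True]:
    rep.update(outcome)
print(round(rep.expected_trust(), 3))   # 0.667 after 3 good and 1 bad outcome
```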

Briefly, the beta model is based on the beta distribution. There are several distributions in probability theory: Gaussian (i.e., the normal distribution for bell-shaped curves), Poisson, binomial, and more. The beta distribution assumes a binary event with a uniform prior distribution. It is characterized by two variables α and β which represent two complementary events or ratings. However, the beta model has been criticized for assuming a fixed behavior for each individual, which is not very realistic: what if a principal were to change its behavior over time? [52]. Researchers have turned to Hidden Markov Models for modeling the dynamic behavior of principals. A Hidden Markov Model (HMM) is a specific form of Bayesian network in which the state at some time encapsulates all we need to know about the history of a process to predict the future. Hidden Markov Models treat processes themselves as hidden but their results as visible. The HMM assumes that the process generating the observation is hidden from the user. It also assumes that the state of the hidden process satisfies the Markov property: "given the value of S_{t−1}, where S is a state and t is a specific time, that the current state S_t is independent of all states prior to t − 1. In other words, the state at some time encapsulates all we need to know about this history of the process in order to predict the future" [59]. We found only a few works that use the HMM approach [52, 106, 107]. In 2014, Thirunarayan reviewed the most common trust management models, including recommender systems and trust propagation frameworks [113]. In trust propagation frameworks, a trust chain is formed from recommender to recommender. These networks are most commonly modeled as directed acyclic graphs (DAGs), having been extracted from cyclic trust networks. In this case, each node is a recommender and each directed link is assigned a weight, which is the trust value. The trust values are accumulated probabilistically from node to node and provide a final recommended trust value. Several works in the literature have examined Bayesian techniques for trust modeling [83, 86, 102, 117, 118]. According to [83], Bayesian theory is intuitive because it is a sequential process; as new evidence comes to light, the evidence is processed and added to the global

evidence. It allows the trustor to update her information dynamically, which is what happens in real life. In summary, probabilistic models are very useful for parts of our cloud trust model. However, propagating trust (as many DAG models do) is not very realistic. If it were possible to interview the entity directly, there would be no "black box" and it would become a direct feedback situation. Therefore, we assert that models with trust chains that include estimated probabilities have room for improvement. With regard to the trust model of [41], details on the normalizing function are left out. It is clear, and we concur, that all values and judgments of trust can be defined on a range of 0 to 1. We disagree on the meta-belief approach and favor instead the idea of weighting, which might very well achieve the same effect. We therefore endeavor to take a mathematical, yet more realistic and simpler, approach to trust and the trust decision process. All these techniques approach trust estimation from a high-level, probabilistic perspective. Our model examines how an individual estimates trust. Our approach is to specify decision making through categories of sources from which we take recommendations. People tend to seek recommendations from sources in a specific order, which we validate later in this dissertation via a trust survey. We have ordered these sources as degrees of recommendation. The trust value is determined after the recommendation process is complete.
5.3 A Trust Model for the Cloud

Having reviewed previous trust models that formalized trust, we determined there was a lack of models that specialized in cloud computing. The models that do exist for calculating trust have very little validation. How accurate are existing trust models? We model consumer trust in the context of cloud computing. The value of our data is a key indicator of whether we place it in the cloud and if so, which cloud. We assert that the value we place on data plays a critical role in how we trust. We also add expectations, data value, and a decision framework to our model. We have only seen expectation in the works of Castelfranchi [41].

We propose a survey to validate our model. Few cloud trust models appear to be validated against real-world data, and none that we found perform an a-priori statistical power analysis. We apply an a-priori power analysis to our survey to help us evaluate our treatment of the factors influencing consumer cloud trust. Our trust model for the cloud includes what we call five degrees of recommendation, a cloud trust cycle, and lastly accumulated trust based on past experiences.
5.3.1 Five Degrees of Recommendation

We delve into the decision-making process by exploring the attributes of our sources and putting these sources into groupings, which we call degrees. We show how we integrate these degrees when deciding to trust the cloud. We claim that this is more realistic than much of the existing research, and we validate the model via a survey. We assert that in practice there are groups (i.e., categories or 'degrees') of sources from which consumers seek and take recommendations, and that these degrees are prioritized in a typical order for most people. Figure 5-3 is our model of how consumers seek recommendations. These recommendation degrees are listed in the order people typically use them, based primarily on familiarity. If the recommendation is in the first degree, the decision to trust is based on first-person experience. If we have direct experience with, say, a product, an airline, or a cloud, we typically use our own experiences above all else. We then take recommendations from family and relatives, followed by friends and colleagues. Following this we look toward expert organizations and groups of people. The larger the group of ratings that agree about a product or service, the stronger the recommendation and the lesser the chance of collusion. We call this last source the fifth degree. A special case of the fifth degree is a group of one, such as an unfamiliar third party or stranger. Under some circumstances we are willing to trust anyone if the potential benefit outweighs the risk. For example, auto accident survivors on a remote road who cannot get a response from a professional responder would accept help from anyone because the alternative is loss of life.

• 1st degree: direct experience (i.e., personal experience).

Figure 5-3. Taking recommendations

• 2nd degree: family and relatives
• 3rd degree: friends/colleagues/acquaintances
• 4th degree: unbiased or expert organizations (e.g., Cloud Security Alliance - Cloud, Consumer Reports - Retail Products, US News - College Ratings)
• 5th degree: groups of unfamiliar people (e.g., ratings from Amazon, Tripadvisor)
These degrees do not work in isolation; a consumer can get recommendations from all degrees, and they often work together for the person seeking the recommendation. For all degrees but the first, the strength of the recommendation is examined using the source's attributes. These recommendation attributes are contained in the variable ρ ("rho"):

ρ = F((σ ∗ NUMSOURCES), (ϵ ∗ EXPERTISE), (η ∗ FAMILIARITY)) ∈ [0.0..1.0]   (5–5)

where σ, ϵ, and η are weights assigned by the trusting individual and must satisfy the following equation:

σ + ϵ + η = 1   (5–6)

Table 5-2 presents the possible values of each attribute. For cloud trust, the recommendation ρ leads to a decision to trust and feeds into our cloud trust model. These categories can be applied to all degrees (except direct experience). Expertise is most relevant for organizations. The number of sources is most relevant for group ratings. Finally, familiarity is most relevant for friends or family members. However, each category can be applied to any source.

Table 5-2. ρ Attributes
# Sources     Value   Perceived expertise   Value   Familiarity      Value
1             0       No expertise          0       No familiarity   0
2 .. 10       .       Low                   .       Low              .
11 .. 100     .       Medium                .       Medium           .
101 .. 1000   .       High                  .       High             .
> 10000       ≤ 1     Very high             ≤ 1     Very high        ≤ 1
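Since F is left open in the model, the following Python sketch shows one possible instantiation: each attribute is first normalized into [0, 1] (roughly following the bins in Table 5-2) and F is taken to be a simple weighted sum. The normalization functions, level values, and example weights are our own assumptions for illustration.

```python
# One possible instantiation of Equation 5-5 (F and the normalizations are assumed).
import math


def normalize_sources(n):
    # 1 source -> 0.0, growing with group size, capped at 1.0 (10^4 or more sources).
    return min(math.log10(max(n, 1)) / 4.0, 1.0)


LEVELS = {"none": 0.0, "low": 0.25, "medium": 0.5, "high": 0.75, "very high": 1.0}


def rho(num_sources, expertise, familiarity, sigma=0.4, epsilon=0.4, eta=0.2):
    assert abs(sigma + epsilon + eta - 1.0) < 1e-9     # Equation 5-6
    return (sigma * normalize_sources(num_sources)
            + epsilon * LEVELS[expertise]
            + eta * LEVELS[familiarity])


# A large group of strangers with high perceived expertise and no familiarity:
print(round(rho(5000, "high", "none"), 3))
```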

Table 5-3. Definitions for cloud trust functions
Meaning                                                        Representation
recommendation value [0.0..1.0]                                ρ
trust threshold required for the value of our data             θ
evidence: attestations, security plans, external sources       EV
consumers' expectations, expressed policies                    EXPT
consumers' experience, good or bad                             EXPR ∈ {γ, β}
trust level of x                                               τ_x
evidence leading to loss of trust in a cloud (beta for bad)    β
evidence to increase trust in a cloud (gamma for good)         γ

5.3.2 Cloud Spiral of Trust

Table 5-3 shows the variables used in our model. All trust is cyclical; we build on existing trust through our own experience and through recommendations from others. Each loop per level in Figure 5-4 represents a higher level of trust (i.e., data value) that consumers are willing to entrust to the cloud. As you move to higher spirals of trust, the consumer is willing to entrust data of higher value to a third party. We model the trust value similarly with minor

Figure 5-4. Cloud spiral of trust

differences as applied to the cloud. The trust τ of x in Y, based on an action α at time t and experiences EXPR, is a function:

τ_x(Y, α, t) = F(τ_x(Y, α, t − 1), EXPR(t))   (5–7)

Our decision to trust in the cloud is based on a threshold level θ: the trust value must exceed the threshold θ for consumers to entrust their data. Each cycle feeds our trust and leads to a trust decision, which involves a threshold value θ for our data value, our expectations EXPT, and our experience EXPR at a certain time t (which can be good, γ, or bad, β).
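The following Python sketch illustrates this cycle together with the threshold comparison formalized in Equation 5-8 just below. The exponential-smoothing update is only a stand-in for the functions F and h, which the model deliberately leaves open, and the per-data-value thresholds are likewise assumptions for illustration.

```python
# Illustrative sketch of one trust cycle; update rule and thresholds are assumed.
def update_trust(prev_trust, experience, learning_rate=0.3):
    """experience: +1.0 for good (gamma) evidence, -1.0 for bad (beta) evidence."""
    return (1 - learning_rate) * prev_trust + learning_rate * experience


THRESHOLDS = {"public": 0.2, "personal": 0.5, "business-critical": 0.8}


def willing_to_entrust(trust, data_value):
    # Threshold decision: trust must meet the theta implied by the value
    # the consumer places on this data set.
    return trust >= THRESHOLDS[data_value]


trust = 0.4                                   # seeded by recommendations (rho)
for outcome in [+1.0, +1.0, -1.0, +1.0]:      # a run of experiences with one cloud
    trust = update_trust(trust, outcome)
print(round(trust, 2), willing_to_entrust(trust, "personal"))
```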

τ_x(Y, α, t) ≥ θ(V_D, EXPT, EXPR(t))   (5–8)

θ represents an important threshold decision point based on the value a consumer places in her data. It is different for each person and represents a boundary for each data set. We assume the consumer has the competence to judge their sources of recommendation. Once a consumer has elected to use a cloud provider based on recommendations ρ and expectations EXPT, they begin to have direct experience with the provider, and the cycle of experience, expectations, and evidence (i.e., attestations) begins. A low datum value will bring the consumer's trust threshold θ down because there is less personal risk. For example, using Instagram, Twitter, or Facebook, we might not mind if our data are available to the world. We target those cases, both business and private, where we value the confidentiality and integrity of our data. In addition, while the consumer is evaluating trust, others are also evaluating their own evidence (which can become part of our evidence, sometimes via out-of-band channels), as shown by "reports" in Figure 5-5.

Figure 5-5. Trust cycle

A consumer's ongoing trust in the cloud τ_{i+1} is a function h of the previous trust value τ_i and the last decision δ_i (which resulted in evidence

of trustworthiness, good γ or bad β) and our expectations at this current level of trust via our policies:

τ_{i+1}(Y, α) = h(τ_i(Y, α), EXPR)   (5–9)

At this point our model rests upon the cyclical nature of trust accumulation in the cloud. We believe the function h can best be modeled as a previously discussed Bayesian probability function and leave exploration of h for future work.
5.4 Trust Model Effectiveness and Validation

How can a trust model be evaluated for its effectiveness? Consumers' expectations play a significant role in framing their experiences. These expectations can be expressed via policies that are consumer-oriented and stored in a high-integrity fashion. Existing research has performed cloud trust and usage surveys (i.e., questionnaires). Why propose another questionnaire on the cloud? Many of the existing models are not practical in their approach to recommended trust and few are validated [72]. Furthermore, many of the questions we asked have, to our knowledge, never been asked. Our research focuses on consumer data in the cloud and asks these types of questions: Where are our data? Do consumers want to grant blanket access to our data when they are uploaded to the cloud? Do we trust one cloud over another? We reviewed the academic literature on trust and cloud usage. We were searching for existing surveys that performed a statistical power analysis prior to sending out their questionnaires. Our overall approach was to look for existing trust surveys and determine their effect variable, sample size, and power. Furthermore, we developed and administered a questionnaire to validate our model using university processes. As future research, we will refine our trust model based on the results.
5.4.1 Industry Surveys and Academic Survey Research

There are several annual industry-sponsored surveys that target enterprise cloud usage (e.g., Forbes, Gartner, Rightscale). In 2018, Rightscale targeted primarily technical professionals at

the enterprise and small business level and found that most (97%) use the cloud.1 Also in 2018, Forbes reported that 80% of enterprises are running apps on or experimenting with Amazon Web Services (AWS).2 In the academic community, [30] accomplished a post-hoc power analysis. The research asserted that there is an inconsistency between the claims of industry reports on cloud usage and those from academia. Questions from previous surveys were used. It is not clear that the chosen effect variable or size was consistent with the research goals. The chosen effect variable (Cohen's f²) is consistent with multiple regression testing; however, the value assigned seemed arbitrary. The power analysis boasted power significantly greater than 95%. (Note: Jacob Cohen's 1988 textbook on power analysis has provided a basis for much research in power analysis and effect size [43].) In [38], a post-hoc power analysis was used to explain consumer archiving patterns in the cloud. The research goal was to prove that trust is a factor that mitigates uncertainty and reduces the perception of risk. They also sought to understand the effects of trust and risk in the context of cloud archiving, and important drivers of trust as feedback for the cloud to improve. They had several hypotheses that addressed different aspects of trust, risk, and perception in the context of archiving data in the cloud. The effect variable and value they used to calculate statistical power was again f² = .15 for a 'medium' effect, based on Cohen's definition for multiple regression and correlation analysis. They reported a study power of .99, given a sample size of 229. The research in [65] did not conduct a power analysis. The author wanted to answer the following questions: Is there a lack of consumer trust in cloud service providers? Is there a way for cloud providers to earn trust? Will it be profitable for cloud service providers to work towards gaining consumer trust? Their survey question types consisted of T/F questions,

1 https://www.rightscale.com/blog/cloud-industry-insights/cloud-computing-trends-2018-state-cloud-survey#96-percent
2 https://www.forbes.com/sites/louiscolumbus/2018/09/23/roundup-of-cloud-computing-forecasts-and-market-estimates-2018/#7073eb37507b

Likert-scale questions (e.g., strongly agree, neutral, disagree), and some open questions. They had 236 responses. They concluded that consumers trust cloud computing only if the risk is low and there is convenience [65]. Hsu conducted a post-hoc power analysis for a telephone survey [66]. The goal was to examine cloud adoption in Taiwan from an industry perspective and to test their research model. Their survey targeted top Taiwanese companies; they had approximately 200 responses. Their results appeared to show that cloud adoption was still in its initial stage at the time of the survey. They accomplished a post-hoc power analysis with a reported statistical power of .999. However, they did not use null hypotheses, the effect variables were not consistent with the authors' definitions, and the definition of power analysis was inconsistent with the references. The surveys [30, 38] both used an effect size of f² = .15, which is appropriate for multiple regression and correlation models. Our goal is not to determine correlation; we are making an assertion about a portion of the population through hypothesis testing.
5.4.2 Power Analysis for Surveys

Power analyses are normally run before a study is conducted; this is called an a-priori analysis [51]. Power analysis is not an exact science but should be considered before any study is conducted. A power analysis allows you to forecast whether your study will avoid type I and type II errors (discussed later) to the maximum extent possible. Statistical power is the probability of correctly rejecting a false null hypothesis (conventionally denoted H0). The null hypothesis is the population characteristic initially assumed to be true and the one to be challenged. The alternative hypotheses (Ha, or H1, H2, etc.) are the competing claims [91]. These are the steps involved in conducting a power analysis [67]:
1. Select the type of power analysis desired: a-priori (i.e., prospective, in advance) or post-hoc (after data collection).
2. Select the expected study design that reflects your hypotheses of interest (e.g., t-test, ANOVA, etc.).
3. Select a power analysis software tool that supports your design, if applicable.

4. Provide 3 of the 4 parameters for calculation (solve for the 4th). The four parameters are: 1) sample size N, 2) effect size (e.g., small, medium, etc.), 3) statistical power (the probability of rejecting a false null hypothesis, 1 − β, where β is the type II error rate), and 4) α, the type I error rate. These terms will be further explained. In addition, the effect sizes and the variables that represent them vary widely depending on the type of test performed (e.g., Cohen's f², Glass's Δ, and Hedges' g are only a few).
5. Solve for the remaining parameter, usually sample size N [51].
We are solving to determine the required sample size N to give us enough statistical power. The probability of a type I error is denoted by α; a type I error is the incorrect rejection of a true null hypothesis. Power is equal to 1 − β, where β is the probability of a type II error; a type II error is the incorrect acceptance of a false H0. The goal is to ensure that the survey or questionnaire tests what it is supposed to test, has a high chance of providing significant results, and wastes as little time and effort as possible. Cohen's power recommendation of .80, along with Fisher's α criterion of .05, are together known as the 'five-eighty' convention and are generally accepted values that provide a good balance between α and β risk [18, 20, 51].
5.4.2.1 Effect size

The a-priori power analysis depends primarily on effect size. The expected effect size needs to be gleaned from the hypothesis, similar tests, and related literature in the field being studied. The effect size can also vary significantly based on the statistical test performed. The effect is the size of the change in the parameter of interest, and estimating it represents the bulk of the effort in an a-priori analysis. In statistics, there are dozens of effect variables depending on which statistical test is run. We are hypothesizing about the proportion of the population that uses and trusts the cloud, and about other cloud trust and decision related measures, to validate our trust model. The hypotheses drive the types of tests, and the effect size drives the power analysis. Since we are hypothesizing about a population proportion, two kinds of tests might apply: 1) the test that a

93 proportion is .50, and 2) the difference between proportions [43]. The effect index g is used to check for majority support, while the effect index h is used to check for an arbitrary proportion. We ultimately decided to use the index g; many a-priori power analyses are somewhat arbitrary, but are helpful to align the researcher with the proper goals. We explain these two effect variables next. 5.4.2.2 Test that a proportion is .50 effect index g

The test for a proportion is .50 assumes that the nominal proportion that exists in the population is a 50/50 split. These hypotheses are typically related to public opinions and the null hypothesis could take the form of a fraction of the population with certain beliefs or a

defined characteristic which is 1/2 (i.e., H0: P = .50, where P is the population proportion). An example is majority support for a political issue [43]. Cohen defines this effect index using the variable g. The g index relates to a departure from P = .50 in either direction. This can be represented by a positive departure, a negative departure, or just the absolute value of the departure (always positive). Unlike other effect indices, g is expressed simply as the departure from .50 and is therefore clearer to a behavioral scientist. For example, a small effect is g = .05. At this effect definition, a division in the population would be a 55/45 split; this is considered a small departure from the null (50/50). But a small departure in one instance might be considered large in real life, so context has to be considered (e.g., a 55/45 split is considered a landslide in political elections).
5.4.2.3 Difference between proportion effect index h

The test for a difference between proportions is a more general case of testing that a proportion is .50. In this test, any proportion can be used depending on what is being hypothesized. One proportion is the hypothesized (or alternate) proportion P1; the other proportion, P0, represents the null hypothesis (that which is currently believed to be true). The effect index for this test, h, is calculated:

h = ϕ1 − ϕ2   (5–10)

In this case, the effect h is not just the difference between the two proportions; it is the difference in the arcsine root, or angular transformation, where [43]:

ϕi = 2 arcsin(√Pi)   (5–11)

5.4.2.4 Our statistical power

Since ours is an opinion poll, we will test whether a proportion differs from .50. For example, we believe a portion of the population still does not trust the cloud. For effect variable g:

• Given power of 0.80, α = .05, and g = .25, Cohen's Table 5.4.1 for the test of a 50% proportion suggests that a sample size of 30 is required [43].
We desired to gather a sample size of at least 150 based on the other surveys on trust. Many of our test questions have, to our knowledge, never before been asked in a survey.
5.5 Survey Mechanics and Distribution

Our research focuses on consumer data in the cloud and asks these questions:

• Where is our data?

• Do we care?

• What organizations have access to our data?

• Do consumers want to grant blanket access to our data when it is uploaded to the cloud?

• Do we trust one cloud over another?
We have devised a consumer-focused cloud policy system based on blockchain technology to fulfill this need; contacting all clouds individually is not feasible. To the best of our knowledge, ours is the first system of its kind to fill this need. There are existing models (many of which overlap) that attempt to quantify cloud trust, but many are not practical in their approach to recommended trust and few are validated against real-world attitudes. Our ideal target was a cross section of the general population. We did not want specific IT groups or technology managers due to the potential to skew the numbers. Industry surveys mostly sought tech professionals, management, and CEOs, and many of these surveys were extremely positive on both cloud usage and trust, leading to bullish results.

5.5.1 UF Institutional Review Board and Distribution Information

Our anonymous survey was approved by the UF IRB under protocol IRB201802931 on Feb 11, 2019. Since UF policy forbids mass email, we sought out listservs and department mailing lists. Listservs can be public or private, but usually have a moderator controlling distribution. While we targeted a much wider audience, we were required to seek advance permission and received it from the following departments, groups, or colleges:

• Florida Institute for Cyber Security (FICS) Research Coordinator, Lesly Galiana

• UF College of Engineering Undergraduates via newsletter. POC Dr Curtis Taylor

• Warrington Business School via Brian Roberts. Specific mailing lists used were not provided.
We sent the following email to our POCs. The survey was active for three weeks.

Dear UF students and faculty, My name is Stephen Kirkman, I am a member of the Computer and Information Science and Engineering department at the University of Florida. My advisor, Dr Richard Newman, and I are conducting a brief (10-15 minutes) survey. The survey supports our research and examines attitudes about cloud computing usage and trust. This study is approved by the University of Florida Internal Review Board (IRB protocol IRB201802931). We used selected publicly listed UF ListServs and mailing lists for distribution. Both your decision to participate and your responses to the survey are completely voluntary and anonymous. The purpose of this survey is a study on the extent to which consumers trust and use the cloud regarding preferences of data location, sensitivity, decision making, and privacy. If you have questions about this study, please reply to the emails below and we will respond as quickly as we can. If you are willing to participate in the survey, please click the link below. The link will be active for 2 weeks. In the event you receive more than one copy of this invitation, please take the survey only once. The link is listed below. https://ufl.qualtrics.com/jfe/form/SV 0izYUNZwR1GrHWB Security: Because the online host (Qualtrics) uses several layers of encryption, it is highly unlikely that a security breach of the online data will result in any adverse consequences for you. All study data will be collected through Qualtrics, a secure site with SAS 70 certification for rigorous privacy standards. Any information that you provide through this program will be encrypted for security purposes using Secure Socket Layers (SSL). Privacy: No personally identifiable data are collected or stored by this survey. To protect your privacy, Qualtrics masks all IP addresses and your IP address

is unavailable to and unidentifiable by the study investigators. Only the study investigators will have access to the data on Qualtrics. Whom to contact if I have questions about the study: Stephen Kirkman, PhD Candidate, kirkman@ufl.edu, and Dr Richard Newman, nemo@ufl.edu, Box 116120, Computer and Information Science and Engineering, University of Florida. If you have questions about your rights as a research participant in the study, contact the UFIRB office, Box 112250, University of Florida, Gainesville, FL 32611-2250. By checking the box below, you acknowledge that you have read the information and agree to participate in this study. Sincerely, Richard Newman, PhD; Stephen S. Kirkman, PhD Candidate
5.5.2 Survey Respondents and Demographics

Raw survey data are in Appendix B. We were hoping to get 200 to 300 respondents based on the approximate size of the departments and the number of undergraduates in the UF College of Engineering. Since the survey distribution and the respondents were anonymous, we can only estimate our response rate. We received 35 responses, which is an estimated 2% response rate. The age range was overwhelmingly 18-34, which is expected. The self-reported technical competence was also significant.

Table 5-4. Q19. Your age range (no subject under 18 should complete this survey, thank you)
Age range     Average %
18-24         54.29%
25-34         40.00%
35-44         0.0%
45-54         5.71%
55-64         0.0%
65-74         0.0%
75-84         0.0%
85 or older   0.0%

Figure 5-6. Q1. I trust the cloud (strongly agree 51%, somewhat agree 17%, neither agree nor disagree 11%, somewhat disagree 14%, strongly disagree 6%)
Table 5-5. Q20. Technical competence
Technical competence   Average percentage
Pro                    20.00%
Savvy                  48.57%
Average                28.57%
Newbie                 2.86%
Not                    0.0%

5.5.3 Selected Result Charts

This section presents selected charts and discussion. Percentages from the data were rounded to the nearest full percentage for pie chart readability. In some cases, this resulted in totals not equal to 100% and caused an empty slice. This was done to make the pie charts more readable while avoiding misrepresenting the data. Raw data are in Appendix B. Our first question, in Figure 5-6, was mainly informational, to gauge the attitudes of the target population with regard to trusting the cloud. Results showed that over two-thirds (51% + 17% = 68%) of those surveyed do trust the cloud. We were very interested in the top cloud concerns. We used our own knowledge to flesh out possible concerns, but were mainly interested in which concerns were most important to people. Privacy and confidentiality were the clear winners. There was variation among the remaining options. We had believed that location of users' data might come in a clear second place, but it did not, and this will guide future work.

Figure 5-7. Q3. Rank these cloud concerns (1 being most important) [bar chart: ranking 1-5 versus number of votes for Location, Privacy, Availability, Third-Party, and Ease-of-Use]
Figure 5-8. Q8. I care if my cloud data are accessed by a third-party without my knowledge (strongly agree 74%, somewhat agree 14%, somewhat disagree 6%, strongly disagree 6%)
Figure 5-7 shows cloud concern ranking preferences. One represents the highest concern and five represents the least concern to consumers. Each horizontal bar in Figure 5-7 shows the number of votes each cloud concern received. In other words, not everyone ranked the cloud concerns the same. However, the overwhelming majority believed privacy was a primary concern. We also wanted to know how much people cared about third parties accessing their data without the consumer's knowledge. We were not very surprised that this kind of access was a concern to an overwhelming 88% majority of those surveyed, as shown in Figure 5-8. One of the major themes of our research was data location in the cloud. The next question was designed to determine the level of detail consumers want.

Figure 5-9. Q11. At what detail would you like to know where your data resides
Figure 5-10. Q12. I care if my cloud data is moved without my knowledge
Figure 5-9 shows that even though data location was not a top concern for cloud users, over 80% of those surveyed want to know where their data are, and 25% want to know at which data center their data reside. One of the linchpins of our research is controlling our data in the cloud via cloud policies. We want the cloud to seek more fine-grained permission to access our data. Figure 5-10 shows that a majority of over 80% of those surveyed do care if their data are moved without their knowledge. Another branch of this research is how consumers decide to trust the cloud. We assert there is a specific order of recommendations people take before choosing a cloud. Figure 5-11 shows a clear order of recommendation for the top two sources. While these results are not confirmed by hypothesis testing, they form the basis for further research. Many trust models talk about recommendation chains. We believe that recommendations cannot be accumulated in the ways that existing trust models assert. Consumers typically do not seek recommendations from friends of friends without knowing them directly.

Figure 5-11. Q13. Rank these potential sources of knowledge (for a recommendation) in the order that you would use it [bar chart: ranking 1-5 versus number of votes for Personal Experience, Unbiased Org, Family, Group Rating, and Friend]
Figure 5-12. Q14. When choosing a cloud, I would use a recommendation from (a friend, a friend's friend, or a friend's friend's friend)
Figure 5-12 shows that the overwhelming majority of those surveyed do not take recommendations from someone they do not know. We are building a decentralized cloud policy application that would require small fees to use. We wanted to determine whether consumers might pay for this kind of service. Figure 5-13 shows us that there is definitely some interest in paying for policies if all clouds followed them.
5.5.4 Survey Summary

From the survey, we can generally conclude that there is broad trust in the cloud at some level; however, trust is differentiated based on the value of the data. The location of a consumer's data is important, but not as important as privacy and availability.

Figure 5-13. Q17. I would pay as a one-time charge to use a personalized cloud policy, if all clouds followed it (options: nothing, 25 cents, 1 dollar, greater than 1 dollar)

Knowledge of and control over data movement are important to the vast majority of those surveyed. With respect to taking recommendations, familiarity with the recommender is not the most significant factor when deciding which cloud to trust; perceived expertise may play a larger role, and this needs more study. A larger survey is needed to obtain the desired level of significance.

5.6 Hypotheses for Cloud Trust Model

We proposed to target specific areas of cloud consumer trust consistent with our proposed model [72]. Our goal was to validate elements we identified as key to determining consumers' actions with regard to using the cloud. We are especially concerned with how consumers decide to trust, with third-party visibility, and with data location. Most of the hypotheses are estimates informed by existing research; our literature review found no specific evidence for or against any of these hypotheses. 5.6.1 Hypothesis Test Plan

The basic idea of hypothesis testing is to “reject the null hypothesis if the observed sample is very unlikely to have occurred when the null hypothesis is true” [92]. There is no right or wrong answer; the results help to gain further insight on the problem. The general approach for setting up a test is to: 1) determine competing hypotheses, 2) sample the population, 3) compute the summary statistics (use statistical analysis to determine if the sample data supports the null or alternative hypothesis), 4) analyze, and 5) report.

Test plan:

1. State the null and alternate hypothesis (or hypotheses), which are mutually exclusive (i.e., if one is true, the other must be false). The null hypothesis is the currently accepted belief based on popular wisdom or the literature review. In many of our cases we found no equivalent statistics for our questions, so we took educated guesses for the null. The alternate hypothesis is the one the researcher attempts to support. The research hypothesis depends on the parameter of interest and whether it is higher or lower than the hypothesized value and by how much; this decision leads to an upper-tailed (greater than the hypothesized value), lower-tailed (less than the hypothesized value), or two-tailed (not equal to the hypothesized value) test on the normal distribution.

2. Choose a significance level α = 0.05. As mentioned earlier, .05 is a standard size for α.

3. Collect survey data.

4. Choose the test statistic. Small samples generally use the t-test; we use the one-sample z-test (for large samples, n > 30) to determine whether the hypothesized population proportion differs significantly from what our survey suggests.

5. Analyze and perform computations:

(a) Obtain the standard deviation σ from the survey tool, if applicable.

(b) Compute the test statistic, noting that it can be negative, where p is the proportion observed in the sample and P is the proportion under the null hypothesis (the z-test is used when n > 30, the t-test when n < 30):

z = (p − P)/σ (5–12)

(c) Compute the P-value: the P-value is the probability of observing a sample statistic more extreme than the one observed in the data, assuming that the null hypothesis is true; use the normal distribution and calculate the P-value using standard normal tables.

6. Compare the P-value to α using the rule: if P ≤ α, then reject H0; if P > α, then fail to reject H0.
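As a worked example of steps 5 and 6, using the numbers that appear in Hypothesis 2 below:

σ = √(P(1 − P)/n) = √(.50 × .50/35) ≈ .084

z = (p − P)/σ = (.31 − .50)/.084 ≈ −2.26

P-value = P(Z < −2.26) ≈ .012 ≤ α = .05, so H0 is rejected in that case.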

5.6.2 Hypothesis 1

• π = the true proportion of cloud consumers that are interested in cloud third-party associations.
• Null: π < .25. We estimate that less than 25% of cloud consumers are interested in cloud third-party associations.
• Alternate: π ≥ .25. We believe 25% or more of cloud consumers are interested in cloud third-party associations.

Table 5-6. Q8. I care if my cloud data are accessed by a third-party without my knowledge
Opinion                      Average %
Strongly agree               74.29%
Somewhat agree               14.29%
Neither agree nor disagree    0.00%
Somewhat disagree             5.71%
Strongly disagree             5.71%

1. Pre-conditions to use the z-score. To assume a normal distribution, the following conditions must be satisfied: nπ ≥ 10 and n(1 − π) ≥ 10, where π is the proportion in the null hypothesis and n is the number of samples. Using the value 10 is a conservative rule of thumb, not a hard requirement; according to [91], if π = .5 the z-test can be used for n down to 20, while for π closer to 0 or 1, n should be much larger (at least 200). The first condition is not satisfied, but because these are conservative rules of thumb we continue to use the normal distribution.

nπ ≥ 10; 35(.25) = 8.75 < 10
n(1 − π) ≥ 10; 35(.75) = 26.25 ≥ 10 (5–13)

2. Obtain the standard deviation σ of the sampling distribution.

σ = √(P(1 − P)/n) = √(.24(1 − .24)/35) = √.0052 = .072 (5–14)

3. Compute the test statistic:

p = 89%, the surveyed proportion who care about third-party access to their data (strongly or somewhat agree)
P = 24%, the estimated proportion under the null hypothesis

z = (p − P)/σ = (.89 − .24)/.072 = .65/.072 = 9.03 (5–15)

4. Compute the P-value. We have a right-tailed test since the alternate hypothesis is greater than 25%.

P(z > 9.03) = .9999 based on the standard normal table (5–16)

5. Compare P to α using the rule:

If P ≤ α, reject H0. If P > α, fail to reject H0.

Since P = 1 > α = .05, we cannot reject the null hypothesis. (5–17)

The observed sample certainly supports the alternate hypothesis, but the test statistic z along with the P-value is not enough to reject the null hypothesis. This is most likely due to chance variation or the small sample size.

5.6.3 Hypothesis 2

• π = the true proportion of cloud consumers that use referred trust chains.
• Null: π ≥ .50. More than 50% of people use referred trust chains, based on several algorithms for trust in the literature we surveyed.
• Alternate: π < .50. We believe that, in practice, less than 50% use referred trust extending beyond the person they know.

Table 5-7. Q14. This question concerns referred trust. When choosing a cloud, I would use a recommendation from
Recommendation Source                                                Average %
A friend                                                             68.57%
A friend's friend (here and below, you do not know the individual)   25.71%
A friend's friend's friend                                            5.71%
A friend's friend's friend's friend                                   0.00%
A friend's friend's friend's friend's friend                          0.00%

1. Pre-conditions to use the z-score. To assume a normal distribution, the following conditions must be satisfied: nπ ≥ 10 and n(1 − π) ≥ 10, where π is the proportion in the null hypothesis and n is the number of samples. Using the value 10 is a conservative rule of thumb, not a hard requirement; according to [91], if π = .5 the z-test can be used for n down to 20, while for π closer to 0 or 1, n should be much larger (at least 200).

nπ ≥ 10; 35(.5) = 17.5 ≥ 10
n(1 − π) ≥ 10; 35(.5) = 17.5 ≥ 10 (5–18)

2. Obtain the standard deviation σ of the sampling distribution.

σ = √(P(1 − P)/n) = √(.50(1 − .50)/35) = √.0072 = .084 (5–19)

3. Compute the test statistic:

p = 31%, the surveyed proportion who would use a recommendation from beyond a direct friend (referred trust chains)
P = 50%, the proportion assumed in the null hypothesis (research frequently uses trust chains to represent trust)

z = (p − P)/σ = (.31 − .50)/.084 = −.19/.084 = −2.26 (5–20)

4. Compute the P-value. We have a left-tailed test since the alternate hypothesis is less than 50%.

P(z < −2.26) = .012 based on the standard normal table (5–21)

5. Compare P to α using the rule:

If P ≤ α, reject H0. If P > α, fail to reject H0.

Since P = .012 < α = .05, we reject the null hypothesis. (5–22)

The observed sample supports the alternate hypothesis, and the test statistic z along with the P-value shows that our survey data are enough to reject the null hypothesis. We believe that referred trust is a poor way to calculate trust, in part because the decision-making process of each entity in the chain is not known.

5.6.4 Hypothesis 3

• π = the true proportion of cloud consumers that have different trust thresholds.
• Null: π < .25. Based on common cloud surveys (e.g., Gartner), many people and companies are putting more and more data in the cloud. Our null hypothesis is therefore that less than 25% of cloud consumers have different trust thresholds based on the value of their data. This is an estimate, since there is little specific discussion in the research on use of the cloud depending on the sensitivity of the data.
• Alternate: π ≥ .25. We believe 25% or more of consumers have different trust thresholds based on their individual experiences and the value of the data being placed into the cloud.

Table 5-8. Q15. I trust more sensitive data to some clouds, but not others
Opinion                      Average %
Strongly agree               20.00%
Somewhat agree               40.00%
Neither agree nor disagree   22.86%
Somewhat disagree            11.43%
Strongly disagree             5.71%

1. Pre-conditions to use the z-score. To assume a normal distribution, the following conditions must be satisfied: nπ ≥ 10 and n(1 − π) ≥ 10, where π is the proportion in the null hypothesis and n is the number of samples. Using the value 10 is a conservative rule of thumb, not a hard requirement; according to [91], if π = .5 the z-test can be used for n down to 20, while for π closer to 0 or 1, n should be much larger (at least 200). The first condition is not satisfied, but because these are conservative rules of thumb we continue to use the normal distribution.

nπ ≥ 10; 35(.25) = 8.75 < 10
n(1 − π) ≥ 10; 35(.75) = 26.25 ≥ 10 (5–23)

2. Obtain the standard deviation σ of the sampling distribution.

σ = √(P(1 − P)/n) = √(.25(1 − .25)/35) = √.0054 = .073 (5–24)

3. Compute the test statistic:

p = 60%, the surveyed proportion who trust different clouds with different data
P = 25%, the proportion assumed in the null hypothesis (many surveys show significantly increased cloud data usage regardless of sensitivity)

z = (p − P)/σ = (.60 − .25)/.073 = .35/.073 = 4.79 (5–25)

4. Compute the P-value. We have a right-tailed test since the alternate hypothesis is greater than 25%.

P(z > 4.79) = 1 based on the standard normal table (5–26)

5. Compare P to α using the rule:

If P ≤ α, reject H0. If P > α, fail to reject H0.

Since P = 1 > α = .05, we cannot reject the null hypothesis. (5–27)

The observed sample certainly supports the alternate hypothesis, but the test statistic z along with the P-value is not enough to reject the null hypothesis.

5.6.5 Hypothesis 4

• π = the true proportion of cloud consumers that use SLAs to gauge cloud trustworthiness.
• Null: π ≥ .25. Much of the research we surveyed uses the SLA as a basis of, or major contributor to, consumer trust. The SLA is outdated, but there is little research on its decline in importance. Conventional wisdom therefore suggests that 25% or more of cloud consumers believe Service Level Agreements are very useful for gauging cloud trustworthiness.
• Alternate: π < .25. We believe that less than 25% of cloud consumers still find Service Level Agreements useful for gauging cloud trustworthiness.

Table 5-9. Q16. I have read service level agreements
SLAs I have read                   Average percentage
0                                  48.57%
1                                  11.43%
2                                   0.00%
More than 2                        17.14%
What's a service level agreement   22.86%

1. Pre-conditions to use the z-score. To assume a normal distribution, the following conditions must be satisfied: nπ ≥ 10 and n(1 − π) ≥ 10, where π is the proportion in the null hypothesis and n is the number of samples. Using the value 10 is a conservative rule of thumb, not a hard requirement; according to [91], if π = .5 the z-test can be used for n down to 20, while for π closer to 0 or 1, n should be much larger (at least 200). The first condition is not satisfied, but because these are conservative rules of thumb we continue to use the normal distribution.

nπ ≥ 10; 35(.25) = 8.75 < 10
n(1 − π) ≥ 10; 35(.75) = 26.25 ≥ 10 (5–28)

2. Obtain the standard deviation σ of the sampling distribution.

σ = √(P(1 − P)/n) = √(.25(1 − .25)/35) = √.0054 = .073 (5–29)

3. Compute the test statistic:

p = 72%, the surveyed proportion who have never read an SLA
P = 25%, the proportion assumed in the null hypothesis (much research still uses the SLA as the basis of trustworthiness)

z = (p − P)/σ = (.72 − .25)/.073 = .47/.073 = 6.43 (5–30)

4. Compute the P-value. We have a left-tailed test since the alternate hypothesis is less than 25%.

P(z < 6.43) = 1 based on the standard normal table (5–31)

5. Compare P to α using the rule:

If P ≤ α, reject H0. If P > α, fail to reject H0.

Since P = 1 > α = .05, we cannot reject the null hypothesis. (5–32)

The observed sample certainly supports the alternate hypothesis, but the test statistic z along with the P-value is not enough to reject the null hypothesis for our sample size.

5.7 Summary

From our data, just about everyone surveyed ordered from Amazon and used Facebook, and over 50% of the respondents were in the 18 to 24 age range. The most surprising result was that the location of data was not a top concern to cloud users, although a large majority expressed concern over data location; third-party access and the confidentiality of cloud data ranked high. We also learned from our survey that long transitive trust chains (i.e., A trusts B, B trusts C, C trusts D, therefore A trusts D) are not used in practice. Based on the survey results, the sample size, and the hypothesis testing, we can make educated guesses about a good direction to head. Knowledge of and control over data movement are important to a vast majority of those surveyed. Another surprising result was that familiarity with the recommender is not the most significant factor when deciding to use a cloud; perceived expertise may play a larger role than we believed. Lastly, we determined that trust transitivity drops off very quickly (i.e., most people do not trust a friend's friend if they do not know them personally). Our target number of responses was roughly 200. Due to the limited distribution allowed for the survey and a response rate of approximately 2%, we would like to try again at a later date. We received interesting suggestions for new policies that we plan to explore; these are detailed in our raw survey results in Appendix B.

CHAPTER 6
GOAL 2 - ORCON CONSUMER POLICY MODEL FOR DATA MOVEMENT

Since we are focusing on consumer data movement policies, we took inspiration from the Originator Control (ORCON) access control model as the guiding model behind our framework. ORCON is an access control model aimed at government data; it states that an 'originator' must be queried in cases related to data access.

6.1 ORCON Policy Model Overview

At the organizational level, ORCON can be described as follows [37]. If S is a pool of subjects, O is a set of objects, subject s ∈ S, and object o ∈ O, then s marks o as ORCON on behalf of organization X (in this instance the organization is the originator). X allows o to be disclosed to subjects acting on behalf of organization Y with the following restrictions: 1. o cannot be released to subjects acting on behalf of other organizations without X's permission; and 2. any copy of o must have the same restrictions placed upon it. We use ORCON as a design philosophy for a framework and approach that ensures the creator can express her desires to any cloud. In our ORCON framework, X is the consumer (individual or organization), o is the data set, and Y is a cloud organization. How does ORCON compare to MAC, DAC, and RBAC? In discretionary access control (DAC), the file owner determines and is able to change access permissions. If any user is able to make a copy of a file, then they become the owner of that copy. The owner can change all permissions to files they own (including copies of others' files they have made); in DAC, the owner of the file is not necessarily the creator. In role-based access control (RBAC), access to an object is based on roles assigned by the organization. The organization controls the roles assigned to users, and anyone who holds a certain role gets a certain access. In mandatory access control (MAC), both subjects and objects have labels. Access control decisions are based on the labels of the subjects and objects and are governed by the rules of the system (which

may update labels dynamically and automatically). These labels are assigned at the time of creation; owners and users are only allowed to change data labels as the rules of the system permit. Typically DAC or RBAC are also imposed in addition to the existing MAC controls. RBAC, MAC, and DAC are implemented on a per-system basis. Building ORCON permissions into a system using these control systems is challenging. Pure DAC fails the ORCON test because the owner is not necessarily the creator and the owner can change permissions at will. RBAC aims mainly at the management level, and different organizations have different management approaches. MAC is closer to ORCON and might simulate it by creating a separate category for each combination of object/owner/recipient across organizational boundaries. However, it only works within a system that can interpret the labels and has the same rules. In addition, the owner does not have control under MAC, MAC does not scale (there would be an explosion of categories), and it also requires a centralized manager [37]. None of these work across organizational or system boundaries. Since we want policies to be applicable across organizations, ORCON is more appropriate in the cloud case than mandatory or discretionary access control; ORCON is a control policy that emphasizes the originator's desires, and it can be used across organizational boundaries. Likewise, our policy framework seeks the same policy goals for consumer data within and across cloud boundaries. For this reason, we adopt the ORCON philosophy as a model for our policy framework.

6.2 Model

Our model can be divided into inputs, namely subjects (i.e., clouds and consumers) and objects (i.e., datasets and tags), and outputs (actions, decisions, authorizations, and attestations). This chapter also includes a section on audit.

6.2.1 Elements of State: Clouds, Consumers, Datasets, Policies, and Tags

We need to precisely define the subjects and objects in our model at an abstract level to enable us to apply our model to a variety of situations. The indexes m, n, and o are the maximum number of clouds, consumers, and datasets in the system.

We represent the universe of clouds C as:

C = {C1, C2, ..., Cm} (6–1) We represent consumers E as:

E = {E1, E2, E3, ..., En} (6–2)

Each consumer, Ei ∈ E, has datasets Di:

Di = {Di1, Di2, ..., Dio}, D = ∪i Di (6–3)

Each consumer, Ei ∈ E, also has tags, tagi,j ∈ Tagsi:

Tagsi = {tagi,1, tagi,2, ..., tagi,o}, Tags = ∪i Tagsi (6–4)

Each consumer, Ei ∈ E, has one policy σi and one metadata set µi (which may be empty); and each dataset Di,j has exactly one tag, tagi,j, which may be empty. Both tags and metadata are strings that are uninterpreted by the model, but are interpreted by the appropriate σ. The set of metadata strings in the system is M:

M = ∪i µi (6–5)

We represent the set of policies as Σ:

Σ = {σ1, σ2, ..., σn} (6–6)

In the next section, we define functions, including the policy function σ.

6.2.2.1 Tag function of a dataset

With each dataset, we associate a unique tag. The tag function returns the tag associated with a dataset.

tag : D → Tags, where tag(Dij ) = tagij , (6–7)

The tag, tagij , may be a null string. 6.2.2.2 Location function of a dataset

The function mapping from a dataset to its current location is represented by λ. For the sake of simplicity, a dataset is located in its entirety in exactly one cloud at a time. Refinements of this model might include a copy function, at which point a dataset might be in more than one cloud at a time.

λ : D → C (6–8)

6.2.2.3 Owner function of a dataset

The function mapping from a dataset to its owner is represented by ω.

ω : D → E (6–9)

6.2.2.4 Mapping of consumer to metadata

The function mapping a consumer to its metadata set is represented by µ.

µ : E → M, where µ(Ei) = µi (6–10)

6.2.2.5 Mapping of consumer i to policy i

The function mapping a consumer to its policy is represented by π.

π : E → Σ, where π(Ei) = σi (6–11)

6.2.2.6 Policy Function

Each consumer has its own policy function σ for data movement, which takes as input a destination cloud, the consumer metadata, and the tag associated with the dataset, and produces a decision.

∀i, σi : C × M × Tagsi → DEC (6–12)

Every consumer has their own policy function σi, and the set of possible decisions DEC is defined as:

DEC = {granted, denied} (6–13)

The default σ is the constant function that always returns "granted." It is incumbent upon the consumer to ensure that the σ provided computes the correct answer given the correct input.
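As a concrete illustration (a hypothetical policy, not one mandated by the model), suppose consumer Ei encodes a whitelist of trusted clouds Wi ⊆ C in her metadata µi, in the spirit of the whitelist policy implemented in Chapter 7. Her policy function could then be:

σi(Ck, µi, tagi,j) = granted if Ck ∈ Wi, and denied otherwise.

A tag-specific refinement keeps one whitelist per tag, Wi(tagi,j), which corresponds to the datum-tag whitelist policy described in Chapter 7.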

The system state consists of clouds C, consumers E, datasets D, datatags Tags, metadata M, policies Σ, and the location function λ. The set of all states is

S = C × E × D × Tags × M × Σ × λ (6–14)

System state will change over time, and the initial state is S0, where the sets are empty and the functions are null. We use discrete time. There is a succession of states S0, S1, ..., where each state results from applying a single action to the previous state. The functions mapping a dataset to its owner, a dataset to its tag, a consumer to its metadata, and a consumer to its policy are either constant or implicit, and so are not explicitly part of the state.

6.2.4 Actions

There are seven components to our system state; each of these components may change. The set of clouds and the set of consumers can be increased by adding a cloud or adding a consumer. A consumer may add a new dataset, which also adds a new tag. Adding a consumer adds the consumer's metadata and policy. The sets C, E, D, and Tags are monotonic; they only increase in size, and their elements do not change in this model. The model allows a consumer Ei to change its policy function σi and its metadata µi, and a cloud may move a dataset, hence changing λ. Therefore the model contains six actions. To reflect changes over discrete time, we parameterize these component sets using discrete time t; so, for example, C(t) is the set of clouds at time t. To reduce confusion, we may subscript time, as in λt, or simply state 'at time t'. An action is feasible if its pre-conditions are met, and it is valid if its validity constraints are met.

6.2.4.1 Add cloud C

Without any validation, the system can add a new cloud C. The only pre-condition is that the new cloud is not already in the system at time t.

C(t + 1) = C(t) ∪ {C}, requires C ∉ C(t) (6–15)

6.2.4.2 Add consumer E

Without any validation, the system can add a new consumer E. The only pre-condition is that the new consumer is not already in the system at time t.

E(t + 1) = E(t) ∪ {E}, requires E ∉ E(t) (6–16)

6.2.4.3 Consumer Ei add data set D with tag TAG at cloud C

Validation: only consumer Ei may add a dataset D to its set of datasets. The pre-conditions are: Ei ∈ E(t), D ∉ D(t), TAG ∉ Tags(t), and C ∈ C(t). The post-conditions of this action are:

Di(t + 1) = Di(t) ∪ {D} (6–17)

λt+1(D) = C (6–18)

tag(D) = TAG (6–19)

ω(D) = Ei (6–20)

Tagsi(t + 1) = Tagsi(t) ∪ {TAG} (6–21)

6.2.4.4 Consumer Ei modify metadata from µ to µ′

Consumer E changes its metadata to µ′. Validation: only E can change E's metadata. Post-condition:

µ(E) = µ′ at time t + 1 (6–22)

6.2.4.5 Consumer E modify policy to σ′

Consumer Ei may change its policy function σi to σi′. Validation: only E can modify E's policy. Pre-condition: E exists at time t, E ∈ E at time t. Post-condition:

σ(E) = σ′ at time t + 1 (6–23)

6.2.4.6 Move dataset Dij from C to C′

A cloud storing a dataset may move it to another cloud. Pre-condition: D ∈ D, C′ ∈ C, λt(D) = C, ω(Di,j) = Ei, σi = σ(Ei), and µi = µ(Ei) at time t. Validation: the current location cloud C must request the action, and the move must be allowed by Ei's current policy:

σi(C′, µi, tag(D)) = granted, at time t (6–24)

Result:

λt+1(D) = C′ (6–25)

The model assumes that the requestor of any action can be authenticated, and requires it for the last four actions; the means for authentication is outside of the model, but is assumed to be handled during the execution of the first two actions.
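For illustration (a hypothetical run of the model, not an additional rule), suppose λt(D1,1) = C1 and cloud C1 requests a move of D1,1 to C2. If σ1(C2, µ1, tag(D1,1)) = granted at time t, the action is valid and λt+1(D1,1) = C2; if the policy returns denied, the state is unchanged and λt+1(D1,1) = C1.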

6.2.5 Valid State

Applying a feasible action α to a state S produces a new state S′, as specified by the post-conditions of α. No other parts of the state change, only the parts specified in the post-conditions. This is represented by the transition function δ:

δ(S, α, parameters) = S′ if α with its parameters is feasible, and S otherwise (6–26)

S0, in which all sets are empty and all functions are null, is a valid state. If S is valid and α is a valid action, then S′ = δ(S, α, parameters) is also valid (meaning that the validity constraints, such as requestor authentication or policy compliance, are met). The valid states of the system are those that can be reached by feasible and valid actions from S0. Note that the state sets are monotonic in that no clouds, consumers, or datasets are ever destroyed. However, a consumer's policy function or metadata may change over time, and a dataset's location may change over time.

6.2.6 Model Summary

This model allows for consumers to set and change policy using policy metadata and policy interpretation function. The system itself does not interpret policy (or even assert that a given σ works correctly) - that is up to the consumer to provide correctly. The model only states that only the consumer can change their own policy (σ and µ), and that a cloud is only allowed to move a dataset if it is allowed by current consumer policy for that dataset. 6.3 Specific Policies

Policies are specific ways to make decisions based on the information provided and abstracted in our model. The particular policy that is implemented by a smart contract may have to interpret the tag that is sent to the smart contract (note in our prototype implementation the tag is merely a string provided by the consumer). We envision different types of policies:

• Personalized Whitelists that contain trusted clouds (as defined by consumer preference)

• Consumer records for VM co-location
• Cloud third-party associations
• Cloud ratings from the CSA STAR Registry

Types of data include:

• Virtual machines
• Containers
• Passwords
• Health data
• Consumer personal information (e.g., social security number, driver's license)
• Consumer purchasing data (e.g., credit information)
• Company proprietary data (e.g., formula for a food or beverage, trade secrets)

6.4 Authorizations, Attestations, and Audit

An authorization is the permission to move data; we define authorizations below. A side effect of an authorization AUTH is an attestation ATT. An attestation is produced as a direct result of executing the function (i.e., the policy).

AUTH = DEC × ATT (6–27)

At some time t, there is an attestation ATT, where

ATT ∈ Csource × Cdest × D × DEC × t (6–28)

Attestations are evidence regarding the state of something. Attestations do not have to be hardware-based, though the term was used in trusted computing to mean attestation from the trusted platform module. To be able to detect data movement events across clouds, the attestations cannot be tied to a particular system or cloud. We introduce the idea of negative verifications here: verifying that something has not occurred. If the attestation shows that data should not have been moved and it is later determined that they have been moved, this can be an indicator of untrustworthiness. If there are no attestations, the cloud is either not using the consumer's policies or is not moving data. Audit is the process of verifying that records match real life. The most familiar form of audit is the "tax audit". On a computer system, auditing is similar; the logs provide evidence that there is no unusual activity. The definition of "unusual activity" differs between organizations, but generally includes access or attempted access by someone to something that was not supposed to occur. Audit is the action of a consumer checking her logs; our mechanism allows this, but only from the standpoint of providing requested information.

6.5 Summary

This is a location-centric framework: the cloud must query the policies prior to taking any action that affects the location of the consumer's data. Our ORCON-like cloud policy framework (so named because it is consumer, or originator, focused rather than SLA focused) uses decentralized policies and a mechanism that allows policy expression for the consumer and policy evaluation for the cloud. We also introduced the notion of negative verifications: if an attestation shows that data should not have been moved and it is later determined that they have been moved, this serves as a negative verification. We use blockchains because they provide a decentralized ledger with high integrity. We use smart contracts to implement our ORCON-inspired data movement policies and the blockchain to store the policies. A cloud does not have to interpret these policies; the cloud just consults the smart contract to get an answer. This enables a functional approach to access control decisions that supports consistent policy interpretation for any cloud provider. We discuss this approach in the next chapter, covering our Control Your CLoud OPerationS (CYCLOPS) decentralized application.

CHAPTER 7
GOAL 3 - CYCLOPS DECENTRALIZED APPLICATION WITH WHITELIST, DATA TRACKING, AND ATTESTATION

We present CYCLOPS (Control Your CLoud OPerationS), a consumer policy decentralized application and an evolution of our pilot InterCloud system, in Figure 7-1.

Figure 7-1. CYCLOPS DApp menu A decentralized application (DApp) is a system of smart contracts interfaced and accessed by a web-based portal. CYCLOPS (formerly titled InterCloud) allows you to control your cloud data operations using consumer specified policies encoded in smart contracts. These smart contracts are accessible by clouds and consumers via our DAPP so that any cloud can consult a consumer’s policy. Users create policies by sending transactions via this portal using their Ethereum blockchain address and password. The user must obtain an address and password using an Ethereum compatible wallet application prior to using the DApp. When it is first deployed, a smart contract is assigned its own unique blockchain address. The smart contract addresses

Material from this chapter is from "InterCloud: A Data Movement Policy DApp for Managing Trust in the Cloud" and "Control Your CLoud OPerationS (CYCLOPS): A Consumer Policy Cloud Trust DApp".

and GUIs are coded within our DApp using HTML so that the cloud need not search for the various policy addresses, but can merely use the DApp.

Consumers' policies will be stored in Ethereum smart contracts, which are accessible via a DApp (i.e., a known website) or using an Ethereum blockchain browser (i.e., if the address of the deployed contract is known). Each participating cloud and consumer must have an Ethereum public address and access to a client or an Ethereum wallet. The peer-to-peer (P2P) network holds the authoritative copy of the policy, replicated on each peer's blockchain. Transactions to the policies are verified by consensus in the P2P network by miners. Each contract incurs a one-time transaction cost for deployment, and a transaction is required to change any policy data, thereby requiring payment. This cost is paid by the cloud, the consumer, or a combination, for each policy. The blockchain system provides high-integrity storage of the policy: it is decentralized and replicated on each peer, and no peer can change it on its own. The smart contracts allow evaluation of the policy. The DApp will be managed and located within a trusted organization that can manage it and provide access to clouds and consumers. When a cloud wants to take an action on consumer data (e.g., migration to another cloud), the cloud consults the smart contract system for 'permission' by sending a transaction requesting authorization. This should be performed before the cloud takes any action on a user's datum that falls within the scope of pre-defined policies. The permission will be within the transaction receipt logs presented to the cloud after the transaction is mined (up to 15 minutes on the Ethereum main network). These logs are also available on demand at any time as consumer attestations. Consumers will be able to set up their own policies via the DApp, which forwards transactions to functions within the Ethereum smart contracts. The cloud will send a transaction to our system of smart contracts via a special function to obtain authorization to migrate data, based on a whitelist, a datum whitelist, or future policy processing.
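As a sketch of this cloud-facing interaction (the interface and parameter names below are illustrative assumptions, not the deployed contract's actual signatures), a policy contract can expose a single authorization entry point that the cloud invokes with a transaction before migrating data:

    pragma solidity ^0.8.0;

    // Hypothetical cloud-facing entry point; names are illustrative only.
    interface IMovePolicy {
        // The cloud names the consumer, the datum tag, and the destination cloud;
        // the decision is returned and also emitted as an event (attestation)
        // in the transaction receipt logs.
        function requestMove(
            address consumer,
            string calldata tag,
            string calldata destinationCloud
        ) external returns (bool granted);
    }

A cloud would call requestMove in a transaction and read the granted flag and the attestation event from the mined receipt.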

The high-level architecture of the system is shown in Figure 7-2. The figure shows the interaction between the consumer, clouds, and a P2P network containing the blockchain. The DApp can be hosted on any Internet-accessible server. The cloud needs access to the DApp or to an Ethereum node.

Figure 7-2. Blockchain, user, and cloud interaction

In our system, each type of policy will be represented by a smart contract. A gateway contract will change the least over time and can act as a pointer to other contracts (policies). The DApp GUIs are programmed with the correct blockchain address; we can add new policies over time. Consumer policies are stored via transactions. The data consist of a consumer Ethereum address, tag identifiers, and lists of clouds. The smart contract logic stores the data and evaluates the policies via function evaluation. Every smart contract deployed to the network is propagated to every client in the Ethereum network that has a copy of the full blockchain. One of the key attributes of our system is that the cloud provider can functionally evaluate policies based solely on the input the cloud provides, without needing to see parts of the policy that are not applicable or are private to the consumer. Some of the kinds of preferences we believe consumers might want to express via policies are:

• Personalized whitelists that contain trusted clouds (as defined by consumer preference)
• Consumer records for VM co-location
• Cloud third-party associations
• Cloud ratings from the CSA STAR Registry

To use the DApp, clouds must obtain a free Ethereum address and must purchase some ether (many websites provide this). Clouds must publish their Ethereum blockchain address on their website for consumers to verify. A consumer can obtain a free Ethereum address from any appropriate wallet app (e.g., MetaMask, Mist, Ethereum.org). We recommend that the consumer install the MetaMask browser plug-in and set up their account; on some networks the MetaMask plug-in is flagged as mining malware even though the plug-in is merely a wallet. Consumers must add ether (the Ethereum digital currency) to their wallet; this can be done via numerous currency-conversion websites or by joining a mining pool and mining ether. Converting directly from USD to ether is regulated by the banking industry and hence requires paperwork. The consumer needs only a very small amount of ether to create a policy; in our testing it cost roughly a hundred-thousandth of an ether (about $.03) to create a policy. All transactions take at most 15 minutes to be processed by the Ethereum main-net. Some organization (e.g., the Cloud Security Alliance) would host the DApp, implement the main smart contract and policies, and deploy (and pay for) the smart contracts on the live Ethereum network. All smart contracts deployed to the main-net must be vetted and scrutinized on test networks prior to live deployment.

7.2 Pilot Decentralized Application

Our Cyclops DApp was created in two phases. The first phase was the pilot which included the baseline GUI, and initial smart contracts. The fundamental baseline was created during summer 2018 and resulted in the first publication. The second phase included the migration of the baseline to a web server, and an additional tagging policy and GUI upgrades for speed.

7.2.1 Decentralized Application GUI Design

React is a Javascript library used in building web user interfaces. Web3.js is an Ethereum Javascript application programming interface (API). This API is used to communicate with the smart contracts from within the web environment. Our DApp uses a React framework combined with web3.js. The DApp makes extensive use of the web3 API to interact with the blockchain. The two most used commands are the ‘call’ and the ‘transaction’. A transaction is required if a user wishes to store data or change state, which costs money. Within the transaction, a user’s source account is charged ether. A call is free because it does not change blockchain state; a call merely reads state. 7.2.2 Smart Contract Design
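The call/transaction distinction maps directly onto Solidity function types. A minimal sketch (the contract and function names here are ours, for illustration only):

    pragma solidity ^0.8.0;

    contract CallVsTransaction {
        mapping(address => uint256) public policyCount;

        // State-changing function: must be invoked with a transaction,
        // which is mined and charged gas (ether) against the sender.
        function addPolicy() external {
            policyCount[msg.sender] += 1;
        }

        // 'view' function: read-only, invoked with a web3 'call';
        // free because it does not change blockchain state.
        function getPolicyCount(address consumer) external view returns (uint256) {
            return policyCount[consumer];
        }
    }

Invoking addPolicy requires a signed transaction that is mined and costs gas, whereas getPolicyCount can be issued as a free call.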

Our design is flexible enough to add new policies. A policy has one Ethereum address, and each consumer can have one or more policies. This allows: 1) the cloud to locate the consumer's policy address via a mapping (i.e., the consumer must have a public address registered in Ethereum), and 2) updates to smart contracts (since a smart contract is immutable, partial updates are not possible, so the whole contract must be replaced). The high-level view of the smart contract framework is shown in Figure 7-3. Individual policy addresses can be located by referencing a main policy (a policy which keeps track of participating consumers and clouds). We explored utilizing one smart contract per consumer, which would cost approximately $1.50 per smart contract per consumer [74]. This approach is feasible for consumers who want their own smart contract policy. Our main design goal is to reduce costs; one smart contract per consumer for the same kind of policy would lead to duplicate code at best, and increased costs and incorrect code at worst. Flexibility is another goal: our design allows expansion in the event consumers want to create their own unique policies. We store consumer policy data in the same smart contract to relieve the consumer of the added cost of having their own smart contract. Our system keeps track of: 1) consumers with their associated policies, and 2) policy contracts with their Ethereum addresses. Our design includes the following:

Figure 7-3. Smart contract policy framework

• A DApp front end
• A main contract that contains: 1) a listing of each consumer's policies, which will not change as often, 2) a mapping from consumer public address to policies, and 3) a mapping from policy to policy address
• One smart contract per policy type, containing: 1) data structures that store consumer policy meta-data on the blockchain, and 2) the smart contract's control structures that make decisions based on policy type and consumer meta-data

7.2.2.1 Clouds and consumers

All participants have Ethereum addresses. Consumers are identified only by an Ethereum address, each cloud is identified by its name which is stored along with its Ethereum address when signing up to use the system. 7.2.2.2 Policies

The system can accept multiple policies. We can track the name and the Ethereum address of each policy. Smart contract policies are deployed by an administrator of the system. But in the future, any consumer could develop and deploy her own policy. 7.2.2.3 Consumer meta-data location

Once a user adds meta-data into her policy, where on the blockchain are the data stored? The storage process is transparent to blockchain users. In Ethereum, the data are distributed across blocks as the consumer’s policy ages. To visualize how this works across blocks in the blockchain, see Figure 7-4 [8, 121].

Figure 7-4. Code and storage references

The consumer's policy meta-data are stored under the storageRoot (the fourth white box on the bottom of Figure 7-4). The storageRoot is a hash of the root node of a Merkle Patricia trie that contains the storage of the account [121]. Every block header in an Ethereum block contains three Merkle Patricia trie roots. The three tries are: 1) the state trie, 2) the transaction trie, and 3) the receipt trie. Ethereum state is a key-value map; the state trie contains a mapping between addresses and account information. The state trie is where the smart contract code and consumer data are stored: the storage root is for consumer policy data, and the black box is where the smart contract code is stored. Note the links between older blocks and new blocks. The links reference previous contract state that has not been changed between blocks; this saves space instead of replicating data [8]. The remaining tries in each block header are for additional accounting: the transaction trie for each block contains hashes of all the transactions, and the receipt trie contains hashes of all the receipts of each transaction.

7.2.3 Data Structures

We want to store lists of cloud names, addresses, and consumer policies. Mappings and arrays are the two most used data structures in the Solidity language, the most supported smart contract language for Ethereum. These structures have advantages and disadvantages. A mapping has no way to track the keys but, search is fast. An array does not require keys but, requires iteration. 7.2.3.1 Mappings

Mappings have an O(1) search time to find consumer data given an Ethereum address as the key. Mappings are great for ease of storage and checking values given the key. However, the keys must be tracked in the same way the items are tracked. 7.2.3.2 Arrays

Arrays are easy to manipulate, but when items are deleted, indices become unusable, and looping through arrays is cost-intensive if the computation inside the loop is expensive. Because Solidity does not support dynamic string arrays, and to avoid setting a limit on the number of clouds a consumer can have on a whitelist, we designed our own structure using a mapping of a structure. In Solidity this idea is known as an "iterable mapping", with varying degrees of difference in implementation.

7.2.3.3 Mapping of Struct

The Ethereum consumer address is the main key. The structure contains the maximum number of elements in the list and a mapping from an integer index to a string. This allows us to create, query, and retrieve a dynamic list of strings (i.e., policy names and cloud names) without using a dynamic string array. Cloud policy names (or cloud names) are added via a structure and indexed by an integer. This structure is used throughout the design. Maxindex keeps track of the total number of policies/clouds/data per consumer and is used as an upper bound when iterating through the structure. In the event of a deletion, the last element is moved to the spot of the deleted item to preserve index consistency, and maxindex is decremented. A hashmap allows iteration by integer over the strings. Figure 7-5 illustrates this mapping design pattern for the main contract.
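A minimal Solidity sketch of this struct-and-mapping pattern (type, field, and function names are ours for illustration, not necessarily those of the deployed contract):

    pragma solidity ^0.8.0;

    contract IterableStringList {
        struct StringList {
            uint256 maxIndex;                  // number of stored items
            mapping(uint256 => string) items;  // index -> cloud/policy name
        }

        mapping(address => StringList) private lists;   // consumer -> list

        function add(string calldata name) external {
            StringList storage l = lists[msg.sender];
            l.items[l.maxIndex] = name;
            l.maxIndex += 1;
        }

        // Remove by index: move the last element into the hole, decrement maxIndex.
        function removeAt(uint256 index) external {
            StringList storage l = lists[msg.sender];
            require(index < l.maxIndex, "index out of range");
            l.items[index] = l.items[l.maxIndex - 1];
            delete l.items[l.maxIndex - 1];
            l.maxIndex -= 1;
        }

        // Read one item; the DApp iterates from 0 to maxIndex-1 using free calls.
        function itemAt(address consumer, uint256 index) external view returns (string memory) {
            return lists[consumer].items[index];
        }

        function count(address consumer) external view returns (uint256) {
            return lists[consumer].maxIndex;
        }
    }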

Figure 7-5. Struct pattern for lists in Solidity (strings and addresses)

7.2.4 Smart Contracts/Policies

All smart contracts have public addresses on the Ethereum blockchain. At present, policies are added to the system by the developer. The GUI indicates which policies are available based on the selection menu. 7.2.4.1 Smart contract technology decisions

Inter-contract calls. Contracts can send messages to other contracts; this is also termed a 'call'. There are advantages and disadvantages to using calls. We considered pursuing this as a design feature to access smart contract addresses via the main contract, but inter-contract calls cost some money, albeit a very small amount, so we decided to implement the main contract functionality within the DApp React code. The potential benefit of inter-contract calls is the reuse of well-established and tested code, which also saves deploying redundant code. However, using foreign code comes with risk: it can be confusing to have contracts call other contracts, and it leads to re-entrancy risks unless there is a very good reason to do so. Using the 'call' command for inter-contract communication costs an additional 700 gas.

Vyper smart contract language. Vyper is a smart contract language with the look and feel of Python. Our preference was to use the Vyper language because it is designed with fewer ‘bells and whistles’ than Solidity.

Vyper is designed with simplicity in mind, but it is still experimental and does not support the string type at this time. We use the string type to represent cloud names (e.g., Amazon, Google). Ultimately we decided to continue with the Solidity programming language, as it is the most widely supported smart contract language. 7.2.4.2 Main contract

The main contract stores the location of the policies and associates a single consumer with all their policy names. The system can accept multiple policies; we track the name and the Ethereum address of each policy. Table 7-1 summarizes the function methods in the main contract.

Table 7-1. Main contract methods
Function                  Description                                        In                                   Out
Add policy for consumer   Consumer adds a policy name to their policy list   Policy name                          None
Remove consumer policy    Removes the name of the policy only                Ethereum address, policy name        None
List consumer policies    Lists policies the consumer presently uses         Ethereum address                     Policy names
Add cloud to system       Registers a cloud name within the system           Cloud name, URL, Ethereum address    None

Add policy for consumer. The consumer can add a policy name to their policy list (bound to their Ethereum address); no detailed policy information is added at this step.

List consumer policies. Returns the current list of policies for that consumer.

Remove consumer policy. The DApp executes the loop to match the name of the policy to the search string. Once found, the DAPP sends a transaction (note: this does not remove the actual policy data).

Add cloud to system. This button adds a cloud to the system. A representative for the cloud must have an Ethereum address and send this transaction. The transaction includes a cloud name, the cloud URL and the cloud Ethereum address.

Table 7-2. Whitelist contract methods
Function                      Description                                               In                                      Out
Add cloud to whitelist        Adds a cloud name to the consumer's whitelist             Cloud name                              None
Delete cloud from whitelist   Deletes an existing cloud from a consumer's whitelist     Cloud name                              None
Check if cloud on whitelist   Cloud checks to see if destination cloud is on whitelist  Cloud name, consumer Ethereum address   Permission and attestation
List my clouds                Lists the consumer's clouds                               Ethereum address                        Cloud names
Delete consumer data          Removes consumer data from whitelist                      Consumer Ethereum address               None

7.2.4.3 Whitelist policy contract

We implemented a whitelist policy in smart contract code and deployed it to an Ethereum test network. Our DApp: 1) allows the cloud to determine if the consumer has a policy, 2) allows the cloud to run a check against that policy, and 3) allows the consumer to update their policy meta-data. Each consumer can have one or more clouds on their whitelist. The whitelist data structure uses the same design pattern shown in Figure 7-6.

Figure 7-6. Whitelist design structure Add cloud to whitelist. Adds a cloud provider to the consumer’s whitelist

Delete cloud from whitelist. Deletes an existing cloud from a consumer’s whitelist

List my clouds. Uses DApp to iterate over clouds, call function to obtain each name based on index.

Delete consumer data. Removes consumer data from the whitelist.

Check if cloud on whitelist. We designed an extra structure that allows us to check the whitelist in O(1) time. We use a mapping of address to a structure which contains another mapping (string to boolean). A unique consumer address is mapped to another mapping from a unique cloud name to either a true or false value. All values in a mapping have a default value; the default value of a boolean is false, so the majority of our mapping contains uninitialized values, and we only change a value to true as needed. Our data structure can be visualized as a T/F (i.e., 1/0) table where the entries default to F, and only the T entries are stored; see Table 7-3.

Table 7-3. Check for authorization in O(1) time
Consumer's Ethereum address                      AWS   HP
0x740ae91a9f19534ddfb173d5ee58851c8567a300       T
0xf0a6ab2e460236cf5cd6f89fc64d5c22d9645d7e       T
0x8260dfd1e11d923a5395d679ba1c4c240257b314       T

This structure enables the smart contract to check the whitelist without performing looping operations to search for the matching cloud. Normally, when iterating over a list of names, there is a requirement to iterate over the entire list, and iteration (looping) can be expensive in smart contract programming. This structure requires no iteration over string values to find a match; the check consists of the computation of multiple hashes. A consumer address is hashed to a cloud name, and the cloud name provides the second-dimension hash whose final value is either true or false (i.e., consumerAddress → cloud → true|false). In this way the final value is checked much more quickly than by looping through an array. Each hash costs 30 gas (i.e., 60 for two) of computation regardless of the number of clouds on a list. (Note: in our tag policy, we use a three-dimensional hash: consumerAddress → tag → cloud → true|false.) This structure does require extra storage (two extra data structures): one to perform the check, and one to keep a shadow history (necessary to keep a history of the keys in order to zero out consumer data).
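A hedged Solidity sketch of this constant-time check (names are illustrative; the deployed contract also keeps the shadow key history described above):

    pragma solidity ^0.8.0;

    contract WhitelistCheck {
        // consumer address -> cloud name -> allowed?  Booleans default to false,
        // so only 'true' entries ever need to be written.
        mapping(address => mapping(string => bool)) private allowed;

        event WhitelistChecked(address indexed consumer, string cloud, bool granted);

        function addCloud(string calldata cloud) external {
            allowed[msg.sender][cloud] = true;
        }

        // No loop: two hashes locate the entry regardless of the list's size.
        function checkCloud(address consumer, string calldata cloud) external returns (bool granted) {
            granted = allowed[consumer][cloud];
            emit WhitelistChecked(consumer, cloud, granted);   // attestation in the receipt logs
        }
    }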

7.3 DApp Upgrade

This section of the research provides a description of phase two and the development and testing of the final CYCLOPS prototype. These are the upgrades we implemented:

• Attestations on demand from the DApp menu: a button for a consumer to access all their attestations (past events)
• Virtual machine (VM) tags: demonstrated how VMs can be tagged using the OVF format and ovftool [25]
• Whitelist for data tags in virtual machines: a new policy to track multiple whitelists for consumers, one per datum tag as specified on a virtual machine. Each datum a consumer wishes to tag may have its own cloud whitelist.

7.3.1 Upgrade 1 - Attestations on Demand

To attest to something is to make a claim; an attestation in our context is proof that an action took place. Whenever a smart contract policy is consulted by the cloud, an event (attestation) is triggered and stored in the blockchain logs. Attestations are stored in transaction receipts. In our design, attestations are created via Solidity events within the smart contract. Solidity events and transaction receipts are stored in the Ethereum logs, which are cheaper than normal storage because the logs do not require consensus via mining. Any time one of our smart contracts is 'checked' or 'asked' for permission to proceed, an event is generated and stored in the transaction receipt (i.e., the logs). The consumer has access to attestations on demand by requesting them from the GUI menu. Normal Ethereum storage fees for 32 bytes of data are 20,000 gas (625 gas per byte); events go into the logs (a special part of the blockchain), which cost only 8 gas per byte of data. Storing 32 bytes of data at an average gasPrice of 2 gwei would cost a small fraction of a penny. Consumers can request all their transactions by supplying their Ethereum address in the appropriate screen of the DApp. It would be possible for consumers to select specific dates, but this functionality is not currently built into the prototype. We could instead store attestations in the smart contract itself, which would allow a little more consumer control over their own attestations; we made a design choice to save money over added control and isolation, since that approach would keep the attestations in regular blockchain storage, which costs more than the logs (logs do not require consensus, so they are a cheaper form of storage). Figure 7-7 shows a screen capture of a sample attestation that the DApp provides when requested by a consumer using an Ethereum address.

Figure 7-7. Attestations from Rinkeby testing

Searching through past transactions for a certain value required a minor change to the smart contract events: we had to add the keyword "indexed" to the event definition within the smart contract code, which allows the event to be searched by consumer Ethereum address in past transaction receipts. Most attestations in the literature are TPM-based (for a single cloud server), and are therefore typically stored in the cloud for simplicity. We not only broaden the definition of attestation, but also move attestation storage onto the decentralized blockchain. If an Ethereum node is within the cloud, the cloud firewall rules might have to allow the proper protocols for peer-to-peer communication. Even if the node is in the cloud, there is no chance of influence from the target cloud, because trust is obtained via decentralization of the peer-to-peer network. This effectively creates a service we call "Attestation-as-a-Service" using a blockchain.
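A sketch of how such an attestation event might be declared and emitted in Solidity (the contract, event, and parameter names are illustrative assumptions, not the exact code of our deployed policies):

    pragma solidity ^0.8.0;

    contract AttestationEvents {
        // 'indexed' places the consumer address in the log topics, so the DApp can
        // filter historical receipts by consumer (attestations on demand).
        event Attestation(
            address indexed consumer,
            string sourceCloud,
            string destinationCloud,
            bool granted,
            uint256 time
        );

        function attest(
            address consumer,
            string calldata sourceCloud,
            string calldata destinationCloud,
            bool granted
        ) internal {
            // Written to the transaction receipt logs (cheaper than contract storage).
            emit Attestation(consumer, sourceCloud, destinationCloud, granted, block.timestamp);
        }
    }

Because the consumer address is an indexed topic, the DApp can filter past receipts by that address to assemble a consumer's attestations on demand.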

7.3.2 Upgrade 2 - Tags to Identify a Virtual Machine

Geo-location is the process of determining the physical location of an object, such as a cloud computing server. Geo-fencing defines geographical or virtual boundary policies using GPS or RFID technologies and geolocation attributes; our research could be categorized as a type of geo-fencing. Asset-tagging/geo-tagging generally refers to the tagging of hosts/assets/servers within a cloud environment. The asset tags are used to support SLA decisions; the consumer needs to know where the service is located, or other information that can be provided by an asset tag. In this case, the asset is not expected to move frequently [123]. 7.3.2.1 Virtual machine security

Virtual machine (VM) security has been studied for several years. There are mechanisms to secure VMs in the cloud until the VM is ready to be executed at which point the image must be unencrypted to run. VMs in the cloud are commonly referred to as tenants. Tenant security has mostly been accomplished using hardware roots of trust. These roots of trust were discussed in Section 2.1.1 and create trusted chains from the TPM up to and into the tenant via the vTPM. We are concerned with being able to quickly distinguish several types of tenants owned by one or more consumer using VM tagging. We note a deficiency in cloud computing research on tagging of a tenant for policy adjudication decisions. Therefore we experimented and implemented VM tagging using the OVF format. 7.3.2.2 Open virtualization format

The Open Virtualization Format [14] provides a standardized virtual machine packaging protocol. This protocol can be used to include data tags (strings of characters) with virtual machines. An OVF package can come in the form of a compressed OVF appliance/archive with a .ova extension, or it can be stored as a directory of files. The files in an OVF package include: 1) an OVF descriptor file with extension .ovf (XML metadata), 2) an OVF manifest file with extension .mf that contains a hash of the .vmdk file and a hash of the .ovf file, 3) zero or one OVF certificate file(s) with extension .cert, 4) zero or more disk image files (.vmdk), and 5) zero or more additional resource files, such as ISO images [14]. The OVF descriptor file is an XML file that holds a variety of configuration metadata parameters and is also extensible so that customized policies can be inserted. Listings 7.1 and 7.2 show examples of a custom OVF descriptor.

Listing 7.1. OVF example custom section

Useful␣info␣for␣incident␣tracking␣purposes Acme␣Corporation␣Official␣Build␣System 102876 07-31-2009

Listing 7.2. OVF example SteveTag12

A␣human-readable␣annotation SteveTag12

Despite the fact that the OVF format is supposed to be ‘highly’ customizable, to our surprise we found little research and few small scale tools that allowed us to create our own OVF fields. The annotation field was an obvious choice to insert a tag; the field is, however, typically used for other notes. 7.3.2.3 VMware OVF tool

We decided to use the VMware OVF Tool to demonstrate individual virtual machine tagging and signing. The OVF Tool is an older tool, but it provides a "bare bones" approach to customizing VM OVF metadata. VMware has been incorporating data tagging capability into managed infrastructures, vSphere in particular [26]. Due to the added cost of that software, we felt the OVF Tool was more than enough to demonstrate how virtual machines can be tagged and secured using the Secure Sockets Layer (SSL) toolchain.

7.3.2.4 Virtual machine tags

There are a few options for adding a virtual machine tag. We use VirtualBox to insert a tag in the OVF annotation field. The process is listed in Appendix E. Figure E-1 shows the virtual machine tag highlighted in blue in the VirtualBox virtual machine viewer. The OVF format also allows extensibility via customized sections or additions to existing sections. We were able to insert a customized section to support a tag, but methods to extract that tag using a tool are left as future work. See Appendix E for an illustration of tagging using the annotation section.
Listing 7.3. Annotation section extended for data tag

<AnnotationSection>
  <Info>Specifies an annotation for this virtual system</Info>
  <Annotation>This is an example of how a tag can be added</Annotation>
  <Tag>SteveTag456</Tag>
</AnnotationSection>

To secure the integrity of the tag, we use OpenSSL. First the manifest file must be created; it contains a hash of each important file:
Listing 7.4. Create manifest file

C:\Users\Steve\Documents\OVF InterCloudTrust>openssl sha1 *.vmdk *.ovf > InterCloudTrust.mf

We create our own private key using OpenSSL:
Listing 7.5. Create RSA public/private key pair (selected elements listed)

C:\Users\Steve\Documents>openssl req -x509 -nodes -sha256 -days 365 -newkey rsa:1024 -keyout steve.pem -out steve.pem
Generating a 1024 bit RSA private key
writing new private key to 'steve.pem'
-----
C:\Users\Steve\Documents>

Using the OVF Tool, we sign the OVF files with the previously created private key. By signing the OVF package with the VMware OVF Tool, we attain integrity verification of the VM tag. We use OpenSSL to create the public/private key pair and certificate. In the same manner that a blockchain has high security/integrity, once signed the OVF files cannot be changed without invalidating the signature. To sign an OVF package, a .pem file is required; the .pem file contains a private key and certificate [123]. The overall process is explained in Chapter 4 of the OVF Tool guide [25].
Listing 7.6. Sign the OVF files

"c:\Program Files\VMware\VMware OVF Tool\ovftool.exe" --privateKey=steve.pem InterCloudTrust.ovf MyVM-Signed.ovf
Opening OVF source: InterCloudTrust.ovf
The manifest validates
Opening OVF target: MyVM-Signed.ovf
Writing OVF package: MyVM-Signed.ovf
Transfer Completed
Completed successfully

To view a virtual machine tag, we supply the .ova file as the only argument to ovftool. We have suppressed supplementary information that is normally also printed to the screen.
Listing 7.7. View the annotation tag using ovftool

C:\Users\Steve\Documents\OVF InterCloudTrust>"C:\Program Files\VMware\VMware OVF Tool\ovftool" InterCloudTrust.ova
OVF version: 1.0
VirtualApp: false
Name: InterCloudTrust
Annotation: SteveData123   // Steve Attestation TAG

Probe mode is used to validate the authenticity of the OVF package [25]. If the OVF certificate is present, OVF Tool always verifies that the signature matches the SHA digest of the files and tests the authenticity of the certificate.

Listing 7.8. Ovftool probe mode

C:\Users\Steve\Documents\OVF InterCloudTrust>"C:\Program Files\VMware\VMware OVF Tool\ovftool" SignedInterCloudTrust.ovf
OVF version: 1.0
VirtualApp: false
Name: InterCloudTrust
Annotation: SteveTag123
Download Size: 0 bytes
Deployment Sizes:
  Flat disks: 0 bytes
  Sparse disks: Unknown
Networks:
  Name: NAT
  Description: Logical network used by this appliance.

Virtual Machines:
  Name: InterCloudTrust
  Operating System: ubuntu64guest
  Virtual Hardware:
    Families: virtualbox-2.2
    Number of CPUs: 1
    Cores per socket: 1
    Memory: 1024.00 MB
    Disks:
      Index: 0
      Instance ID: 10
      Capacity: 10.00 GB
    NICs:
      Adapter Type: E1000
      Connection: NAT

C:\Users\Steve\Documents\OVF InterCloudTrust>

7.3.3 Upgrade 3 - Whitelist Policy for Data Tags

The primary Solidity code structure used in the whitelist policy for data tags is shown in Figure 7-8. We are creating a list of lists. It is roughly the same design pattern shown in Figure 7-5, with an added layer of abstraction. The inner structure contains another mapping of tags pointing to the final cloud whitelist structure. In both cases, maxindex keeps track of the number of tags per consumer and the number of clouds per tag. Each consumer Ethereum address can have multiple tags, and each tag can have a list of one or more clouds, which represents that tag's whitelist.

Figure 7-8. Consumer tag structure

The whitelisttag check data structure is more complex because it is 3-dimensional. The principle is the same as in the check function for the whitelist: the default value of a boolean is false, so the majority of our mapping consists of uninitialized values; we only change a value to true as needed. The data structure can be visualized as a T/F (i.e., 1/0) table where the entries default to F. Only each T is stored; see Table 7-3.
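A minimal sketch of this "list of lists" pattern, targeting the Solidity 0.4.x line used in our testing; the struct, mapping, and function names are illustrative assumptions rather than the deployed CYCLOPS code:

pragma solidity ^0.4.22;

contract WhitelistTagSketch {
    struct TagWhitelist {
        mapping(uint => string) clouds;    // index -> cloud name, used for listing
        mapping(string => bool) allowed;   // cloud name -> permission; defaults to false
        uint maxindex;                     // number of clouds stored for this tag
    }

    struct ConsumerPolicy {
        mapping(uint => string) tags;            // index -> tag name, used for listing
        mapping(string => TagWhitelist) byTag;   // tag name -> that tag's cloud whitelist
        uint maxindex;                           // number of tags for this consumer
    }

    // consumer Ethereum address -> tags -> whitelisted clouds ("a list of lists")
    mapping(address => ConsumerPolicy) internal policies;

    function addCloudToTag(string tag, string cloud) public {
        ConsumerPolicy storage p = policies[msg.sender];
        TagWhitelist storage w = p.byTag[tag];
        if (w.maxindex == 0) {              // first cloud for this tag: remember the tag itself
            p.tags[p.maxindex] = tag;
            p.maxindex++;
        }
        w.clouds[w.maxindex] = cloud;
        w.maxindex++;
        w.allowed[cloud] = true;            // only "true" entries are ever written to storage
    }

    // Everything not explicitly set reads back as the boolean default, false.
    function checkDataTag(address consumer, string tag, string cloud) public view returns (bool) {
        return policies[consumer].byTag[tag].allowed[cloud];
    }
}

Deleting a cloud from a tag's whitelist would simply flip the corresponding allowed entry back to false, matching the T/F table view above.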

7.3.3.1 Add cloud to tag whitelist

Adds a cloud provider to the consumer's whitelist for a datum tag
7.3.3.2 Delete cloud from tag whitelist

Deletes an existing cloud from a consumer’s tag whitelist

Table 7-4. Data whitelist contract methods
Function            Description                                                    In                             Out
Add cloud           Adds a cloud provider to the consumer's whitelist for a datum tag   Ethereum address, Cloud, Tag   None
Delete cloud        Deletes an existing cloud from a consumer's tag whitelist           Ethereum address, Cloud, Tag   None
Check data tag      Checks for permission to move data associated with tag              Ethereum address, Cloud, Tag   None
List my tags        Service for the consumer to see all of her tags                     Ethereum address               One or more tags
List clouds in tag  Service for consumer to see all clouds in one tag                   Ethereum address, Tag          One or more clouds

7.3.3.3 Check consumer data tag

The cloud will execute this function to obtain authority to proceed with a consumer's datum.
7.3.3.4 List my tags

The DApp iterates over the tags, calling a function to obtain each tag name by index.
7.3.3.5 List my clouds in my tag

Given a tag, the DApp iterates over the clouds, calling a function to obtain each cloud name by index.
7.4 Use Cases

We present several use cases to demonstrate the functionality of the system. To be able to use the cloud policies, each cloud participating in this policy framework must have access to our DApp front end or an Ethereum blockchain explorer, a cloud blockchain address, and the appropriate smart contract addresses. In this model, it is assumed that consumers specify the clouds they trust.

7.4.1 Using Metamask to Send Transactions

Our smart contracts require digital currency (a fraction of a dollar at most) to store and execute the policies. Each participating cloud and consumer must have an Ethereum public address and access to a client or Ethereum wallet. For our tests and use cases, we use the Metamask browser plug-in. Metamask is an Ethereum wallet that stores your digital currency. You can use Metamask to access the Ethereum Mainnet or the test networks (Rinkeby and Ropsten). You can use the same Ethereum address across all Ethereum networks; however, you will have a different wallet on each (i.e., one address can have multiple wallets). Metamask needs to be set up in advance for the appropriate network with enough digital currency. Metamask is activated by the DApp when a transaction is sent. The Metamask plug-in creates a browser pop-up that supplies the user with a recommended gasPrice or the option to select their own gasPrice. After the gasPrice is selected, the consumer confirms the transaction. The higher the offered gasPrice, the faster the transaction will be accepted into the blockchain by the miners. The ether is then deducted from the appropriate wallet when the transaction is mined. When executing DApp functions that involve a transaction, the sender of the transaction is not required to enter their identifying information. This information is stored in the Metamask wallet. If this information is required by a particular smart contract function, then the sender information is drawn from the Metamask wallet plug-in and supplied to the smart contract implicitly. In other words, the smart contract looks at the sender to identify the source cloud or the consumer's identifying Ethereum address.
7.4.2 User and Cloud Sign Up

Consumers that want a cloud policy and clouds that wish to participate should sign up via the DApp portal shown previously in Figure 7-1. This process is shown in Figure 7-9. The policy is written as smart contract code. The consumer sends a transaction to the appropriate function of the smart contract to add trusted clouds to their whitelist and add the policy name to their policy list.

Figure 7-9. Policy sign-up

7.4.3 Add/Register Cloud with CYCLOPS

Clouds are added to the smart contract system from the Add Cloud button. Once the cloud does this, consumers may select any cloud that participates in the system. The consumer must verify the blockchain address at the cloud’s website to ensure they are interfacing with the correct cloud, see Figure 7-10.

Figure 7-10. Add cloud to system

7.4.4 Procedure for User to Add a Cloud to their Whitelist

If a consumer wants to express a whitelist policy, they may add a cloud to a whitelist. Merely adding their first cloud creates the whitelist; no list exists for the consumer prior to executing this the first time. Subsequent additions add to their list. Users may only add one cloud at a time.
1. Consumer goes to the DApp and adds a policy name along with their Ethereum address using the Add Policy button

2. Consumer returns to DApp menu and selects “Whitelist Policy” button

3. Consumer selects cloud for whitelist from dropdown menu listing participating clouds

4. If desired, the consumer can verify the validity of the participating cloud using the listed Ethereum address and website, then enters their Ethereum address and sends a transaction

5. Metamask will activate and ask the consumer to choose a gasPrice, then select "Confirm Transaction"

6. Once the transaction goes through, the consumer can verify by listing clouds on their whitelist.

7. Consumer may stop or add more trusted clouds
7.4.5 Delete Cloud from Whitelist

This functionality means that the consumer no longer trusts the cloud; the cloud is no longer on their whitelist. The smart contract searches for the cloud that matches the requested deletion; the CYCLOPS smart contract performs a string-match search.
1. Consumer navigates to Whitelist from the DApp main menu

2. Consumer selects ”Delete Cloud for consumer”

3. Consumer enters cloud from their list and their Ethereum address

4. Consumer sends transaction
7.4.6 List Approved Cloud on Whitelist

A consumer might forget the clouds on their whitelist. This is basic functionality to allow the consumer to list their current whitelist.
1. Consumer navigates to Whitelist from the DApp main menu

2. Consumer enters their Ethereum address

3. Consumer selects ”List All Approved consumer Clouds”

4. No transaction is required; this is a 'call' (a call does not change state, whereas a transaction does).
7.4.7 Delete Consumer from Whitelist

The consumer wants to remove all data from the whitelist. This command will delete all of the consumer's meta-data.
1. Consumer navigates to Whitelist from the DApp main menu

2. Consumer enters their Ethereum address

3. Consumer selects ”Delete consumer from Whitelist”

4. A transaction is required; this will change the state of the blockchain
7.4.8 List Consumer Policies

How does the cloud know about consumer policies? The cloud and the consumer may list current consumer policies. This shows a list of currently active policies for a consumer. It is merely a list of policy names; the consumer might add a policy to the list that does not exist. It is the consumer's responsibility to ensure this list is accurate.
1. Consumer returns to the DApp menu and selects "List Consumer Policies"

2. Consumer enters their Ethereum address (There is no cost at this step)

3. Consumer selects ”List”

4. Policy names are listed
7.4.9 User Wants to See Attestations

The consumer would like to see existing attestations. At present this command shows all attestations that exist for the consumer. If many checks have been made, this could return a lot of data.
1. Consumer returns to the DApp menu and selects "Attestation"

2. Consumer enters their Ethereum address. (There is no cost at this step. Attestations are stored inside transaction receipts, which are searchable; note that anyone can search a consumer's attestations simply by providing that Ethereum address.)

3. Attestations are listed

7.4.10 Cloud Check Consumer Whitelist

Since our policies are whitelists, the cloud supplies the consumer's Ethereum address and destination. At present, there are no restrictions on who might execute this command.
1. Cloud navigates to Whitelist from the DApp main menu

2. Cloud selects ”Check”

3. Cloud enters consumer Ethereum address and destination cloud for VM migration

4. Cloud sends transaction
7.4.11 Whitelisttag Policy Actions

For a consumer, signing up for a whitelist for their tagged data is nearly identical to signing up for the plain whitelist. The only difference is that the consumer navigates to a different DApp policy button that allows them to supply an additional datatag (i.e., a string of characters such as SteveTag123). There is no logical linkage between the OVF tag and the tag used in the smart contracts. The OVF might have a tag, but if the consumer does not include that tag in the smart contract policy, the tag is meaningless to the smart contract. Smart contracts receive data only when initiated through a transaction from a valid Ethereum address or wallet. Since the tag is a string of any allowable printable characters, the presence of a tag itself could signal to the cloud that a policy exists. The policy data, however, are self-contained and isolated to the blockchain. The cloud has the responsibility to check for the existence of both the consumer tag and the policy for that tag before taking an action related to that consumer's data. Figure 7-11 shows the cloud dropdown menu while creating a whitelist for a tag.

Figure 7-11. Cloud is added and accessible to consumers

7.4.12 Cloud Provider Performs Check

When a cloud wants to take an action on consumer data (e.g., load balancing or VM migration to another cloud), the cloud will check the consumer's Ethereum address on the DApp main menu by listing the applicable policies as described earlier. If the consumer has an applicable policy, the cloud goes to that policy and sends a transaction to the smart contract system for consumer authorization. Figure 7-12 shows the state diagram for a policy query. This transaction requires a destination cloud selected from the dropdown menu, the consumer Ethereum address, and the datatag. The return value is the authorization or denial; see Figure 7-13. The permission data are extracted from the transaction receipt and presented to the cloud making the request. The consumer can retrieve the authorization/denial, which serves as an 'attestation'. The attestation is created by a Solidity event and viewed by the consumer on demand via the attestation request.
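A minimal sketch of this query flow (names are illustrative; the tag whitelist of Section 7.3.3 is flattened here into a single nested mapping for brevity): the cloud's transaction both reads the consumer's policy and writes the result into the transaction receipt as an event, which later serves as the attestation.

pragma solidity ^0.4.22;

contract PolicyQuerySketch {
    // consumer -> tag -> destination cloud -> permitted?
    // (population of this mapping is omitted in this sketch)
    mapping(address => mapping(string => mapping(string => bool))) internal whitelistTag;

    event Attestation(address indexed consumer, address requestingCloud,
                      string destinationCloud, string tag, bool authorized);

    // Called by the cloud as a transaction; the emitted event lands in the
    // transaction receipt, where both cloud and consumer can retrieve it later.
    function checkconsumerDataTag(address consumer, string tag, string destinationCloud)
        public returns (bool authorized)
    {
        authorized = whitelistTag[consumer][tag][destinationCloud];
        emit Attestation(consumer, msg.sender, destinationCloud, tag, authorized);
    }
}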

Figure 7-12. Cloud making a consumer policy query

Figure 7-13. Policy sign-up

7.5 Implementation and Testing

We have implemented a prototype DApp at https://www.stephenkirkman.org:3000. Our initial code is located at https://github.com/kirkmans/intercloudv2.

We deployed a whitelist smart contract on the Rinkeby test network.1 The bytecode for this smart contract can be viewed through any Ethereum blockchain explorer that supports the Rinkeby network. We recommend https://rinkeby.etherscan.io, loading the contract located at the following Ethereum address: 0x700eac86e1ebf9623ad2527f71b04a22ef9ff107. What is there is compiled bytecode, not the original source code. The following software was used during testing:

Table 7-5. Software and use
Software                  Version         Use
npm                       6.1.0           Package manager for Javascript
Next                      5.1.4           Javascript framework
React                     -               Component-based Javascript library
nodejs                    8.1.1           Javascript runtime for Chrome
Solidity                  0.4.22          Smart contract language
Ganache-CLI               7.0.0-beta.0    Test Ethereum blockchain
Ganache GUI               1.2.1           Test Ethereum blockchain
Metamask                  browser plugin  Browser-based Ethereum wallet
Solidity Online Compiler  0.4.24          Realtime compiler plus gas estimator

7.5.1 Local Test Setup

For our test hardware, we used a Dell desktop with an Intel Core i3 @ 3.4 GHz and 16 GB RAM. We used the Solidity programming language for smart contracts and the Ganache test blockchain. We also tested using the Remix Ethereum browser; this is done by setting the environment to JavaScript VM, which provides a handful of default virtual accounts. Deploying to the Ethereum main-net is considered commercial deployment and is not within our current scope.2

1 https://www.rinkeby.io/#stats
2 We used multiple tools for testing, including Infura. Infura is a service that provides access to the blockchain without requiring a full blockchain node; without Infura, you would have to host your own Ethereum node. For live deployment, an Ethereum node might be more appropriate. We used https://infura.io/ for testing.

7.5.2 Technology Decision - Mocha Test Framework Not Used

Mocha is a popular testing framework for smart contracts. We wanted to provide a brief explanation of why we did not use Mocha. In short, the Mocha framework does the same kind of testing that can be done from the Remix online compiler.3 The results are identical. Mocha tests are programs (automated tests), whereas Remix is interactive. Mocha might be appropriate for highly repetitive tests where the expected result is the same. However, there is extensive setup required for the Mocha framework, since the tests themselves are small programs. Remix performs the same tests with no setup and no software installation. Mocha provides test-completion feedback to the output, but again, only if the tests are programmed correctly. Hence, our testing was accomplished using the Remix browser tool provided by Ethereum.
7.5.3 Rinkeby Testing and Results

All transactions cost gas; the amount used depends on the computations being run or the size of the code being deployed. We provide data on both the deployment costs and execution costs, but focus more on the execution costs as those will occur more often. We tested our smart contracts on the Ethereum Rinkeby network to obtain a closer gas estimate without using the Ethereum main-net. The Solidity "require" statement provides a layer of "security" in real life, but that extra layer of security makes functional testing very challenging. For example, you can include require(msg.sender == customer, "Requester not Authorized") in any function; this statement protects the account from unauthorized storage of policy data. However, during testing, this step interferes with functional testing due to the number of accounts required. Removing this statement saved us only 40 gas per transaction using the Remix online compiler. Given the difference in magnitude, this has a negligible impact on actual gas costs, which ranged from approximately 20,000 to 70,000 gas in our testing.
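As a minimal sketch of where such a guard sits (the storage layout here is simplified and illustrative, not the deployed CYCLOPS code), the require line quoted above is the only difference between the production and test versions of a policy function:

pragma solidity ^0.4.22;

contract GuardedWhitelistSketch {
    mapping(address => mapping(uint => string)) internal whitelist;  // consumer -> index -> cloud
    mapping(address => uint) internal maxindex;                      // clouds stored per consumer

    function addCloudtoWhite(address customer, string cloud) public {
        // Guard kept for production, removed during functional testing
        // (the measured difference was only about 40 gas per transaction).
        require(msg.sender == customer, "Requester not Authorized");
        whitelist[customer][maxindex[customer]] = cloud;
        maxindex[customer]++;
    }
}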

3 https://remix.Ethereum.org

Deploying the smart contract to the Rinkeby network is free, but not without hurdles. Many of the "faucets" (as they are called) that control the fake ether disbursement have a timeout mechanism or other restrictions and can only be used infrequently. Therefore, we are using just a single account (on which we have accumulated fake ether) to test functionality and gas cost.
7.5.4 Gas Cost Analysis and Scaling Costs

Each contract that is deployed as part of our system incurs a one-time transaction cost. Every time policy data are changed or added, that is also a transaction and comes with a cost. This cost is paid by the cloud, the consumer, or a combination, for each new policy. We deployed our 'whitelist' contract and obtained totalCost results indicating approximate gas costs from 3 different Ethereum compilers. We then analyzed each function's cost using the same 3 gas estimation tools. Real-world costs were then calculated based on the average gasPrice and the current eth-to-USD exchange rate, using exchange websites. These are the steps we ran to test the functionality of our whitelisttag smart contract and main smart contract:
1. Cloud: Add Amazon
2. Cloud: Add Google
3. Cloud: Add Microsoft
4. Consumer: Add policyname for User
5. Cloud/Consumer: List policy name for user
6. Consumer: Add SteveTag123 Amazon
7. Consumer: Add SteveTag123 Google
8. Consumer: Add SteveTag456 Microsoft
9. Consumer: List All Tags for user
10. Consumer: List Clouds for SteveTag123
11. Consumer: List Clouds for SteveTag456
12. Cloud: Check SteveTag123 Microsoft - denied
13. Cloud: Check SteveTag456 Microsoft - allowed

https://etherscan.io/gastracker, https://ethgasstation.info/, and https://currencio.co/eth/usd/

14. Consumer: Delete SteveTag456 Microsoft
15. Consumer: List Clouds for SteveTag456
16. Cloud: Check SteveTag456 Microsoft - denied
17. Consumer: Delete SteveTag456 for customer
18. Consumer: List all customer Tags
19. Consumer: List Attestations for customer

Table 7-6. CYCLOPS deployment costs
Contract       Deploy cost @ 4 gwei
Whitelisttag   2,802,428
Main           1,274,173

Table 7-7. CYCLOPSv1 average cost to deploy policies
Variable         Average deployment cost
Gas              1,298,884
GasPrice         approx. 4,000,000,000 wei (4 gwei - average)
totalCost        = 1,298,884 * 4,000,000,000 = 5,195,536,000,000,000 wei
totalCost        = 0.005 eth
totalCost Final  = $0.71 @ $140 per eth, March 2019

Table 7-8. CYCLOPSv1 average cost to execute functions
Variable         Average function cost
Gas              27,000
GasPrice         approx. 2,000,000,000 wei (2 gwei - average)
totalCost        = 27,000 * 2,000,000,000 = 54,000,000,000,000 wei
totalCost        = 0.000054 eth
totalCost Final  = $0.008 @ $140 per eth, March 2019

Table 7-9. Deploy gas estimate comparison
Platform                          Smart contract    Gas cost
Remix virtual blockchain          whitelist.sol     1,457,846
Ganache local blockchain network  whitelist.sol     934,423
Rinkeby test network              whitelist.sol     1,298,884

Table 7-10. WhiteList functions gas analysis. Gas analysis is for one cloud (deleteconsumer for the Rinkeby network had two clouds). Calls are free, but still subject to gas limits (i.e., it is possible to run out of gas locally for high EVM use)
Function name           Description                                                    Gas estimates (RemixVM/Ganache/Rinkeby)
addCloudtoWhite         Adds a cloud to a consumer's white list                        71,169 / 71,152 / 71,169
deleteCloudfromWhite    Deletes a single cloud from a consumer's white list            17,756 / 27,073 / 52,234
checkconsumerWhitelist  The cloud checks the consumer's whitelist prior to migration   28,042 / 28,276 / 28,042
listconsumerWhitelist   List consumer's whole whitelist                                no gas
deleteconsumer          Consumer data is deleted from this policy                      16,944 / 27,073 / 47,221 (2 clouds)

Table 7-11. WhiteListTag functions gas analysis. Calls are free, but still subject to gas limits (i.e., it is possible to run out of gas locally for high EVM use)

Function name             Description                                                 Gas estimates (Ganache/Rinkeby)
addCloudDatatoWhiteTag    Adds a cloud to a consumer's tag white list                 130,170 / 130,105
deleteCloudTagWhite       Deletes a single cloud from a consumer's white list         29,387 / 24,063
deleteTag                 Deletes a consumer's tag and all the clouds stored          58,198 / 27,094
checkconsumerDataTag      The cloud checks the consumer's whitelist prior to action   30,885 / 31,088
listconsumerTags          Consumer can list all their tags                            no gas
listconsumerCloudPerTags  Cloud may list all clouds on a Tag whitelist                no gas

Table 7-12. Main functions gas analysis. Calls are free, but still subject to gas limits (i.e., it is possible to run out of gas locally for high EVM use)
Function name               Description                             Gas estimates (RemixVM/Rinkeby)
AddCloud                    Registers a cloud to the system         108,248 / 108,248
AddPolicyName for consumer  Adds a policy name on consumer's list   65,925 / 98,983

Our functions that do not use loops are all O(1) operations; we only store one cloud at a time. We compared the functions deleteCloudfromWhitelist and addCloudtoWhitelist in Figure 7-14. With the exception of the first addCloudtoWhitelist transaction, the total gas cost to add a single cloud is nearly constant (it depends only on the length of the name of the cloud), regardless of how big the list is. We implemented the search loop to find the cloud in the DApp, so the smart contract does not need to iterate. In our results, the first cloud added to an empty list takes 15,000 extra gas.4 As long as the list is not allowed to go empty, the gas cost depends just on the length of the name of the cloud being stored. Our average cost for a consumer to add a cloud to a whitelist was $.07. This number was derived from the gas utilization of roughly 70,000 gas per function transaction, multiplied by an estimated gasPrice and converted from eth to dollars. These costs were calculated based on the average gasPrice and current eth-to-USD exchange rates at https://etherscan.io/gastracker, https://ethgasstation.info/, and https://currencio.co/eth/usd/. Our average contract deployment cost was $1.50 (a one-time cost for each policy type; multiple consumers can use the same smart contract policy but have individual whitelists). Figure 7-14 shows our cost to maintain a whitelist for one consumer. Each vertical bar represents a single transaction. A single transaction from the consumer is required to add one cloud or delete one cloud. Note that the first cloud added to an empty list consumes an extra 15,000 gas.
7.5.5 Smart Contract Security and Best Practices

We adhere to the "best practices" of smart contract coding. These focus on security and gas savings [27]. Listed below are common practices and warnings:
1. Re-entrancy: If contract A calls contract B, control is handed over to B along with any transfer of Ether. This makes it possible for B to call back into A before the interaction is completed. We do not use contract-to-contract calls.

4 Per [121], it costs 20,000 gas to change a zero value to any non-zero value, but only 5,000 gas on further modifications (i.e., to change it to another non-zero value).

Figure 7-14. Cost to add and delete cloud providers from a whitelist (i.e., add Amazon, then Google, etc.). [Bar chart of gas used (0-80,000) per add/delete transaction for AWS, IBM, Google, Oracle, CISCO, Alibaba, Microsoft, Salesforce, VMware Cloud, and HP Enterprise]

2. Avoid creating too many new contracts. At this time, we use one contract per policy. Consumers are free to create their own policy.
3. Avoid string type (in favor of bytes32) due to gas usage. We do use string type instead of bytes32. String type is slightly more expensive in terms of gas, but not prohibitively so. There is a savings in using bytes32, but that savings does not take into account the additional cost and complexity of converting bytes32 to a human-readable type.
4. Gas limit and loops. Ethereum has a block gas limit. If the block gas limit is exceeded, the transaction is denied and the money used for the transaction is lost. Most of our loop requirements involve searching for cloud names. In our DApp prototype, we illustrated how we could build loops into the DApp Javascript side: the Javascript performs the loop and comparison, while the call to the smart contract retrieves the stored value. We did this for the whitelist contract, but not the whitelisttag contract.
5. Callstack depth. External function calls can fail at any time because they exceed the maximum call stack of 1024. In such situations, Solidity throws an exception. Malicious actors might be able to force the call stack to a high value before they interact with your contract. We do not use inter-contract calls and are not aware of any depth issues at this time.
6. Keep it small, modular, and easily understandable. We follow this principle to the extent that usability is not impacted.
7. Use the checks-effects-interactions pattern. Prior to commercial deployment we will install the 'require' statement to ensure the sender has authorization to modify data that belong to the sender's Ethereum address.
8. Include a fail-safe mode. It is a good idea to set up a manager in the smart contract. This is done through a constructor function, which is executed only once, when the contract is deployed. Using this, it is possible to assign a manager or a more complex management design pattern. This is appropriate prior to commercial deployment.
9. Ask for peer review. The more people examine a piece of code, the more issues are found. I have put my DApp on a public server, https://www.stephenkirkman.org, free for anyone to interact with. The smart contracts can be hosted either on a local blockchain or on the Rinkeby test network.
7.6 Discussion

7.6.1 What Are CYCLOPS’ Weaknesses?

The two main weaknesses of using any blockchain are 1) inaccurate data being stored on the blockchain, and 2) lack of privacy of data on a public blockchain.
7.6.1.1 Inaccurate data

What prevents a cloud provider from inserting inaccurate data in the blockchain for a consumer? In short, nothing would prevent this if the cloud (or a malicious insider) really wanted to do it. If it became known that unusual data were showing up in the consumer's attestations, this would be a red flag for the consumer that the cloud has malicious insiders or is performing unusual activity.

There are features that discourage this: 1) smart contract functions can be designed so that only authorized Ethereum addresses can get results from specific functions in the smart contract (i.e., functions can 'require' that the source of the transaction matches the data owner), 2) it costs money to send a transaction and insert inaccurate data, so an attacker has to pay to be malicious, and 3) there are negative reputation effects if a cloud or its representatives insert data that do not reflect reality.
7.6.1.2 Privacy

We use Ethereum addresses as the primary identifier of the consumer. This adds a level of pseudonymity with respect to other users of a public blockchain. The cloud, on the other hand, needs to be able to match a consumer with an Ethereum address. The name/address bindings (and their protection) are the responsibility of the cloud. As always, the consumer must keep their Ethereum password safe.
7.6.2 Why Not Store the Real Cloud Data Off-Chain with Hashes on Chain?

Some projects (e.g., Medicalchain, Datawallet, Kochava) [56] are building their own private blockchains. Our belief is that private blockchains remove the strongest feature of the blockchain: high integrity. While it is true that storing data on a public blockchain costs more per byte than typical cloud storage solutions, cloud storage is not our goal. Our goal is high-integrity policies that are automatically executed and easy to access. The amount of data we propose to store depends strictly on the consumer's policy. Instead of storing a small amount of data in one contract per consumer (which does not scale), we store consumer data within a single contract per policy and distribute the data across the policy smart contracts. Lastly, since we are storing meta-data only, these data will not be changed every time a consumer accesses the cloud.
7.6.3 Ethereum Is Known to Support Only 4-8 Transactions Per Second with a Hard Cap of 15 TPS; How Will That Scale or Be Realistic?

Transaction scalability is currently a weakness of all blockchains. Bitcoin currently supports only 3-4 tps; despite this, Bitcoin's acceptance is growing. Ethereum supports a higher transactions-per-second (tps) rate, but traditional credit card processing still handles 1,000 tps and higher.

There are currently a number of Ethereum upgrades in various stages of development to increase tps significantly. Proof of stake (PoS) via the Casper upgrade (currently in test) promises higher security and energy savings, and it is a stepping stone to the use of sharding. Sharding has a claimed transaction rate target of nearly one million tps. The Casper upgrade is still in testing in 2019. Recall that the sending of transactions is dependent on a gasPrice (or bid). This gives the consumer some added flexibility. If a consumer desires a quicker response (still below the hard cap), the consumer may pay for it by offering a higher gasPrice. Transactions with a higher gasPrice are processed quicker because the miners make more money. Conversely, the consumer might choose to save money and offer a lower gasPrice; if the offer is too low, however, the miners might reject the transaction. That being said, the actions covered by our suggested policies are actions the cloud might take without advance permission of the consumer (e.g., the incidents with both Equifax and Verizon). In addition, policy changes should be expected to have a low tps. The rate of policy checks would depend on how often the cloud is moving or manipulating consumer data. This has driven our research toward policies that cover current cloud concerns; consumer-driven requests (self-migration, virtual machine creation) would not be covered in these policies. Actions that would be covered include load balancing, migration to other facilities, and co-mingling of data.
7.6.4 What Good Is the System if it Cannot Tell Me Which Cloud is Trustworthy?

This system provides a key element of trust management that we see as missing from the current cloud environment. Our goal is not to tell consumers whom to trust; everyone has different opinions and experiences. We are providing a model and mechanism for the consumer to tell the cloud whom they trust and don't trust. The model allows the cloud to query the policy without knowing the detailed policy data. The mechanism assists the consumer in determining who is trustworthy or not, in part by 1) a cloud's willingness to use the policies, and 2) its using them correctly, as evidenced by the attestation data. All the cloud has to agree to do is to check the policy.

The cloud does not have to know the policy in advance, just that one exists, where it is, and what inputs are required. The cloud can get this information from the DApp GUI text entry boxes we have provided, or potentially even from the data tag itself (currently under consideration for future work). As in all systems, it is at the cloud's and consumer's discretion to use the system.
7.6.5 How Will We Know if the Cloud Does Not Follow Our Policies?

There is no way to force a cloud to use this system or to respect authorization denials. If, for example, the system were used and an authorization request were denied, that denial would be stored in the blockchain logs. When the consumer requests their attestation, all requests can be presented. In effect, the cloud will be caught if real life does not match the security requests. If there are no data, the cloud might not be using the system. This makes a good case that only trustworthy clouds would use the system in the first place. This process, as we have discussed, is called "auditing". Indeed, nobody is forced to use new technologies like Bitcoin or the iPhone. Our system is designed to be an application that assists the consumer. Our application could easily be located within the cloud simply by integrating an Ethereum client. It is foreseeable that the system might be logically integrated with the cloud or with DNS servers that make routing decisions. Ours is a first step in using smart contracts to manage data using policies.
7.6.6 Why Would the Consumer Pay for Transactions?

The consumer should not have to pay much. Our survey results do show a willingness to pay for cloud policies if the cloud followed them. We have performed several cost analyses for the code in terms of gas and shown in initial tests that per-transaction costs are minimal. More testing is required. The system has high integrity and provides a means to express trust in a cloud, and high integrity has often been equated with trust. The more complex the policies, the more 'code' in the smart contract, and the more expensive it will be.
7.6.7 How Do You Quantify Your Gains in Trust?

SLAs currently do not address consumer desires as much as they used to. From the survey and policy cost evidence, we believe consumers would use this system if it allowed them to express the policies they want.

Our survey indicated that SLAs are not a good indicator of cloud trust, since few read them. Trust in the cloud is becoming interdisciplinary. We attempt to quantify trust via our cloud model, which has mathematical underpinnings that are also supported, in part, by our consumer survey.
7.6.8 How Do the Cloud/Consumers Know What Smart Contract Policies are Available?

There will be a primary public URL for the DApp and a main front-door contract with an Ethereum address. These will serve as an information dissemination point for all who wish to utilize the system. We envision an entity generally regarded as unbiased (e.g., the Cloud Security Alliance or a similar consortium) hosting this DApp as a tool to assist consumers in gaining trust in the cloud.
7.6.9 How Can We Be Sure the Contracts Are Without Bugs?

A governing body would manage this "system of contracts". New smart contracts can be sent to the blockchain via transactions. Contracts are accessed by their public Ethereum address, and the bytecode is publicly visible; any blockchain browser can view the available functions. Old addresses of contracts found to have bugs could be self-destructed to prevent usage. We plan to keep the contracts as small as possible. Our contracts have no need to store currency, so there is little monetary gain in hacking them.
7.6.10 How Can a Consumer Trust the Cloud To Evaluate Her External Policies?

How can you ever trust the cloud to do the right thing? That is what motivates our research. If clouds are willing to use these policies, that will inspire trust. Attestations of those policies can also be stored in the blockchain. Attestations (see Section 7.3.1) will provide proof of action, which in turn will increase confidence in the cloud and overall trust.
7.6.11 Can a Smart Contract Be Updated? What if it Needs to be Improved?

The solution for updating a consumer policy smart contract is to create new contracts: deploy the updated smart contract, log the new deployment address in the DApp, and finally disable the outdated contracts by one of several strategies; these contracts can be addressed directly, since every smart contract has an address too. Ethereum implements a self-destruct opcode that removes the contract bytecode, disables the address, and refunds any associated digital currency (i.e., currency stored within the smart contract). The digital currency is refunded to a specified address, but we see no need at this time to store digital currency in our smart contract system. The policy data in the old smart contract must be retrieved prior to deletion. Retrieving the data could be automated via a function, but it would increase smart contract complexity. At present, we do not implement this feature, but we do foresee the possibility.
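A minimal sketch of retiring an outdated policy contract, combining the manager-via-constructor fail-safe from the best-practices list with the self-destruct opcode; the contract and function names are illustrative assumptions, not the deployed CYCLOPS code:

pragma solidity ^0.4.22;

contract RetirablePolicySketch {
    address public manager;

    constructor() public {
        manager = msg.sender;   // whoever deploys the contract becomes its manager
    }

    // Removes the contract bytecode, disables the address, and refunds any
    // ether the contract holds (none, in our system) to the manager.
    function retire() public {
        require(msg.sender == manager, "Only the manager may retire this contract");
        selfdestruct(manager);
    }
}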

7.6.12 If the Ethereum Blockchain Is Immutable, How Can You Delete from the Whitelist?
Blockchain state can be manipulated; any data within smart contract storage can be changed. However, the contents of transactions are immutable [24].
7.6.13 Ether Is Required; How Will an Average Consumer Obtain it?

Digital currency is becoming increasingly popular. It is easiest to exchange USD for ETH at coin exchanges; a consumer may find them by Googling "buy ether" or similar terms. Mining is another option; a single person can mine ether on their own, but a high-power GPU or a mining pool is recommended. Part of our future work includes adding easy access to ether conversion to our DApp, either through website links or some other mechanism.
7.7 Summary

We have presented CYCLOPS, a smart contract based policy DApp system. A DApp like this is appropriate for both clouds and consumers to access. We have shown that the real world costs are low. The motivation for this research is improving consumer trust in the cloud by providing consumer based data movement policies in a high integrity manner. CYCLOPS is a significant step towards making this practical.

CHAPTER 8
CONCLUSION AND FUTURE WORK
8.1 Conclusion

In the context of a single system, security has been defined by that system. In the cloud context, this is no longer the case. This is especially true when viewed from the perspective of Amazon's updated security model that we presented at the start of this dissertation; much of this responsibility is shifting back to the consumer. We will provide the consumer an opportunity and a mechanism to have policy enforcement outside the context of a single cloud. The cloud does not have to interpret the policy; the cloud just needs to consult the smart contract system to get an answer. Our data movement paradigm will help people trust the cloud if they can have more control and expression over where their data go and which third parties have access to their data. Our research adds another ring of defense-in-depth to the cloud. There are trust models that represent social trust, and some have been applied to cloud trust. How accurate are these? None that we found capture the nuances of the cloud to the extent of how the value of our data really impacts our actions and trust. We developed a data movement policy model to reflect how policies will behave in the cloud and formulated a cloud trust model. It has been validated and will be refined based on survey results, for use as a guidepost to identify areas that need more focus. Blockchains and smart contracts are having an impact in the decentralization space. Trustworthiness is inherently built into decentralization due to the lack of centralized control; in decentralized peer-to-peer networks, all participants follow the same protocol but otherwise act independently. We provide, and continue to refine, our policy model (to be able to analyze our policies) and a cloud trust model (focusing on core CIA tenets). Ethereum, the most popular second-generation blockchain, is garnering attention as it plans to switch from proof-of-work to proof-of-stake in late 2019 (as of the final edit of this dissertation). Ethereum.org has an updated roadmap that illustrates the planned move to PoS.

This roadmap did not exist at the beginning of this research. PoS will consume much less energy for miners and will make the typical 51% blockchain attack harder to perpetrate if the claimed 2/3 vote required to add a block is realized [39]. New smart contract based systems are competing to improve upon the deficiencies of their predecessors. The interest in this space is exploding, and there are new ways to leverage these technologies every day.
8.2 Future Work

Ultimate success would be a fully functional DApp backed by a smart contract system running on the Ethereum main-net blockchain. The DApp would provide access for both clouds and consumers alike. Ideally, consumers should also be able to use a mobile DApp for access on the go. Consumers' policies could be confidential. Much of this is out of scope for the present dissertation topic.
8.3 Features Out of Scope for Dissertation

8.3.1 Oracles

We originally envisioned the use of enumerated types to define our own classes of clouds (e.g., AAA, AA, B). After further investigation, it was determined that enumerated types are best suited for smart contracts that make use of states. Enumerated types can be defined to represent certain types (e.g., red, blue, green) or states (e.g., moving, stopped). However, they are converted to integers when the contract is compiled, and therefore they provide no added functionality. Instead, we would like to investigate an oracle for a subsequent policy to achieve the same goal regarding classes of clouds. Oracles are smart contracts that provide a trusted link to the outside world. Smart contracts are inherently designed to activate only when they receive a transaction from the outside world. This presents a challenge in certain use cases that require data updates from the 'real world'. For example, how does one ensure that data from the real world are coming from a trusted source? One type of policy we looked at uses Cloud Security Alliance (CSA) ratings. Though not specifically within scope for this dissertation, we would like to implement this policy later and have received permission from CSA to do a proof-of-concept of a policy that uses cloud ratings.

8.3.2 Confidentiality

By definition the blockchain is a public ledger. Pre-compiled smart contracts were added to Ethereum late last year; they are intended to assist with cryptographic functions. I am not sure whether the new functionality will help add as much confidentiality as I want to provide: full or partial policy confidentiality.

This part of our research deserves more exploration: how do consumers' recommendations get consolidated into a final recommendation?

ρ = F((σ ∗ NUMSOURCES), (ϵ ∗ EXPERTISE), (η ∗ FAMILIARITY)) ∈ [0.0..1.0]   (8-1)
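Purely as an illustration of the kind of instantiation we have in mind (this weighting scheme is an assumption, not a result of this dissertation), F could be a normalized weighted combination of the three factors, each scaled to [0.0, 1.0]:

ρ = (σ·s + ϵ·e + η·f) / (σ + ϵ + η)

where s, e, and f denote the normalized number-of-sources, expertise, and familiarity values. Determining which combination function best matches observed consumer behavior remains future work.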

8.3.4 Data Tagging Other than a Virtual Machine
We have used the OVF virtual machine format to insert labels (i.e., strings of characters) into the annotation field. These can be used as virtual machine tags. Data tagging at a finer granularity is presently out of scope, though desired.
8.3.5 Prevent Fake News Insertion into Blockchain

The smart contract might be written so that it allows only authorized Ethereum addresses to insert data into the blockchain. Fake news can be minimized, but not eliminated.
8.3.6 Ethereum Main-Net Testing and Deployment

The system needs wider testing than one developer can provide. It is currently available at www.stephenkirkman.org for any testing. The dissertation does not depend on such testing, but making the system available for wider testing prior to live deployment would be vital for it to be useful in a commercial sense; since smart contracts are immutable, they cannot be changed once deployed. Main-net testing is not achievable at present due to the costs and the testing requirement. The goal is to inspire "real world" cloud provider usage.
8.3.7 Negative Verifications

We are exploring the potential for negative verification in the cloud. Negative verification is the process of assuring that an entity (in our case the cloud) is not doing what it is not supposed to be doing. We are searching for alternatives not limited to classical auditing. One option is to sample data points both internal and external to the cloud; such sampling could provide clues to how often unexpected events, such as unauthorized data movement between clouds, occur.
8.3.8 DApp Mechanism that Allows Obtaining Digital Currency

We would include a mechanism or access-link portal within the DApp to obtain digital currency.
8.3.9 Use of Swarm Ethereum Storage

Storage is always a concern on a blockchain. There are other systems in development that hope to enable distributed storage without taking up blockchain space. Swarm is a distributed storage platform; it is a layer in the web3 stack that provides a redundant store for Ethereum. Swarm was in alpha and under active development at the start of this dissertation [16]. Swarm has since advanced, and we are watching this platform in the event that blockchain storage costs become too high. It might be an alternative mechanism for storing policies off chain.
8.4 Final Remarks

Our goal is not to tell consumers whom to trust; everyone has different opinions and experiences. There are few systems, if any, that allow the consumer to state their own policy and then have multiple clouds consult that one policy. We provide a model and mechanism for the consumer to tell the cloud whom they trust and do not trust. The mechanism allows the cloud to query the policy without knowing what is in the policy. The mechanism assists the consumer in determining who is trustworthy or not, in part by 1) the cloud's willingness to use the policies, and 2) the cloud's using them correctly, as evidenced by the attestation data, which the consumer can audit. Our solution adds a ring of defense-in-depth to the cloud. This system forms the basis of our research and a starting point for further exploration.

APPENDIX A
CLOUD TRUST QUESTIONNAIRE

For the purpose of this survey, the cloud includes companies that store customer data in one or more data centers or lease servers to other companies. We are not including general Internet and email. Even though typical stores, for example Kmart or Walmart, might have a web store online and save your credit card information, email, and address for marketing, they are not considered the cloud. They are merely another consumer of cloud services. Examples of cloud providers include Amazon, Microsoft, Google, Apple.

Q1 I trust the cloud: (choose one)
   Strongly agree / Somewhat agree / Neither agree nor disagree / Somewhat disagree / Strongly disagree

Q2 I trust one cloud company more than others. (choose one)
   Strongly agree / Somewhat agree / Neither agree nor disagree / Somewhat disagree / Strongly disagree

Q3 Rank these cloud concerns in order of importance to you. (1 being most important)
   Location of my data
   Privacy/Confidentiality of my data
   My data availability
   Third party access (cloud partners) to my data
   Ease of use of cloud

Q4 Do you order from Amazon?
   Yes / No

Q5 If you do not order from Amazon, please provide a couple reasons below:

Q6 Do you use Facebook?
   Yes / No

Q7 If you use Facebook, how often do you use it?
   Daily / Weekly / Monthly / Rarely / Never

Q8 I care if my cloud data are accessed by a third party without my knowledge
   Strongly agree / Somewhat agree / Neither agree nor disagree / Somewhat disagree / Strongly disagree

Q9 Please rank the types of data below from most sensitive to least sensitive. How likely you are to store the data in the cloud. (For example, 1 - I am very likely to store this in the cloud, 6 - I am least likely)
   Photos / Videos
   Social Comments
   Business, Professional
   Personal Information
   Coursework / School
   Other Kind of Data

Q10 I use virtual machines?
   No / Occasionally / Frequently / What is a virtual machine

Q11 At what detail would you like to know where your data resides? (choose one)
   I do not care / In my country / In my state / In my county / In my city / A specific data center / On a specific server in a data center

Q12 I care if my cloud data is moved without my knowledge (choose one)
   I do not care / out of country / out of state / out of county / out of city / out of data center / out of a server in a data center

Q13 Rank these potential sources of knowledge (for a recommendation) in the order that you would use it. (For example, when choosing a cloud or other service, to whom do you listen to first, second, etc ...)
   Personal Experience
   Unbiased Organization
   Family Member Recommendation
   Large Group Rating
   Friend's Recommendation

Q14 This question concerns referred trust. When choosing a cloud, I would use a recommendation from ____. (choose one). For example the first one is the least trusting, the last is the most trusting.
   A friend
   A friend's friend (here and below you do not know the individual)
   A friend's friend's friend
   A friend's friend's friend's friend
   A friend's friend's friend's friend's friend

Q15 I trust more sensitive data to some clouds, but not others.
   Strongly agree / Somewhat agree / Neither agree nor disagree / Somewhat disagree / Strongly disagree

Q16 I have read Service Level Agreement(s) (choose one response)
   0 / 1 / 2 / More than 2 / What's a Service Level Agreement

Q17 I would pay as a one-time charge to use a personalized cloud policy if all clouds followed it. (choose one)
   nothing / 10 cents / 25 cents / 50 cents / 1 dollar / greater than 1 dollar

Q18 Assuming the cloud followed your policies, what kind of cloud data policies would you like to see available for use?

Q19 Your age range (no subject under 18 should complete this survey. Thank you!)
   18-24 / 25-34 / 35-44 / 45-54 / 55-64 / 65-74 / 75-84 / 85 or older

Q20 Technical competence
   Pro / Savvy / Average / Newbie / Not

APPENDIX B
RAW SURVEY RESULTS - 35 TOTAL RESPONSES

Percentages shown on the pie charts were rounded down or up to the nearest full percentage for pie chart readability. In some cases, this resulted in totals not equal to 100% and caused an empty slice. This was done to make the pie charts more readable and to avoid misrepresenting the data. Actual data is in the appendices.

Table B-1. Q1. I trust the cloud
Opinion                       Average %
Strongly Agree                17.14%
Somewhat Agree                51.43%
Neither Agree nor Disagree    14.29%
Somewhat Disagree             11.43%
Strongly Disagree             5.71%

Table B-2. Q2. I trust one cloud company more than others
Opinion                       Average %
Strongly Agree                14.29%
Somewhat Agree                51.43%
Neither Agree nor Disagree    22.86%
Somewhat Disagree             2.86%
Strongly Disagree             8.57%

Figure B-1. Q3. Rank these cloud concerns (1 being most important). [Bar chart of votes per rank (1-5) for Location, Privacy, Availability, Third-Party, and Ease-of-Use]

Table B-3. Q3. Rank these cloud concerns (1 being most important)
Concerns                                          Average Rank
Location of my data                               3.82
Privacy and Confidentiality of my Data            1.65
My data availability                              2.88
Third party access to my data (cloud partners)    2.79
Ease of use of the cloud                          3.85

Table B-4. Q4. Do you order from Amazon?

Y/N    Average %
Y      100%
N      0%

Table B-5. Q5. If you don't order from Amazon, why not?
Response       Reasons
No response    100%

Table B-6. Q6. Do you use Facebook?
Y/N    %
Y      88.57%
N      11.43%

Table B-7. Q7. If you use Facebook, how often do you use it?
Frequency    Average percentage
Daily        50.00%
Weekly       34.38%
Monthly      3.13%
Rarely       9.38%
Never        3.13%

Table B-8. Q8. I care if my cloud data are accessed by a third-party without my knowledge
Opinion                       Average %
Strongly Agree                74.29%
Somewhat Agree                14.29%
Neither Agree nor Disagree    0.0%
Somewhat Disagree             5.71%
Strongly Disagree             5.71%

Figure B-2. Q9. Please rank the types of data below from most sensitive to least sensitive. How likely are you to store the data in the cloud? [Bar chart of votes per rank (1-6) for Photos/Video, Social, Business/Prof, Personal, School, and Other]

Table B-9. Q9. Please rank the types of data below from most sensitive to least sensitive. How likely are you to store the data in the cloud?
Data Types                   Average Rank
Photos and Video             2.50
Social Comments              3.41
Business and Professional    3.29
Personal Information         3.24
Coursework or School         2.74
Other kinds of data          5.82

Table B-10. Q10. I use virtual machines?
Frequency                     Average %
Occasionally                  62.96%
Frequently                    22.22%
What is a virtual machine?    14.81%

Table B-11. Q11. At what detail would you like to know where your data resides?
Locale                                  Average %
I do not care                           18.18%
In my country                           30.30%
In my state                             12.12%
In my city                              6.06%
At a specific data center               24.24%
On a specific server in a data center   9.09%

Table B-12. Q12. I care if my cloud data is moved without my knowledge (choose one)
Locale                                  Average %
I do not care                           17.65%
Out of country                          44.12%
Out of state                            8.82%
Out of county                           0.0%
Out of city                             2.94%
Out of a data center                    17.65%
Out of a server in a data center        8.82%

Figure B-3. Q13. Rank these potential sources of knowledge (for a recommendation) in the order that you would use it. [Bar chart of votes per rank (1-5) for Personal Experience, Unbiased Organization, Family, Group Rating, and Friend]

Table B-13. Q13. Rank these potential sources of knowledge (for a recommendation) in the order that you would use it
Recommendation Source            Average Ranking of Source
Personal experience              1.60
Unbiased organization            2.40
Family member recommendation     3.86
Large group rating               3.37
Friend's recommendation          3.77

Table B-14. Q14. This question concerns referred trust. When choosing a cloud, I would use a recommendation from
Recommendation Source                                                  Average %
A friend                                                               68.57%
A friend’s friend (here and below, you do not know the individual)    25.71%
A friend’s friend’s friend                                             5.71%
A friend’s friend’s friend’s friend                                    0.0%
A friend’s friend’s friend’s friend’s friend                           0.0%

Table B-15. Q15. I trust more sensitive data to some clouds, but not others
Opinion                      Average %
Strongly Agree               20.00%
Somewhat Agree               40.00%
Neither Agree nor Disagree   22.86%
Somewhat Disagree            11.43%
Strongly Disagree            5.71%

Table B-16. Q16. I have read service level agreements
SLAs I have read                    Average %
0                                   48.57%
1                                   11.43%
2                                   0.0%
More than 2                         17.14%
What’s a service level agreement?   22.86%

Table B-17. Q17. I would pay as a one-time charge to use a personalized cloud policy, if all clouds followed it
I would pay             Average %
nothing                 31.43%
10 cents                0.0%
25 cents                2.86%
50 cents                0.0%
1 dollar                34.29%
greater than 1 dollar   31.43%

Q18. Assuming the cloud followed your policies, what kind of cloud policies would you like to see available for use? - See Appendix C for free form responses.

Table B-18. Q19. Your age range (no subject under 18 should complete this survey, thank you)
Age range     Average %
18-24         54.29%
25-34         40.00%
35-44         0.0%
45-54         5.71%
55-64         0.0%
65-74         0.0%
75-84         0.0%
85 or older   0.0%

Table B-19. Q20. Technical competence
Technical competence   Average %
Pro                    20.00%
Savvy                  48.57%
Average                28.57%
Newbie                 2.86%
Not                    0.0%

APPENDIX C
FREE FORM SURVEY RESULTS

Table C-1. Q18. Assuming the cloud followed your policies, what kind of cloud policies would you like to see available for use?
Policy ideas
Permission for third parties to access data; data location; data retention period
Availability and Security
Explicit description of how my data is being used by anyone besides myself; honest about third-party use; prompt notifications in the event of security compromise
Privacy
3rd part consent to access, notice of data movement, yearly or 1 time fee that does not increase, clear data breach plan, 2fa enabled, knowledge of data location and possible info related to who else is sharing the ”cloud” my data is on (especially if it is sensitive)
The largest is mostly security and it should be the same level
No giving out data/information to 3rd parties. My data be stored in multiple servers in case one stops working.
Guarantees of data privacy and security
Say who has access to my data
If my data is moved out of my city.
Restricting sharing of my data
Security

APPENDIX D
DAPP CONFIGURATION AND STARTUP INSTRUCTIONS
D.1 Installing the Software

1. Start with a standard Linux installation.
2. Create a new project directory.
3. sudo apt-get install nodejs
4. sudo apt-get install npm (npm is the node package manager)
5. npm init, then press enter several times
6. npm install --save ganache-cli mocha [email protected] fs-extra [email protected]
7. npm install --save truffle-hdwallet-provider
8. npm install --save [email protected] react react-dom
9. npm install --save react-dnd
10. npm install --save [email protected] (upgrade to fix at 16.2)
D.2 Starting up DApp

1. Run ganache-cli: root@s50-63-161-252:# /home/kirkman/intercloud/node_modules/.bin/ganache-cli -h stephenkirkman.org & (starts up the blockchain)
2. Script: /home/kirkman/intercloud2/node_modules/.bin/ganache-cli -e 100 -h stephenkirkman.org -d --accounts 20 --deterministic --mnemonic="glimpse purpose combine govern wear viable hybrid credit organ volume cable energy" > ganache.log &
3. Or use the script ./start_ganache.sh (created to ease startup).
4. Set up Metamask using custom RPC and http://www.stephenkirkman.org
5. Save the mnemonics to the clipboard.
6. Copy the mnemonics to the deploy.js file.
7. cd Intercloud2/ethereum
8. node compile.js (if changes were made)
9. node deploy.js (a sketch of deploy.js appears after this list)

10. Copy the deploy addresses to main.js, whitelist.js, and whitelisttag.js (resave).
11. Development: edit server.js to show !== production; npm run dev & (starts up the DApp and runs the node.js environment; dev is a short alias for "node server.js")
12. Production: edit server.js to show == production; npm run build; npm run dev &
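For reference, the following is a minimal sketch of what a deploy.js along these lines might look like, assuming a web3 1.x-style API, a compiled artifact at ./build/InterCloud.json produced by compile.js, and the research-chain URL and mnemonic given above; the file names and gas value are illustrative assumptions, not the exact script used in this work.

// deploy.js - minimal sketch (hypothetical paths; assumes a web3 1.x-style API)
const HDWalletProvider = require('truffle-hdwallet-provider');
const Web3 = require('web3');

// Compiled contract output produced by compile.js (assumed artifact structure)
const compiledContract = require('./build/InterCloud.json');

// Wallet provider: unlocks accounts derived from the mnemonic against the test chain
const provider = new HDWalletProvider(
  'glimpse purpose combine govern wear viable hybrid credit organ volume cable energy',
  'http://www.stephenkirkman.org:8545'
);
const web3 = new Web3(provider);

const deploy = async () => {
  const accounts = await web3.eth.getAccounts();
  console.log('Deploying from account', accounts[0]);

  // Deploy the contract bytecode and wait for the transaction to be mined
  const result = await new web3.eth.Contract(JSON.parse(compiledContract.interface))
    .deploy({ data: compiledContract.bytecode })
    .send({ from: accounts[0], gas: '3000000' });

  // This is the address that step 10 copies into main.js, whitelist.js, and whitelisttag.js
  console.log('Contract deployed to', result.options.address);
};

deploy();

Running node deploy.js with a script of this shape prints the contract address needed in step 10.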

D.3 Key DApp Processes

kirkman 28348 28333  5 13:28 pts/1 00:00:00 npm
kirkman 28364 28348  0 13:28 pts/1 00:00:00 sh -c node server.js
kirkman 28365 28364 71 13:28 pts/1 00:00:06 node server.js
root    21242     1  3 Jan31 ?     12:38:18 node /home/kirkman/intercloud/node_modules/.bin/ganache

D.4 Adding a Contract to the Local Test Environment

1. Test the contract using the Ethereum Remix online tool.
2. Place the new contract under the contracts folder.
3. Modify the compile.js script with the contract path. Compile.js compiles all smart contracts and creates the contract API (a sketch of compile.js appears after this list).
4. Modify the deploy.js script with the contract info.
5. Create new React pages using the previous design pattern.
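Purely as an illustration of step 3, a compile.js in this style might look like the sketch below; it assumes a solc 0.4-style compile(source, 1) API and hypothetical paths (contracts/InterCloud.sol, a build/ output directory), and is not the exact script used here.

// compile.js - minimal sketch (hypothetical paths; assumes a solc 0.4-style API)
const path = require('path');
const fs = require('fs-extra');
const solc = require('solc');

const buildPath = path.resolve(__dirname, 'build');
fs.removeSync(buildPath);            // clear out any previous build artifacts

// Read the smart contract source (hypothetical file name)
const contractPath = path.resolve(__dirname, 'contracts', 'InterCloud.sol');
const source = fs.readFileSync(contractPath, 'utf8');

// Compile; output maps ':ContractName' to { interface, bytecode, ... }
const output = solc.compile(source, 1).contracts;

fs.ensureDirSync(buildPath);
for (let contract in output) {
  // Write one JSON artifact per contract, e.g. build/InterCloud.json,
  // which deploy.js can then require()
  fs.outputJsonSync(
    path.resolve(buildPath, contract.replace(':', '') + '.json'),
    output[contract]
  );
}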

D.5 Adding a Contract to the Ethereum Mainnet

1. Test the contract using the Ethereum Remix online tool.
2. Change the network within Remix.
3. Ensure Metamask is set up.
4. Deploy via Remix.
5. Make a note of the deploy address.
D.6 Metamask Usage Guide

Metamask is the browser interface that allows you to send transactions to the blockchain. In our case we use test accounts. (A sketch of how the DApp picks up the Metamask-injected web3 provider follows the instructions below.) Instructions:
1. Install the Metamask browser extension (click on the Metamask link).

2. Do not set the password yet; close all browser windows that Metamask opened. Click on the fox head icon at the upper right of the browser.
3. Click import seed phrase ... see the seed phrase below:
4. unique naive syrup neutral call ketchup prefer unlock alarm possible brick bag
5. Supply the seed phrase and supply a test password of your own choice. Click import. ***Note*** Copying and pasting might close your Metamask window and force you to start over.
6. Accept all terms of service, scroll to the end, Accept.
7. Select Custom RPC in the network dropdown.
8. In the field that says "New RPC URI", type: http://www.stephenkirkman.org:8545
9. Click Save to save the custom RPC.
10. Metamask will connect to our research blockchain and provide test digital currency (approx 100 ether per account).
11. Close the Custom RPC window using the "X" in the upper right.
12. Metamask will connect to our research blockchain and provide test accounts with digital currency (ether).
13. You can access up to 10 free accounts by changing to Account 1 - 10 and setting a password.
14. You are now ready to interact with the InterCloud DApp. A Metamask window will appear before each transaction for you to confirm.
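For context, a common way for a DApp front end to pick up the Metamask-injected provider (falling back to a direct connection to the research chain when Metamask is absent) is sketched below; the file name web3.js, the fallback URL handling, and the web3 1.x-style API are assumptions, and the actual InterCloud wiring may differ.

// web3.js - minimal sketch of provider selection (hypothetical file; assumes a web3 1.x-style API)
const Web3 = require('web3');

let web3;
if (typeof window !== 'undefined' && typeof window.web3 !== 'undefined') {
  // Running in the browser with Metamask installed: reuse its injected provider,
  // so Metamask pops up a confirmation window for every transaction
  web3 = new Web3(window.web3.currentProvider);
} else {
  // Running on the server (or no Metamask): talk to the research blockchain directly
  const provider = new Web3.providers.HttpProvider('http://www.stephenkirkman.org:8545');
  web3 = new Web3(provider);
}

module.exports = web3;

Reusing the injected provider is what causes the Metamask confirmation window in step 14 to appear before each transaction.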

APPENDIX E
VIRTUAL MACHINE TAGGING STEPS
E.1 Tagging a Virtual Machine

1. Shut down the virtual machine.
2. Select File → Export Appliance.
3. Double-click on the Description field.
4. Enter the tag (no spaces) “SteveData123”; this will be used later in our smart contract policy.
5. Check the box for writing a manifest file. This will include a file containing a hash of each OVF file.
6. Select OVF version 1.0.
7. Click Export (this will start a several-minute process to export to OVA format, in our case InterCloud2.ova). Figure E-1 shows the tag created in VirtualBox.

Figure E-1. Using Oracle VirtualBox to export a VM to OVA

E.2 Convert OVA to OVF for Package Signing

To convert an OVA into its constituent OVF files, use the VMware OVF Tool; in this case ovftool behaves like unzip. The command is: ovftool sourcename.ova targetname.ovf. The transcript below shows the conversion from OVA to OVF format.

C:\Users\Steve\Documents\OVF InterCloudTrust>"C:\Program Files\VMware\VMware OVF Tool\ovftool" InterCloudTrust.ova InterCloudTrust.ovf
Opening OVA source: InterCloudTrust.ova
Opening OVF target: InterCloudTrust.ovf
Writing OVF package: InterCloudTrust.ovf
Transfer Completed
The manifest validates
Warning:
 - No manifest entry found for: 'InterCloudTrust-disk001.vmdk'.
Completed successfully

C:\Users\Steve\Documents\OVF InterCloudTrust>

APPENDIX F
PUBLICATIONS SUPPORTING DISSERTATION
F.1 Publications Since Proposal

• Stephen Kirkman and Richard Newman, ”Control Your CLoud OPerationS (CYCLOPS): A Consumer Policy Cloud Trust DApp”, Submitted to IEEE Transactions on Services Computing: Special Issue on Blockchain-Based Services Computing, March 31, 2019

• Stephen Kirkman and Richard Newman, ”A Trust Model for Cloud: Results from a Survey”, Accepted FTC 2019 - Future Technologies Conference 2019, 14-15 November 2019, San Francisco, CA. Acceptance rate: 39%

• Stephen Kirkman and Richard Newman, ”InterCloud: A Data Movement Policy DApp for Managing Trust in the Cloud.” Presented at the 5th Annual Conf. on Computational Science and Computational Intelligence (CSCI’18), Dec 13-15, 2018, Las Vegas, NV. Acceptance rate: 19%

• Stephen Kirkman and Richard Newman. ”A Cloud Data Movement Policy Architecture Based on Smart Contracts and the Ethereum Blockchain.” First IEEE Workshop on Blockchain Technologies and Applications (BTA) 2018 [73].

• Stephen Kirkman. "A Data Movement Policy Framework for Improving Trust in the Cloud Using Smart Contracts and Blockchains." Doctoral Symposium at IC2E 2018, Orlando, FL [72].
F.2 Publications Before Proposal

• Stephen Kirkman and Richard Newman. ”Using Smart Contracts and Blockchains to Support Consumer Trust Across Distributed Clouds.” WorldComp, GCC’17 - The 13th Int’l Conf on Grid, Cloud, and Cluster Computing [76].

• Stephen Kirkman and Richard Newman. ”Bridging the Cloud Trust Gap: Using ORCON Policy to Manage Consumer Trust Between Different Clouds.” First IEEE International Conference on Edge Computing (EDGE 2017), June 25 - June 30, 2017 [75].

181 REFERENCES [1] “AWS Shared Reponsibility Model.” 2018. https://aws.amazon.com/compliance/shared- responsibility-model/. [2] “Arm TrustZone.” Accessed: 2017, [Online]. http://www.openvirtualization.org/open- source-arm-trustzone.html. [3] “Cloud Security Alliance CTP Data Model and API, rev. 2.13.” Accessed: 2017, [Online]. https://downloads.cloudsecurityalliance.org/assets/research/cloudtrust-protocol/CTP- Data-Model-And-API.. [4] “Cloud Security Alliance Star Registry.” Accessed: 2017, [Online]. https: //cloudsecurityalliance.org/star/. [5] “Cryptonator.” Accessed: 2017, [Online]. https://www.cryptonator.com/rates/ETH- USD. [6] “Ether Chart.” Accessed: 2017, [Online]. http://www.ethdocs.org/en/latest/ether.html. [7] “Ethereum.” Accessed: 2017, [Online]. https://www.ethereum.org/. [8] “Ethereum Block Architecture.” Accessed: 2017, [Online]. https://ethereum. stackexchange.com/questions/268/ethereum-block-architecture. [9] “Ethereum Homestead Documentation Release 0.1.” Accessed: 2017, [Online]. https: //media.readthedocs.org/pdf/ethereum-homestead/latest/ethereum-homestead.pdf. [10] “Intel Developer Zone.” Accessed: 2017, [Online]. https://software.intel.com/en-us/sgx. [11] “Intel TXT.” Accessed: 2017, [Online]. https://www.intel.com/content/www/us/en/ architecture-and-technology/trusted-execution-technology/malware-reduction-general- technology.html. [12] “Litecoin.” Accessed: 2017, [Online]. https://litecoin.org/. [13] “MicroFocus ArcSight.” Accessed: 2017, [Online]. https://www.microfocus.com/en- us/products/siem-security-information-event-management/overview. [14] “Open Virtualization Format Whitepaper.” Accessed: 2017, [Online]. http://www.dmtf. org/sites/default/files/standards/documents/DSP2017 2.0.0.pdf. [15] “Splunk.” Accessed: 2017, [Online]. https://www.splunk.com/. [16] “Swarm Documentation.” Accessed: 2017, [Online]. http://swarm-guide.readthedocs.io/ en/latest/introduction.html. [17] “Trusted Computing Group Trusted Platform Module.” Accessed: 2017, [Online]. http://www.trustedcomputinggroup.org/work-groups/trusted-platform-module/.

182 [18] “6. Power and Sample Size.” Accessed: 2018, [Online]. http://www.3rs-reduction.co.uk/ html/main menu.html. [19] “Assurance and Trust in the Cloud.” Accessed: 2018, [Online]. https://www.raconteur. net/technology/assurance-and-trust-in-the-cloud. [20] “Designing an Experiment, Power Analysis.” Accessed: 2018, [Online]. http://www. statsoft.com/Textbook/Power-Analysis. [21] “Security in the Cloud, Top Nine Issues in Building User Trust.” Accessed: 2018, [Online]. http://www.computerweekly.com/feature/Security-in-the-cloud-Top-nine- issues-in-building-users-trust/. [22] “Suspending Cambridge Analytica and SCL Group from Facebook.” Accessed: 2018, [Online]. https://newsroom.fb.com/news/2018/03/suspending-cambridge-analytica/. [23] “Proof of Work.” Accessed: 2019, [Online]. https://en.bitcoin.it/wiki/Proof of work. [24] “Stack Exchange.” Accessed: 2019, [Online]. https://ethereum.stackexchange.com/ questions/50587/how-do-we-achieve-immutability-within-a-contract-on-the-blockchain. [25] “VMWare Ovftool.” Accessed: 2019, [Online]. https://www.vmware.com/support/ developer/ovf/. [26] “Vsphere Tagging.” Accessed: 2019, [Online]. http://www.doublecloud.org/2011/06/ tagging-an-invisible-feature-in-vsphere. [27] “Busman, Jesse.” Oct 19, 2017. https://ethereum.stackexchange.com/questions/28813/ how-to-write-an-optimized-gas-cost-smart-contract. [28] Abbadi, Imad M and Alawneh, Muntaha. “A framework for establishing trust in the cloud.” Computers & Electrical Engineering 38 (2012).5: 1073–1087. [29] Ali, Muneeb, Shea, Ryan, Nelson, Jude, and Freedman, Michael J. “Blockstack: A New Decentralized Internet.” (2017). [30] Alsmadi, Duha and Prybutok, Victor. “Sharing and storage behavior via cloud comput- ing: Security and privacy in research and practice.” Computers in Human Behavior 85 (2018): 218–226. [31] Anderson, Eric and Li, Jun. “Cooperative policy control for peer-to-peer data distribu- tion.” Tech. rep., 2010. [32] Antonopoulos, Andreas M and Wood, Gavin. Mastering ethereum: building smart contracts and dapps. O’Reilly Media, 2018. [33] Arthur, Will and Challener, David. A Practical Guide to TPM 2.0: Using the Trusted Platform Module in the New Age of Security. Apress, 2015.

183 [34] Azaria, Asaph, Ekblaw, Ariel, Vieira, Thiago, and Lippman, Andrew. “MedRec: Using Blockchain for Medical Data Access and Permission Management.” Open and Big Data (OBD), International Conference on. IEEE, 2016, 25–30. https://www.pubpub.org/pub/ medrec. [35] Basak, Abhishek, Bhunia, Swarup, Tkacik, Thomas, and Ray, Sandip. “Security Assurance for System-on-Chip Designs With Untrusted IPs.” IEEE Transactions on Information Forensics and Security 12 (2017).7: 1515–1528. [36] Bates, Adam, Mood, Ben, Valafar, Masoud, and Butler, Kevin. “Towards secure provenance-based access control in cloud environments.” Proceedings of the third ACM conference on Data and application security and privacy. ACM, 2013, 277–284. [37] Bishop, M. “Computer Security: Art and Science.” 2003. [38] Burda, Daniel and Teuteberg, Frank. “The role of trust and risk perceptions in cloud archivingResults from an empirical study.” The Journal of High Technology Management Research 25 (2014).2: 172–187. [39] Buterin, Vitalik and Griffith, Virgil. “Casper the Friendly Finality Gadget.” arXiv preprint arXiv:1710.09437 (2017). [40] Casella, George and Berger, Roger L. Statistical inference, vol. 2. Duxbury Pacific Grove, CA, 2002. [41] Castelfranchi, Christiano and Falcone, Rino. Trust theory: A socio-cognitive and computational model, vol. 18. John Wiley & Sons, 2010. [42] Chow, Randy and Chow, Yuen-Chien. Distributed operating systems and algorithms. Addison-Wesley Longman Publishing Co., Inc., 1997. [43] Cohen, Jacob. “Statistical power analysis for the behavioral sciences 2nd edn.” 1988. [44] Corp, Intel. “Intel Software Guard Extensions (Intel SGX).” 2015. https://software.intel. com/sites/default/files/332680-002.pdf. [45] Costan, Victor and Devadas, Srinivas. “Intel SGX Explained.” IACR Cryptology ePrint Archive 2016 (2016): 86. [46] Costan, Victor, Lebedev, Ilia A, and Devadas, Srinivas. “Sanctum: Minimal Hardware Extensions for Strong Software Isolation.” USENIX Security Symposium. 2016, 857–874. [47] Cowcill, Lucy. “Business trust in data security in the cloud at an all-time low.” http: //www.globalservices.bt.com/uk/en/ news/business trust in data security in cloud at all time low (2014). [48] Davis, Jessica. “Salesforce Outage: Can Customers Trust The Cloud?” 2016. http: //www.informationweek.com/cloud/platform-as-a-service/salesforce-outage-can- customers-trust-the-cloud/d/d-id/1325499.

184 [49] Djackov, Maksim. “TPM Picture.” Accessed: 2018, [Online]. https://www.slideshare. net/MaksimDjackov/. [50] Dolan, Richard. “Security remains a major obstacle to cloud adoption, study finds.” 2015. https://www.datapipe.com/blog/2015/03/25/security-remains-a-major-obstacle- to-cloud-adoption-study-finds/. [51] Ellis, Paul D. The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press, 2010. [52] ElSalamouny, Ehab, Sassone, Vladimiro, and Nielsen, Mogens. “HMM-based trust model.” International Workshop on Formal Aspects in Security and Trust. Springer, 2009, 21–35. [53] Enderle, Rob. “Why you shouldn’t trust Cloud Service Providers.” 2016. //http: www.cio.com/article/2994313/cloud-security/why-you-shouldn-t-trust-cloud-service- providers.html. [54] Fatema, Kaniz, Healy, Philip D, Emeakaroha, Vincent C, Morrison, John P, and Lynn, Theo. “A User Data Location Control Model for Cloud Services.” CLOSER. 2014, 476–488. [55] Fischer, Michael J, Lynch, Nancy A, and Paterson, Michael S. “Impossibility of distributed consensus with one faulty process.” Tech. rep., MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE, 1982. [56] Floyd, David. “Blockchain Could Make You - Not Equifax - the Owner of Your Data.” 2018. https://www.investopedia.com/news/blockchain-could-make-you-owner-data- privacy-selling-purchase-history/. [57] Foran, Joseph. “Ten Questions to Ask when Storing Data in the Cloud.” 2016. http://searchcloudcomputing.techtarget.com/tip/Ten-questions-to-ask-when-storing- data-in-the-cloud. [58] Gentry, Craig. A fully homomorphic encryption scheme. Ph.D. thesis, Stanford University, 2009. [59] Ghahramani, Zoubin. “An introduction to hidden Markov models and Bayesian net- works.” International journal of pattern recognition and artificial intelligence 15 (2001).01: 9–42. [60] Gillett, Frank and Guidry, Triona. “Are Consumers Better Off Putting Everything in the Cloud?” 2014. https://www.wsj.com/articles/are-consumers-better-off-putting- everything-in-the-cloud-1399644099. [61] Greenspan, G. “MultiChain Private Blockchain.” White paper, July (2015).

185 [62] Hardy, Quentin. “Where Does Cloud Storage Really Reside? And Is It Secure?” 2017. https://www.nytimes.com/2017/01/23/insider/where-does-cloud-storage-really-reside- and-is-it-secure.html. [63] Harris, Shon. CISSP all-in-one exam guide. McGraw-Hill, Inc., 2013. [64] Harzog, Bernd. “Can we trust the public cloud vendors?” http://www.networkworld. com/article/3176906/cloud-computing/can-we-trust-the-public-cloud-vendors.html (2017). [65] Horvath, Albert S and Agrawal, Rajeev. “Trust in cloud computing.” SoutheastCon 2015. IEEE, 2015, 1–8. [66] Hsu, Pei-Fang, Ray, Soumya, and Li-Hsieh, Yu-Yu. “Examining cloud computing adoption intention, pricing mechanism, and deployment model.” International Journal of Information Management 34 (2014).4: 474–488. [67] Hunt, Ann OMDS. “A Researchers Guide to Power Analysis.” Accessed: 2018, [Online]. http://rgs.usu.edu/irb/wp-content/uploads/sites/12/2015/08/A Researchers Guide to Power Analysis USU.pdf. [68] Jaeger, Bernd, Kraft, Reiner, Luhn, Sebastian, Selzer, Annika, and Waldmann, Ulrich. “The Measurement of Data Locations in the Cloud.” Availability, Reliability and Security (ARES), 2015 10th International Conference on. IEEE, 2015, 670–675. [69] ———. “Access Control and Data Separation Metrics in Cloud Infrastructures.” Availability, Reliability and Security (ARES), 2016 11th International Conference on. IEEE, 2016, 205–210. [70] Josang, Audun and Ismail, Roslan. “The beta reputation system.” Proceedings of the 15th bled electronic commerce conference. vol. 5. 2002, 2502–2511. [71] Karwoski, John. “Introduction to Intel Software Guard Extensions.” https://software. intel.com/en-us/ videos/intel-software-guard-extensions-intel-sgx-webinar (2017). [72] Kirkman, Stephen. “A Data Movement Policy Framework for Improving Trust in the Cloud Using Smart Contracts and Blockchains.” Cloud Engineering (IC2E), 2018 IEEE International Conference on. IEEE, 2018, 270–273. [73] Kirkman, Stephen and Newman, Richard. “A Cloud Data Movement Policy Architecture Based on Smart Contracts and the Ethereum Blockchain.” Cloud Engineering (IC2E), 2018 IEEE International Conference on. IEEE, 2018, 371–377. [74] ———. “InterCloud: A Data Movement Policy DApp for Managing Trust in the Cloud.” 5th Annual Conf. on Computational Science and Computational Intelligence (CSCI’18). CSCI, 2018.

186 [75] Kirkman, Stephen S and Newman, Richard. “Bridging the cloud trust gap: Using orcon policy to manage consumer trust between different clouds.” Edge Computing (EDGE), 2017 IEEE International Conference on. IEEE, 2017, 82–89. [76] ———. “Using smart contracts and blockchains to support consumer trust across distributed clouds.” The 13th Intl Conf on Grid, Cloud, and Cluster Computing, WorldComp, GCC17, Las Vegas, NV. 2017. [77] Ko, Ryan KL, Jagadpramana, Peter, Mowbray, Miranda, Pearson, Siani, Kirchberg, Markus, Liang, Qianhui, and Lee, Bu Sung. “TrustCloud: A framework for accountability and trust in cloud computing.” Services (SERVICES), 2011 IEEE World Congress on. IEEE, 2011, 584–588. [78] Ko, Ryan KL, Kirchberg, Markus, and Lee, Bu Sung. “Special issue on trust and security in cloud computing.” Security and Communication Networks 7 (2014).11: 2183–2184. [79] Ko, Ryan KL, Russello, Giovanni, Nelson, Richard, Pang, Shaoning, Cheang, Aloysius, Dobbie, Gill, Sarrafzadeh, Abdolhossein, Chaisiri, Sivadon, Asghar, Muhammad Rizwan, and Holmes, Geoffrey. “Stratus: Towards returning data control to cloud users.” International Conference on Algorithms and Architectures for Parallel Processing. Springer, 2015, 57–70. https://stratus.org.nz/. [80] Larimer, Daniel, Kasper, Lance, and Schuh, Fabian. “BitShares 2.0: Financial Smart Contract Platform.” 2015. [81] Maas, Martin, Love, Eric, Stefanov, Emil, Tiwari, Mohit, Shi, Elaine, Asanovic, Krste, Kubiatowicz, John, and Song, Dawn. “Phantom: Practical oblivious computation in a secure processor.” Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 2013, 311–324. [82] Marsh, Stephen Paul. “Formalising trust as a computational concept.” (1994). [83] Melaye, Dimitri and Demazeau, Yves. “Bayesian dynamic trust model.” International Central and Eastern European Conference on Multi-Agent Systems. Springer, 2005, 480–489. [84] Nakamoto, Satoshi. “Bitcoin: A peer-to-peer electronic cash system.” 2008. [85] News, Business Cloud. “Only 13% Trust Public Cloud with Sensitive Data Intel survey.” 2016. http://www.businesscloudnews.com/2016/04/14/only-13-trust-public-cloud-with- sensitive-data-intel-survey/. [86] Nielsen, Mogens, Krukow, Karl, and Sassone, Vladimiro. “A bayesian model for event- based trust.” Electronic Notes in Theoretical Computer Science 172 (2007): 499–521. [87] Noor, Talal H, Sheng, Quan Z, Yao, Lina, Dustdar, Schahram, and Ngu, Anne HH. “CloudArmor: Supporting reputation-based trust management for cloud services.” IEEE transactions on parallel and distributed systems 27 (2016).2: 367–380.

187 [88] Oliveira, Daniela, Murthy, Dhiraj, Johnson, Henric, Wu, S Felix, Nia, Roozbeh, and Rowe, Jeff. “A socially-aware operating system for trustworthy computing.” Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on. IEEE, 2011, 380–386. [89] Pappas, Vasilis, Kemerlis, Vasileios P, Zavou, Angeliki, Polychronakis, Michalis, and Keromytis, Angelos D. “CloudFence: Data flow tracking as a cloud service.” Interna- tional Workshop on Recent Advances in Intrusion Detection. Springer, 2013, 411–431. [90] Pearl, Judea. “Bayesian networks.” (2011). [91] Peck, Roxy, Olsen, Chris, and DeVore, Jay L. Introduction to statistics and data analysis. Duxbury Pr, 2001. [92] Peck, Roxy, Olsen, Chris, and Devore, Jay L. Introduction to statistics and data analysis. Cengage Learning, 2015. [93] Ren, Jiangchun, Liu, Ling, Zhang, Da, Zhang, Qi, and Ba, Haihe. “Tenants attested Trusted Cloud Service.” Cloud Computing (CLOUD), 2016 IEEE 9th International Conference on. IEEE, 2016, 600–607. [94] Robinson, Brian. “Computing with encrypted data.” 2017. https://gcn.com/articles/ 2017/05/25/homomorphic-encryption.aspx/. [95] Ruan, Anbang and Martin, Andrew. “RepCloud: Attesting to Cloud Service Dependency.” IEEE Transactions on Services Computing (Volume:PP , Issue: 99 ) (2016). [96] Ruan, Anbang, Wei, Ming, Martin, Andrew, Blundell, David, and Wallom, David. “Breaking Down the Monarchy: Achieving Trustworthy and Open Cloud Ecosystem Governance with Separation-of-Powers.” Cloud Computing (CLOUD), 2016 IEEE 9th International Conference on. IEEE, 2016, 505–512. [97] S., Naftaly. “Pin - A Dynamic Binary Instrumentation Tool.” 2012. https://software. intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool. [98] Sadeghi, Ahmad-Reza and Stuble,¨ Christian. “Property-based attestation for computing platforms: caring about properties, not mechanisms.” Proceedings of the 2004 workshop on New security paradigms. ACM, 2004, 67–77. [99] Samani, Raj. “Cloud Ubiquity It’s Coming, But Not Yet!” 2017. https:// securingtomorrow.mcafee.com/business/cloud-security/cloud-ubiquity-coming-not-yet/. [100] Santos, Nuno, Gummadi, Krishna P, and Rodrigues, Rodrigo. “Towards Trusted Cloud Computing.” HotCloud 9 (2009): 3–3. [101] Santos, Nuno, Rodrigues, Rodrigo, Gummadi, Krishna P, and Saroiu, Stefan. “Policy- sealed data: A new abstraction for building trusted cloud services.” 21st USENIX Security Symposium (USENIX Security 12). 2012, 175–188.

188 [102] Sapienza, Alessandro and Falcone, Rino. “A Bayesian Computational Model for Trust on Information Sources.” WOA. 2016, 50–55. [103] Schiffman, Joshua, Sun, Yuqiong, Vijayakumar, Hayawardh, and Jaeger, Trent. “Cloud verifier: Verifiable auditing service for iaas clouds.” 2013 IEEE Ninth World Congress on Services. IEEE, 2013, 239–246. [104] Schuster, Felix, Costa, Manuel, Fournet, C´edric, Gkantsidis, Christos, Peinado, Marcus, Mainar-Ruiz, Gloria, and Russinovich, Mark. “VC3: trustworthy data analytics in the cloud using SGX.” 2015 IEEE Symposium on Security and Privacy. IEEE, 2015, 38–54. [105] Seol, Jinho, Jin, Seongwook, Lee, Daewoo, Huh, Jaehyuk, and Maeng, Seungryoul. “A trusted iaas environment with module.” IEEE Transactions on Services Computing 9 (2016).3: 343–356. [106] Singh, Sarangthem Ibotombi and Sinha, Smriti Kumar. “A new trust model using Hidden Markov Model based mixture of experts.” Computer Information Systems and Industrial Management Applications (CISIM), 2010 International Conference on. IEEE, 2010, 502–507. [107] ———. “A Trust Model based on Markov Model Driven Gaussian Process Prediction.” International Journal of Computer Applications 146 (2016).14. [108] Spence, Ewan. “The Dangers Of Trusting Cloud Computing Over Personal Storage.” 2015. https://www.forbes.com/sites/ewanspence/2015/06/01/the-dangers-of-trusting- cloud-computing-over-personal-storage/#255a23a64cd9. [109] Suen, Chun-Hui, Kirchberg, Markus, and Lee, Bu Sung. “Efficient migration of virtual machines between public and private cloud.” Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on. IEEE, 2011, 549–553. [110] Suen, Chun Hui, Ko, Ryan KL, Tan, Yu Shyang, Jagadpramana, Peter, and Lee, Bu Sung. “S2logger: End-to-end data tracking mechanism for cloud data provenance.” Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on. IEEE, 2013, 594–602. [111] Suh, G Edward, O’Donnell, Charles W, and Devadas, Srinivas. “AEGIS: A single-chip secure processor.” Information Security Technical Report 10 (2005).2: 63–73. [112] Tan, Yu Shyang, Ko, Ryan KL, Jagadpramana, Peter, Suen, Chun Hui, Kirchberg, Markus, Lim, Teck Hooi, Lee, Bu Sung, Singla, Anurag, Mermoud, Ken, Keller, Doron, et al. “Tracking of data leaving the cloud.” Trust, Security and Privacy in Computing and Communications (TrustCom), 2012 IEEE 11th International Conference on. IEEE, 2012, 137–144. [113] Thirunarayan, Krishnaprasad, Anantharam, Pramod, Henson, Cory, and Sheth, Amit. “Comparative trust management with applications: Bayesian approaches emphasis.” Future Generation Computer Systems 31 (2014): 182–199.

189 [114] ur Rehman, Zia, Hussain, Omar Khadeer, Chang, Elizabeth, and Dillon, Tharam. “Decision-making framework for user-based inter-cloud service migration.” Electronic Commerce Research and Applications 14 (2015).6: 523–531. [115] Wahab, Omar Abdul, Bentahar, Jamal, Otrok, Hadi, and Mourad, Azzam. “Towards trustworthy multi-cloud services communities: A trust-based hedonic coalitional game.” IEEE Transactions on Services Computing (2016). [116] Wall, Matthew. “Can we trust cloud providers to keep our data safe.” 2016. http: //www.bbc.com/news/business-36151754. [117] Wang, Yao and Vassileva, Julita. “Bayesian network-based trust model.” Web Intelli- gence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on. IEEE, 2003, 372–378. [118] Wang, Yong, Cahill, Vinny, Gray, Elizabeth, Harris, Colin, and Liao, Lejian. “Bayesian network based trust management.” International Conference on Autonomic and Trusted Computing. Springer, 2006, 246–257. [119] Wilkinson, Shawn, Boshevski, Tome, Brandoff, Josh, and Buterin, Vitalik. “Storj: A Peer-to-Peer Cloud Storage Network.” (2014). Urlhttps://storj.io/. [120] Will, Mark A, Garae, Jeffery, Tan, Yu Shyang, Scoon, Craig, Ko, Ryan KL, Will, Mark A, Ko, Ryan KL, Witten, Ian H, Will, Mark A, Ko, Ryan KL, et al. “Returning Control of Data to Users with a Personal Information Crunch-A Position Paper.” (2013). [121] Wood, Gavin et al. “Ethereum: A secure decentralised generalised transaction ledger.” Ethereum project yellow paper 151 (2014): 1–32. [122] Wright, Aaron and De Filippi, Primavera. “Decentralized blockchain technology and the rise of lex cryptographia.” Available at SSRN 2580664 (2015). [123] Yeluri, Raghuram and Castro-Leon, Enrique. Building the Infrastructure for Cloud Security: A Solutions View. Apress, 2014. [124] Zamfir, Vlad. “Introducing Casper the Friendly Ghost.” Ethereum Blog URL: https://blog. ethereum. org/2015/08/01/introducing-casper-friendly-ghost (2015). [125] Zhang, Tianwei and Lee, Ruby B. “Cloudmonatt: An architecture for security health monitoring and attestation of virtual machines in cloud computing.” 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA). IEEE, 2015, 362–374.

BIOGRAPHICAL SKETCH
Steve grew up in Portland, Oregon, majored in computer science at Portland State University, and graduated with a Bachelor of Science in computer science. Upon graduation, he spent six and a half years in the Air Force. Following basic training, he was assigned to the Pentagon as a Unix system administrator. Two and a half years later, he was accepted to attend Officer Training School, where he received his commission. Steve then had nine months of training to become an ICBM launch officer. He spent the next four years as a Combat Crew Member at FE Warren AFB, Wyoming. While on crew duty, he held various positions, including Assistant Flight Commander and instructor. He also earned his Master of Business Administration. After completing the tour of duty, he separated from the military. Steve spent the next decade working for defense programs in both the Air Force and the Intelligence Community as a software developer and a systems security engineer. During this time, he obtained a Master of Science in computer science. In 2013, Steve entered the PhD program in the Computer and Information Science and Engineering Department at the University of Florida. In his second year, Steve was the recipient of a three-year National Defense Science and Engineering Graduate Fellowship. In his last three years, Steve was a teaching assistant for a variety of computer-security-oriented classes. Steve has several hobbies and interests. He is an amateur astronomer and plays the piano and tennis. He likes to ride his motorcycle and enjoys reading, weightlifting, and Tae Kwon Do with his wife and son, Eva and William. In 2019, Steve received his PhD in computer engineering.
