Network Analysis of the Share Ownership Structure on the Swedish Stock Market


Ludvig Bohlin

Master’s Thesis in Engineering Physics, Department of Physics, Umeå University, 2012

Department of Physics, Linnaeus väg 20, 901 87 Umeå, Sweden, www.physics.umu.se


Integrated Science Lab, Department of Physics, Umeå University, June 21, 2012. Master’s thesis, Master of Science in Engineering Physics, Umeå University. Ludvig Bohlin, [email protected].

Supervisor: Martin Rosvall, Department of Physics, Umeå University. Examiner: Ludvig Lizana, Department of Physics, Umeå University.

Presented: Umeå University, June 14, 2012. Approved for print: June 19, 2012.

Abstract

The stock market is an example of a complex system, i.e. it consists of a number of traders, interacting in such a way that their collective behaviour, the behaviour of the market, is not a simple combination of their individual behaviour. One of the most important tasks in modern finance is finding efficient ways of summarizing and visualizing stock market data to obtain useful information about the behaviour of the market. In this thesis we investigate the possibility of finding a way to summarize and cluster share ownership data from the Swedish stock market. This is done by using a network approach to analyze the structure of the share ownership in order to find significant patterns in the data. The analysis of the network is performed with the community detection algorithm InfoMap, which turns the problem of finding clusters into the problem of optimally compressing the flow of information on the structure of the network. The results of the analysis indicate that it is possible to find significant patterns in the ownership data when looking at the holdings of individuals using a binary approach. By using the clusters with the largest information flow, a majority of the analyzed individuals are categorized into clusters that reflect different properties of the ownership of the included individuals. The clustering results are visualized using alluvial diagrams, which are also used to display changes that occur in the ownership structure between two dates.

Sammanfattning (Summary)

The stock market is an example of a complex system, i.e. a system consisting of a number of actors who interact in such a way that their collective behaviour, the behaviour of the market, is not merely a combination of their individual behaviour. One of the most important tasks in today's financial market is to find efficient ways of summarizing and visualizing stock market data that could provide useful information about the behaviour of the market. In this thesis we investigate whether it is possible to find a way to summarize and cluster share ownership data from the Swedish stock market. This is done with a network-based method that is used to analyze the share ownership structure and find important patterns in the data. The network is analyzed with the clustering algorithm InfoMap, which turns the problem of finding clusters in a network into the problem of finding an optimal compression of the information flow on the structure of the network. The results of the analysis point to the possibility of finding patterns in the share ownership data when looking at the holdings of individuals using a binary approach. By using the clusters with the largest information flow, a majority of the analyzed individuals can be categorized into clusters with different properties regarding the ownership of the included individuals. The clustering results can be visualized using alluvial diagrams, which can also be used to show changes that occur in the ownership structure between two dates.

Preface

This is my Master’s thesis for the degree of Master of Science in Engineering Physics at Umeå University. The thesis has been written at the Integrated Science Lab (IceLab) at Umeå University during the spring of 2012. I would like to thank my supervisor Martin Rosvall for providing valuable input and guidance throughout the project and Krister Modin at Euroclear Sweden AB who made this work possible. Gratitude is also directed towards the people at IceLab for welcoming me as a member of the team, and to Patrik Törmänen for helpful comments and numerous Board Hockey games that helped me clear my mind. I also would like to thank my girlfriend Emelie for believing in me and always supporting me.

Ludvig Bohlin, Umeå, June 21, 2012.

Contents

1 Introduction
  1.1 Background
  1.2 Problem formulation
  1.3 Outline of thesis

2 Networks
  2.1 Preliminaries
  2.2 Network structure
  2.3 Complex networks
    2.3.1 Real-world networks
  2.4 Bipartite networks
  2.5 Community detection in networks
    2.5.1 Hierarchical clustering
  2.6 Similarity measures
    2.6.1 Euclidean distance
    2.6.2 Cosine similarity

3 InfoMap
  3.1 Information theory
    3.1.1 Entropy
    3.1.2 Huffman coding
    3.1.3 Shannon’s source coding theorem
  3.2 Random walks on networks
  3.3 Two-level description
    3.3.1 Node level
    3.3.2 Module level
    3.3.3 Summary
  3.4 Map equation
    3.4.1 Hierarchical map equation

4 The Swedish stock market
  4.1 Stock companies and shares
    4.1.1 Securities institutions and nominees
    4.1.2 Outsourcing share registration
  4.2 Euroclear
    4.2.1 Brief history
  4.3 Trading and the stock exchange
    4.3.1 Stockholm Stock Exchange
    4.3.2 Other listings

5 Method and dataset
  5.1 Dataset
    5.1.1 Share prices
    5.1.2 Data for analysis
  5.2 Aim of analysis
  5.3 A network approach
    5.3.1 Approach 1: Binary ownership
    5.3.2 Approach 2: Percentage of private shares
    5.3.3 Approach 3: Holding value
  5.4 Analysis execution
    5.4.1 Number of strongest links
    5.4.2 Cut off
    5.4.3 Changes between dates
    5.4.4 Partition measures

6 Results
  6.1 Descriptive statistics of complete dataset
    6.1.1 Degree distribution
  6.2 Number of strongest links
  6.3 Clustering
    6.3.1 Approach 1: Binary ownership
    6.3.2 Approach 2: Percentage of private shares
    6.3.3 Approach 3: Holding value
  6.4 Cut off
    6.4.1 Approach 2: Percentage of private shares
    6.4.2 Approach 3: Holding value
  6.5 Detailed results of binary approach
    6.5.1 Example of subclusters
  6.6 Changes between dates
    6.6.1 Binary approach date 2011-06-30
    6.6.2 Comparing clustering between dates

7 Discussion
  7.1 Clustering
    7.1.1 Cut off
    7.1.2 Detailed examination of binary results
    7.1.3 Changes between dates
    7.1.4 Comparing clustering between dates
  7.2 Other suggested approaches

8 Conclusions

Chapter 1

Introduction

In this chapter the main topic of the thesis and the motivation behind it are introduced. The chapter starts with the problem background, then gives the problem formulation, and finally presents the outline of the thesis.

1.1 Background

The financial market plays a huge role in our daily lives. The current Euro crisis, as well as historical stock market crashes like the Dot-com bubble of the late 1990s and the recent global financial crisis, are testaments to the influence of financial markets on society. Whether we trade shares or not, nowadays almost everyone is affected by the stock market. At the end of 2011 about 1.5 million people in Sweden, or 16%, owned shares on the stock market, and the proportion has decreased over the last ten years [37]. However, this does not mean that Swedes have abandoned shares as a form of investment; instead, the ownership is managed indirectly through funds. About 74% of the Swedish population (18–74 years) own shares in mutual funds, and if the savings for the premium pension are included almost the whole population is covered [28].

The stock market is an example of a complex system, i.e. it consists of a number of traders, interacting in such a way that their collective behaviour, the behaviour of the market, is not a simple combination of their individual behaviour [22]. One of the most important problems in modern finance is finding efficient ways of summarizing and visualizing stock market data to obtain useful information about the behaviour of the market [7].

1.2 Problem formulation

In this thesis we aim to investigate the possibility of finding a way to summarize and cluster share ownership data from the Swedish stock market. This is done by using a network approach to analyze the structure of the share ownership in order to find significant patterns in the data. At present, mainly the largest shareholders of each company attract interest on the market, but since most holders own only small amounts of shares, a lot of potentially important information about the market stays unused. The goal of this thesis is therefore to cluster the owners with small holdings in order to highlight and understand how the clustering can be used to obtain useful information about the stock market.

A desirable feature of the analysis is that it can be displayed in a clear and accessible way, and thus the clustering result should not be excessively detailed. It would also be preferable if the result could be visualized. A further goal is that the clustering should hold up in the longer term, so that it is possible to follow different clusters and their behaviour over time.

1.3 Outline of thesis

The thesis is structured as follows:

• Chapter 2 presents the field of networks and introduces how to detect communities in networks.
• Chapter 3 goes through the algorithm InfoMap that is used to find structures in the data.
• Chapter 4 provides a review of the Swedish stock market.
• Chapter 5 presents the dataset and explains the method that is used in the analysis.
• Chapter 6 contains the results obtained from the analysis.
• Chapter 7 gives a discussion of the results.
• Chapter 8 summarizes the outcome of the thesis in the conclusions.

Chapter 2

Networks

This chapter introduces the network methodology and gives examples of different kinds of networks and quantities to evaluate their properties. The chapter also treats community detection in networks and presents similarity measures.

2.1 Preliminaries

Our everyday lives are filled with different kinds of networks. The Internet, the World Wide Web and subway systems are all concrete examples, but networks can also be more abstract, such as networks of acquaintances, chemical reactions or chains of historical events [25]. Networks are everywhere in the world; they always have been and they always will be. Despite this fact, the multidisciplinary field of networks and the modeling of systems as networks is a relatively new approach in science, which started to catch scientists’ interest in the last decade of the 20th century. This growth of network awareness and the increasing popularity of network analysis in the scientific literature [8] is not only a result of the computational advances in data gathering, storage and processing technology of recent decades. It has also come to our understanding that real-world systems are made up of a large number of entities, interacting in such a way that their collective behaviour is not only the sum of their individual behaviour, i.e. nature is a complex system [22]. Network science is well suited to analyzing and understanding such systems.

An important feature of networks that describe complex systems is that they share a significant number of statistical and topological properties, regardless of the application domain. Although networks can be very different, many of their properties are common to networks of a wide variety of types [23]. Similar to the way a landscape can be simplified by a map, the topology of a real-world system can be described as a network by focusing on the connectivity pattern of its individual components [32]. So whether one wants to understand a technological, biological or social system, modeling it as a network could be a good way to start.

Figure 2.1 shows an example of a real-world network. The nodes in the network are the 43 unique winners of the prestigious soccer award Ballon d’Or, often referred to as the World Player of the Year award [2]. A link has been established between two players if they have represented the same club during the same season.


Figure 2.1: A network with the 43 unique winners of the soccer award Ballon d’Or as nodes, connected with a link if they have played together in the same club during the same season. The network has been visualized with the program PAJEK.

2.2 Network structure

Mathematical representation of networks originates from graph theory, which dates back to the mathematician Leonhard Euler and his solution of the Königsberg bridge problem in 1736 [17]. The term graph is usually used to present basic properties of networks in a more mathematical sense, but we will consistently use the term network, since it captures both the mathematical representation and the actual system.

A network is a structure that consists of a set of objects, called nodes or vertices, connected by links or edges. The links between nodes can be either directed or undirected. In a directed network the links are also called arcs. For example, if the nodes represent persons with e-mail addresses, a directed link (arc) can be established from one person to another if an e-mail is sent from the first to the second. If the person receiving the e-mail replies, the link is then considered undirected. Links can also be weighted, representing for example the connection speed of sending the mail or the distances between the nodes. Normally, a pair of nodes is not allowed to have more than one link between them, and links from a node to itself are not permitted either. An example of a network consisting of 6 nodes and 8 undirected, unweighted links can be seen in figure 2.2.

Figure 2.2: An example network with 6 nodes and 8 undirected and unweighted links.

A measure of how many neighbours a node has is the degree, i.e. the number of links attached to the node. In the case of a directed network one has to distinguish between in-degree, the number of incoming arcs, and out-degree, the number of outgoing arcs. The probability distribution of these degrees over the whole network is called the degree distribution, P(k). Accordingly, P(k) is defined as the fraction of nodes in the network with degree k.
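As a minimal sketch of the definitions above, the degree of each node and the degree distribution P(k) can be computed from a list of undirected links. The link list below is a hypothetical network with 6 nodes and 8 links chosen for illustration; it is not necessarily the network drawn in figure 2.2, and the function names are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical undirected, unweighted network with 6 nodes and 8 links
# (illustrative only; not necessarily the network of figure 2.2).
links = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def degrees(links):
    """Return a dict mapping each node to its degree (number of attached links)."""
    deg = Counter()
    for u, v in links:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_distribution(links):
    """Return P(k): the fraction of nodes that have degree k."""
    deg = degrees(links)
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}


deg = degrees(links)
P = degree_distribution(links)
print(deg)
print(P)
```

Note that this sketch only sees nodes that appear in at least one link, so isolated nodes (degree 0) would have to be added separately.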

2.3 Complex networks

A complex system is commonly defined as a system that consists of interacting components whose collective behaviour cannot be explained from the behaviour of the individual units alone [22]. The components may act according to rules that may change over time and that may not be easily understood. A common characteristic of complex systems is that they display organization without any external organizing principle being applied, and a central characteristic is adaptability [5]. An example of a complex system is intelligent life in general and the human brain in particular: we have knowledge about the structure and the composition of the brain, but the thoughts and actions of its owner can seldom be predicted. Other examples of complex systems include families, societies, ecological systems, the weather, the economy, information systems and financial markets [25].

It is useful to consider the difference between a complex system and a merely complicated one. One can for example compare a modern smartphone with a flock of birds. Superficially the birds are all similar and the flock has far fewer members than the smartphone has parts, and it is therefore tempting to think that the smartphone is more complex than the flock of birds. However, the flock of migrating birds is an adaptable system, unlike the smartphone. The flock responds to changes in the environment, and when flying, the rules of the flock are fluid since the bird at the head of the formation often changes. The smartphone, on the other hand, is not a complex system since all its parts have strictly defined roles and prescribed interactions [5].

One way of handling and analyzing complex systems is to use network theory. By studying the network one can study the underlying complex system itself. The study of complex networks has become increasingly important because it helps in understanding indirect effects in such systems.

2.3.1 Real-world networks

In order to understand the properties of real-world complex networks, different mathematical models of networks have been suggested. One of the first models was the uniform random graph model presented in 1959 by Erdős and Rényi [14]. The model starts with a fixed number of nodes and places links between pairs of nodes, each pair being equally likely, until the desired number of links is reached. Although the uniform random graph model captured some properties of real-world networks, empirical measurements showed significant differences in structure when the number of nodes and links were the same. For instance, the uniform random graph model could not reproduce the high so-called clustering coefficient that is often found in real-world networks [41, 40]. The clustering coefficient measures the extent to which nodes in a network tend to cluster together, i.e. form groups. In other words, a high clustering coefficient means that two given nodes are more likely to be connected by a link if they have a common neighbour.

Furthermore, the distribution of edges is not only globally, but also locally inhomogeneous, with high concentrations of edges within special groups of nodes, and low concentrations between these groups. This feature of real networks is called community structure [15].

However, the most important drawback of the uniform random graph model is the difference in the degree distribution compared to real-world networks, where the degree distribution often follows a power law. A power-law degree distribution implies that most of the nodes in the network have a relatively low degree, while a few nodes¹ have a substantially higher degree, i.e. there is no typical scale in the network. Such networks are often called scale-free networks. The power-law distribution of degrees k is given by $P(k) \propto k^{-\gamma}$, where γ is a constant. Power laws appear in, for example, the cumulative distribution functions of the number of citations to papers and the number of hits received by web pages, with γ around 3 and 2 respectively [24].

The general result is that real-world networks are neither random graphs nor very regular. Their main properties include small average distance² between nodes, high clustering and power-law degree distributions [23], and in many cases they also reveal a complex behaviour [32].
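The low clustering of the uniform random graph model can be illustrated with a small sketch. It assumes that the model is realised by drawing the desired number of distinct links uniformly among all node pairs, and it uses the average of the local clustering coefficients (the fraction of a node's neighbour pairs that are themselves linked) as the clustering measure. The function names, the seed and the graph size are choices made for this illustration only.

```python
import random
from itertools import combinations


def random_graph(n, m, seed=0):
    """Uniform random graph: choose m distinct links among all node pairs."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    adj = {v: set() for v in range(n)}
    for u, v in rng.sample(pairs, m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are linked to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention used here for nodes with fewer than 2 neighbours
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Average of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# A triangle is maximally clustered, whereas a sparse random graph is not.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
g = random_graph(n=200, m=600)
print(average_clustering(triangle), average_clustering(g))
```

In a graph of this kind the chance that two neighbours of a node are linked is roughly the overall link density, which is why the clustering stays low when the graph is sparse.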

2.4 Bipartite networks

A bipartite network, or two-mode network, is a network whose nodes can be divided into two disjoint sets of top and bottom nodes. An example of a bipartite network is the Movie-Actor network, which consists of movies as top nodes, actors as bottom nodes and links representing whether the actor appeared in the movie. Links between two actors or two movies are not permitted, since nodes in the same set cannot be directly linked in the bipartite case, in contrast to classical, unipartite, networks. A result of having two node sets is that bipartite networks are associated with two degree distributions: one for the top nodes and one for the bottom nodes.

Many real-world networks are naturally bipartite. Examples of bipartite representations include social networks like the scientific collaboration network, where the two node sets consist of papers and authors; metabolic networks in biology, where the two types of nodes are reactions and metabolites; and information networks such as a word-document network, where one type of node is documents (web pages, e-mails, etc.) that link to the words they contain [21].

Given a bipartite network it is possible to transform it into two unipartite networks, one containing the bottom nodes and one containing the top nodes. These one-mode projections can be obtained by connecting bottom nodes that are connected to the same top node, or vice versa. Figure 2.3 shows an example of a bipartite network and its two unipartite one-mode projections. It is important to note, however, that the one-mode projection approach disregards information about the structure of the original network [21]. One way of capturing more of the original structure is to make the projection weighted, i.e. to give each link between two bottom nodes in the projected network a weight equal to the number of top nodes they have in common, or vice versa. However, this method still does not capture all of the information in the original network [25].

¹ Referred to as hubs.
² The average number of edges traversed along the shortest paths between all possible pairs of network nodes.


Figure 2.3: A bipartite network with 4 top and 5 bottom nodes and the two corresponding unipartite one-mode projections.
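The weighted one-mode projection described in this section can be sketched as follows. The movie and actor labels are hypothetical and do not correspond to the network in figure 2.3, and the function name is an assumption of this sketch. The second example illustrates why even the weighted projection loses information: two different bipartite networks can give exactly the same projection.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical movie-actor data: top node (movie) -> bottom nodes (actors).
movies = {
    "M1": ["A", "B", "C"],
    "M2": ["B", "C"],
    "M3": ["C", "D"],
}


def weighted_projection(top_to_bottom):
    """Project onto the bottom nodes; link weight = number of shared top nodes."""
    weights = defaultdict(int)
    for bottoms in top_to_bottom.values():
        for u, v in combinations(sorted(set(bottoms)), 2):
            weights[(u, v)] += 1
    return dict(weights)


proj = weighted_projection(movies)
print(proj)

# Two different bipartite networks with identical weighted projections:
one_movie = {"M1": ["A", "B", "C"]}
three_movies = {"M1": ["A", "B"], "M2": ["A", "C"], "M3": ["B", "C"]}
same = weighted_projection(one_movie) == weighted_projection(three_movies)
print(same)
```

The projection onto the top nodes follows from the same function after inverting the dictionary so that each bottom node maps to its top nodes.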

2.5 Community detection in networks

A property that seems to be common to many networks is community structure. Community structure can be seen as the division of network nodes into groups within which the network connections are dense, but between which connections are sparser [16]. Social networks are classical examples of networks with communities, and the word community itself refers to a social context. People naturally tend to form groups within their work environment, family and friends. An example of a simple network with three communities can be seen in figure 2.4.

Figure 2.4: A small network with three communities indicated by dashed circles. The links within communities are denser than the links between communities.

An important task in analyzing networks and understanding their structure is to be able to detect the communities. The aim of community detection in networks is to identify the communities and, if possible, their hierarchical organization, using only the information encoded in the network topology. Community detection in large networks can provide valuable information, as nodes belonging to a tight-knit community are likely to have other properties in common [15]. For example, the communities in the World Wide Web correspond to topics of interest, and community information is being considered as a means of improving search engines in order to provide better and more personalized results [11]. Moreover, the information diffusion and spreading mechanisms in a network can be affected and determined by the community structure. Identifying the communities is hence a fundamental step not only for discovering what makes entities come together, but also for understanding the overall structural and functional properties of the whole network [12].

Detecting communities in networks has become a fundamental problem in network science. The human eye is an excellent tool for detecting community patterns in small networks, but for analyzing large networks another method is needed, and numerous algorithms have therefore been developed. In Ref. [20] a number of algorithms for detecting community structure are evaluated using a recently introduced class of benchmark graphs. The result of the analysis shows that the method called InfoMap, see Refs. [34, 33], displays the best performance in detecting the communities for both directed and undirected graphs, with the additional advantage of a relatively low computational complexity, which enables studies of large systems. For this reason InfoMap and its theoretical background will be further explored in the next chapter of this thesis.

A more mathematical term for methods in which large sets of data are grouped into communities of smaller sets of similar data is clustering or cluster analysis. Clustering is the task of assigning a set of objects to groups, called clusters, so that the objects in the same cluster are more similar to each other than to those in other clusters, based on a predefined similarity measure. From now on the two concepts community detection and clustering will be used interchangeably.

2.5.1 Hierarchical clustering

The traditional method for detecting community structure in networks is hierarchical clustering [16]. Hierarchical clustering seeks to build a hierarchy of groups, and the method is commonly divided into two variants, agglomerative and divisive. Agglomerative hierarchical clustering iteratively merges the two most similar objects, or clusters, until only one cluster containing all the objects remains, while divisive hierarchical clustering starts with one single cluster and works the opposite way. To determine which objects are most similar, a similarity measure based on the attributes of the objects must be defined, see section 2.6. An example of a tree diagram, called a dendrogram, that is commonly used to illustrate the arrangement of the clusters produced by hierarchical clustering can be seen in figure 2.5.

Figure 2.5: Dendrogram showing an example of a hierarchical clustering of 30 objects. At the bottom of the figure all objects are assigned to their own cluster. Moving upwards, similar clusters are merged together until only one cluster remains, containing all of the objects. The tree can be cut at a certain height to obtain the partition of the objects into a specified number of clusters.

One concern about agglomerative methods is that they tend to fail with some frequency to find the correct communities in networks where the community structure is known. This makes it difficult to place much trust in their performance in other cases [26].
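The agglomerative procedure can be sketched in a few lines. The sketch below is a toy version under stated assumptions: the objects are one-dimensional values, the distance between two clusters is taken to be the smallest distance between any of their members (single linkage, one of several possible choices), and merging stops when a requested number of clusters remains, which corresponds to cutting the dendrogram at a certain height.

```python
def agglomerative(points, n_clusters):
    """Toy single-linkage agglomerative clustering of 1-D values."""
    clusters = [[p] for p in points]  # start: every object is its own cluster
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest minimum pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]
    return [sorted(c) for c in clusters]


result = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
print(result)
```

Requesting a single cluster runs the merging to the top of the hierarchy, where all objects end up in the same cluster.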

2.6 Similarity measures

To detect communities and cluster nodes in a network, a similarity or distance³ measure can be used. To find an appropriate measure it is important to clarify when two nodes are considered similar, since different measures account for different properties of the network structure. In general the measure quantifies the degree of closeness, or separation, between two nodes and represents it as a single numeric value. The properties of a node can be represented as a vector, which could for example contain the node’s connections to other nodes in the network, or the node’s position in space.

2.6.1 Euclidean distance

The Euclidean distance measures the distance between two points in any dimension of space. The distance is a standard metric for geometrical problems. The Euclidean distance between two points represented by vectors x and y is given by

$$\mathrm{Euc}(x, y) = \sqrt{\sum_k (x_k - y_k)^2}. \tag{2.1}$$

Euclidean distance is actually a dissimilarity measure since it is larger for vectors that differ more, and zero if the vectors are identical.
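Equation (2.1) translates directly into code. The sketch below assumes that both vectors are given as plain Python sequences of equal length; the function name is chosen for this illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors, as in equation (2.1)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(euclidean([0, 0], [3, 4]))        # a 3-4-5 right triangle
print(euclidean([1, 0, 1], [1, 0, 1]))  # identical vectors
```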

³ Note the difference between distance and similarity. A normalized measure of 1 indicates perfect similarity but maximum distance. A value of 0 indicates no similarity but minimum separation in distance.

2.6.2 Cosine similarity

Cosine similarity is a measure of similarity between two vectors, obtained by measuring the cosine of the angle between them [38]. The cosine similarity of vectors x and y is given by

$$\mathrm{CosSim}(x, y) = \frac{\langle x, y \rangle}{\|x\| \cdot \|y\|}. \tag{2.2}$$

The cosine similarity is bounded between 0 and 1 if the elements of vectors x and y are non-negative. A value of 0 indicates that the vectors have no non-zero elements in common, and a value of 1 indicates that the vectors point in the same direction, i.e. one is a positive multiple of the other (for binary vectors, this means that they have exactly the same non-zero entries). An important property of the cosine similarity is its independence of length. This means that vectors of the same composition but different totals are treated identically, e.g. CosSim(x, y) = CosSim(x, 2y), since it is the direction of the vectors that is of importance. The measure is however not invariant to shifts. If vector x were shifted to x + 1, the cosine similarity of x and y would change.
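The two properties mentioned above, independence of length and sensitivity to shifts, can be checked numerically with a small sketch of equation (2.2). The vectors are hypothetical examples and the function name is chosen for this illustration.

```python
import math


def cos_sim(x, y):
    """Cosine similarity of two equal-length vectors, as in equation (2.2)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


x = [1, 0, 2]
y = [2, 1, 0]

base = cos_sim(x, y)
scaled = cos_sim(x, [2 * b for b in y])   # rescaling y leaves the value unchanged
shifted = cos_sim([a + 1 for a in x], y)  # shifting x changes the value
print(base, scaled, shifted)
```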

Chapter 3

InfoMap

InfoMap, see Refs. [34, 33], is a community-detection algorithm that makes use of the duality between the problem of compressing a dataset and the problem of detecting and extracting significant patterns or structures within those data. This duality is explored in the statistical field of minimum description length statistics, or MDL. The basic idea of MDL is that the more regularities there are in the data, the more we can compress it [18]. For a network we can think of communities as regularities, so by finding these, the representation of the network can be compressed.

To analyze a given network we want to use the information concealed in the network representation. In order to capture this information and thereby better understand the network, InfoMap focuses on how the structure of the network constrains the flow of information occurring on it. The goal is therefore to set up a system that measures how much we can compress the flow on the network depending on how we partition the network. The best partition is the one that compresses the flow on the network the most. To set up this system we start by reviewing basic concepts of information theory before we explain in more detail how InfoMap works.

3.1 Information theory

Information theory involves the quantification of information and therefore has applications in many different areas. A central task in the theory is to develop a usable measure of the information acquired when observing the occurrence of an event having probability p. The first simplification is to ignore any particular features of the event, and only observe whether or not it happened. Thus we will think of an event as the observation of a symbol whose probability of occurring is p. The information will therefore be defined in terms of the probability p [9].

We want our information measure I(p) to have several properties. First of all, information should be a non-negative quantity, i.e. $I(p) \geq 0$. Secondly, if an event always occurs (p = 1), we get no information from the occurrence of the event, i.e. $I(1) = 0$. Thirdly, if two independent events occur, then the information we get from observing both events is the sum of the information from each, i.e. $I(p_1 \cdot p_2) = I(p_1) + I(p_2)$. Finally, we want the information measure to be a continuous function of the probability [36]. From the given axioms, the following relations can be derived for positive integers n and m:

• $I(p^n) = I(p \cdot p \cdots p) = \underbrace{I(p) + I(p) + \cdots + I(p)}_{n} = n \cdot I(p)$

• $I(p) = I\big((p^{1/m})^m\big) = m \cdot I(p^{1/m}) \implies I(p^{1/m}) = \frac{1}{m} \cdot I(p)$

In general we thus get that $I(p^{n/m}) = \frac{n}{m} \cdot I(p)$. By continuity, for $0 < p \leq 1$ and a real number $a > 0$, the expression becomes

$$I(p^a) = a \cdot I(p).$$

The only function satisfying the equality above is the logarithm, and hence we can derive the information acquired when observing the occurrence of an event having probability p,

$$I(p) = -\log_b(p) = \log_b \frac{1}{p}, \tag{3.1}$$

where the base $b > 1$ is a constant. The base b determines the units that are used, and it is commonly chosen to be 2, resulting in bits as units for the information measures. Unless we want to emphasize the units, we don’t have to bother specifying the base for the logarithm, and only write $\log(p)$. Typically, and from now on, we will think in terms of $\log_2(p)$.
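The properties required of the information measure can be checked numerically for equation (3.1). The following is a minimal sketch with base 2 as the default; the function name and the test probabilities are chosen for this illustration.

```python
import math


def information(p, base=2):
    """Information I(p) = -log_b(p) from observing an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log(p, base)


p1, p2 = 0.5, 0.25

# Additivity for independent events: I(p1 * p2) = I(p1) + I(p2).
joint = information(p1 * p2)
separate = information(p1) + information(p2)

# Power rule: I(p^a) = a * I(p) for a real number a > 0.
a = 1.7
power = information(p1 ** a)

print(joint, separate, power)
```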

3.1.1 Entropy

In information theory, the entropy is a measure of the uncertainty associated with a random variable. Suppose that we have k symbols $(a_1, a_2, \ldots, a_k)$ and a source providing us with a stream of these symbols with respective probabilities $(p_1, p_2, \ldots, p_k)$. Assuming that the symbols are emitted independently by the source, it is interesting to find the average amount of information received from each symbol in the stream. If symbol $a_i$ is observed, then $I(p_i) = \log(1/p_i)$ information is gathered from that particular observation. In a long run of observations, say N, approximately $N \cdot p_i$ occurrences of symbol $a_i$ will occur. Thus, in the N observations, we will get the total information

$$I_N = \sum_{i=1}^{k} (N \cdot p_i) \cdot \log\frac{1}{p_i}. \tag{3.2}$$

The average information received per symbol hence becomes

$$\frac{I_N}{N} = \frac{1}{N} \sum_{i=1}^{k} (N \cdot p_i) \cdot \log\frac{1}{p_i} = \sum_{i=1}^{k} p_i \cdot \log\frac{1}{p_i}.$$

As we have observed, the information has strictly been defined in terms of the probabilities of the events. Looking at the provided symbol as a random variable X, with sample space $(a_1, a_2, \ldots, a_k)$ and probability distribution $P = (p_1, p_2, \ldots, p_k)$, we define the entropy H(X) of the random variable X as [10]

$$H(X) = \sum_{i=1}^{k} p_i \cdot \log\frac{1}{p_i}, \tag{3.3}$$

where each $p_i \geq 0$ and $\sum_{i=1}^{k} p_i = 1$. An important property of entropy is that it is maximized when all the symbols are equally probable.
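As an illustration of Eq. (3.3), the entropy can be computed directly from a list of probabilities. The sketch below assumes base 2 and treats terms with zero probability as contributing nothing; the two example distributions are hypothetical.

```python
import math

def entropy(probabilities):
    """Shannon entropy H(X) = sum_i p_i * log2(1 / p_i), in bits."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing (p * log(1/p) -> 0 as p -> 0).
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

uniform = entropy([0.25, 0.25, 0.25, 0.25])  # four equally likely symbols
skewed = entropy([0.7, 0.1, 0.1, 0.1])       # one dominant symbol
```

The uniform distribution gives 2 bits, the maximum for four symbols, while the skewed distribution gives a smaller value.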

3.1.2 Huffman coding

In order to model a trajectory on a network based on an information theoretic approach we need a smart way to name nodes. Consider for example a random walk on a network consisting of 5 nodes. As a first approach we could assign binary codes of equal length to each node, requiring $\lceil \log_2(5) \rceil = 3$ bits in each name to uniquely label the nodes. A 39-step walk on the network can therefore be described in 3 · 39 = 117 bits. Suppose now that we know the visiting frequencies of the nodes in our presumptive walk. In this case a straightforward method of giving names to nodes is instead to use Huffman coding [19]. Huffman coding is an entropy encoding algorithm that assigns short codewords to common events with high probability and longer codewords to rare ones. Suppose for example that we have the node visit frequencies of the five nodes, which we denote A–E, given in the table in figure 3.1a. We can see that the visit frequency, i.e. the probability, of node A is much higher than that of the other nodes, and therefore a shorter code is used for A.

A Huffman encoding based on the frequencies can be computed by building a tree. We start with the two nodes having the lowest frequencies, denoted children nodes, and create a so called parent node from them, having the sum of the children's frequencies. The two branches are assigned the codes 0 and 1, and the procedure is repeated with the parent node included and the children removed. When all nodes have been considered and the tree is complete, the code of a node is found by starting at the top of the tree and following the branches down to the node while collecting the binary digits. The procedure is illustrated in figure 3.1b; for node E, for example, following the tree from the top down to the node gives the Huffman code 111, and the whole 39-step walk is described in 87 bits instead of 117. The Huffman coding is also prefix-free, which means that no code is a prefix of any other code. In this way the codewords can be uniquely decoded even if codes are sent after each other as a long signal, as long as the coding table is sent before the data. To sum up: if prior statistics are known about the system that we want to code, then Huffman coding is a good method of compressing data.

(a) Node visit frequencies, the nodes' unique code numbers and the total number of bits for their occurrences using the Huffman code.

Node     Count   Code   # of bits
A        15      0      15
B        7       100    21
C        6       101    18
D        6       110    18
E        5       111    15
TOTAL:                  87

(b) A Huffman tree generated from the frequencies in table (a). The value in brackets indicates the total frequency count of a node. Each parent node is listed with its 0-branch and 1-branch child.

P4(39): 0 → A(15), 1 → P3(24)
P3(24): 0 → P2(13), 1 → P1(11)
P2(13): 0 → B(7), 1 → C(6)
P1(11): 0 → D(6), 1 → E(5)

Figure 3.1: Example of Huffman coding.
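The tree-building procedure described in this section can be sketched with a priority queue. This is a minimal illustration rather than the implementation used in the thesis; the function name `huffman_codes` is hypothetical. When frequencies tie (here C and D), the exact codewords depend on how ties are broken and may differ from figure 3.1, but for these frequencies the codeword lengths come out the same.

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Return a dict symbol -> binary codeword built from symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each heap entry: (total frequency, tiebreak, {symbol: partial codeword}).
    heap = [(f, next(tiebreak), {s: ""}) for s, f in frequencies.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single symbol still needs one bit
        (_, _, only), = heap
        return {s: "0" for s in only}
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)   # lowest frequency
        f1, _, right = heapq.heappop(heap)  # second lowest frequency
        # The new parent prepends 0 to one child's codes and 1 to the other's.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

visits = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}  # counts from figure 3.1a
codes = huffman_codes(visits)
total_bits = sum(visits[s] * len(codes[s]) for s in visits)
```

For the counts in figure 3.1a the total is 87 bits, the same as in the table.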

3.1.3 Shannon’s source coding theorem

Shannon's source coding theorem determines the limits of possible data compression, i.e. it gives a lower limit for the length of the codewords describing the data. The requirement is that the code should be uniquely decodable: it should be possible to parse any sequence of codewords unambiguously into the corresponding data¹. The theorem states that the average codeword length L for a source of entropy H(X) is bounded as

$$L \geq H(X), \tag{3.4}$$

which means that when N codewords are used to describe the N states of a random variable X, occurring with frequencies $p_i$, the average length of a codeword can be no less than the entropy H(X) of the random variable itself [36].
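The bound in Eq. (3.4) can be checked against the Huffman example of figure 3.1. The sketch below simply hard-codes the counts and codewords from the table in figure 3.1a and compares the average codeword length with the entropy of the visit frequencies.

```python
import math

# Visit counts and Huffman codewords taken from the table in figure 3.1a.
counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = {"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}

total = sum(counts.values())
probs = {s: c / total for s, c in counts.items()}

# Average codeword length L and entropy H(X), both in bits per symbol.
avg_length = sum(probs[s] * len(codes[s]) for s in counts)
entropy = sum(p * math.log2(1.0 / p) for p in probs.values())
```

The average length is 87/39, about 2.23 bits per step, while the entropy is about 2.19 bits, so the Huffman code lies slightly above the lower limit, as the theorem requires.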

3.2 Random walks on networks

To model the flow occurring on a network, InfoMap makes use of a random walker and follows its trajectory on the network structure. The walker starts at a randomly chosen node and moves in the next time step through one of the node's links to a neighbouring node. The probability of choosing a link is proportional to the relative weight of the link. If the network is directed there is a chance that the random walker gets stuck, for instance when a node lacks outgoing links. For this reason a small teleportation probability, τ, is introduced, meaning that at every step the walker jumps with probability τ to a randomly chosen node anywhere in the network [34]. In each time step the procedure is repeated and the walker moves on the network, creating a trajectory of visited nodes. In order to save this trajectory as a coherent data stream, nodes must first be given unique code names.

The trajectory of the random walker provides important information about the structure of the network. The ergodic node visit frequencies of the walker specify the statistical probabilities of being at a certain node. This information is used to match the length of codewords to the frequencies of their use by giving frequently visited nodes shorter names according to Huffman coding, explained in section 3.1.2. In this way we have compressed the data of the trajectory by finding regularities in the network, and a first step in finding communities with InfoMap is accomplished.
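The ergodic visit frequencies of such a teleporting walker can be approximated by repeatedly applying one step of the walk to a probability distribution over the nodes. The sketch below is an illustration under stated assumptions and not InfoMap's own code: the value τ = 0.15, the fixed number of iterations, and the choice to let a node without outgoing links always teleport are assumptions made here, and the three-node network is hypothetical.

```python
def visit_frequencies(weights, tau=0.15, iterations=200):
    """Power iteration for the visit frequencies of a teleporting random walker.

    weights[a][b] is the weight of the directed link a -> b; every node must
    appear as a key, with an empty dict if it has no outgoing links.
    """
    nodes = sorted(weights)
    n = len(nodes)
    p = {a: 1.0 / n for a in nodes}          # start from a uniform guess
    for _ in range(iterations):
        nxt = {a: 0.0 for a in nodes}
        for a in nodes:
            out = sum(weights[a].values())
            if out == 0:                      # assumption: stuck walker always teleports
                for b in nodes:
                    nxt[b] += p[a] / n
                continue
            for b in nodes:
                nxt[b] += tau * p[a] / n      # teleport with probability tau
            for b, w in weights[a].items():
                nxt[b] += (1 - tau) * p[a] * w / out  # follow a link
        p = nxt
    return p

# Hypothetical 3-node directed network; node "c" has no outgoing links.
toy = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 2.0}, "c": {}}
freq = visit_frequencies(toy)
```

In this toy network node a is visited most often, and nodes b and c are visited equally often since they receive the same flow.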

3.3 Two-level description

Real-world networks often display community structure. In the sense of a random walker on a network, this can be seen as a set of regions in which the walker tends to spend much time and between which movements are more rare. This regional structure can be used to shorten the trajectory code of visited nodes by giving each region, called a module, its own codebook². In this way the network is divided into two levels of description, one module level and one node level within each module. A distinction between within-module movements and between-module movements therefore has to be made. To describe the network, unique Huffman code names are maintained for the modules, and within these modules names are reused for the individual nodes. In this way two codebooks are needed: first a module codebook, which specifies the node names of each module, and second an index codebook, which specifies which module codebook is to be used. A special codeword, the exit code, is chosen as part of the within-module coding and indicates that the walk is leaving the current module. The exit code is therefore always followed by the code of the new module into which the walk is moving. Hence the method of describing the network in two levels introduces extra codewords both when the random walker enters and when it exits modules.

The two-level description poses the problem of finding a balance where the modules are small enough to reduce the average node codeword length, but large enough and divided in such a way that the random walker statistically stays there for a long time before leaving, so that the cost of using codewords for entering and exiting modules is not too high. This is the optimization problem facing InfoMap, and in order to understand how it is solved, we now go into a little more detail about the underlying theory by dividing it into the two levels of nodes and modules.

¹ For instance, coding nodes c1, c2, c3 and c4 as c1 = 0, c2 = 01, c3 = 11 and c4 = 00 is not a uniquely decodable code. The sequence 0011 could be either c4c3 or c1c1c3.

² This approach can for instance be compared to the dialing codes used in Sweden. People in a certain region are more likely to call each other and hence only have to dial the telephone number without the dialing code. In this way the same telephone numbers can be reused in different dialing code areas and the average length of telephone numbers dialed becomes shorter than if everyone in Sweden had a unique one.

3.3.1 Node level

Let $p_\alpha$ denote the probability of the random walker being at a certain node α. For an outgoing link from α to node β, having relative weight $w_{\alpha,\beta}$, we can calculate the probability that the random walker follows this link in a given step. For this to happen the walker must not teleport, which has probability (1 − τ), where τ is the probability of teleportation. The total probability of a given step, $q_{\alpha \curvearrowright \beta}$, between node α and β in a module hence becomes

$$q_{\alpha \curvearrowright \beta} = p_\alpha \cdot w_{\alpha,\beta} \cdot (1 - \tau). \tag{3.5}$$

The node visit frequency of node α, with the contribution from random teleportation excluded, is then the sum of the probabilities of moving to the node

$$p_\alpha = \sum_{\beta} q_{\beta \curvearrowright \alpha}. \tag{3.6}$$

The total probability of within-module movements in module i then becomes the sum of the probabilities over all nodes in the module, i.e. $\sum_{\alpha \in i} p_\alpha$.

3.3.2 Module level

With an initial partitioning of a network containing n nodes with probabilities $p_\alpha$, for α = 1, . . . , n, it is straightforward to calculate the module visit frequency of module i. This is the sum $\sum_{\alpha \in i} p_\alpha$ of the probabilities for all nodes within the module.

Exiting a module can happen in two ways: either by teleportation to another module³ or by following a link to another module. The probability $q_{i \curvearrowright}$ of exiting a specific module i, having $n_i$ nodes, is therefore given by

$$q_{i \curvearrowright} = \tau \cdot \frac{n - n_i}{n} \sum_{\alpha \in i} p_\alpha + (1 - \tau) \sum_{\alpha \in i} \sum_{\beta \notin i} p_\alpha w_{\alpha,\beta}, \tag{3.7}$$

where the first term describes the probability of teleportation to a node outside module i from every node α in i, and the second term describes the total probability of not teleporting but instead moving from module i through a link from node α in module i to node β in another module.

The per-step probability that the random walker switches modules, $q_{\curvearrowright}$, then becomes the sum of the exit probabilities for all m modules

$$q_{\curvearrowright} = \sum_{i=1}^{m} q_{i \curvearrowright}. \tag{3.8}$$

³ Note that the probability of randomly choosing a node outside module i, having $n_i$ nodes, is $\frac{n - n_i}{n}$.
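Eqs. (3.7) and (3.8) translate almost term by term into code. The sketch below is an illustration only: the function name `exit_probability`, the value τ = 0.15 and the four-node example are assumptions, and the visit frequencies are illustrative values rather than frequencies computed from an actual walk.

```python
def exit_probability(module, p, w, tau):
    """Exit probability of one module according to Eq. (3.7).

    module: set of node names in module i
    p:      dict node -> visit frequency p_alpha (over all n nodes)
    w:      dict node -> dict of relative out-link weights w_alpha,beta
    tau:    teleportation probability
    """
    n, n_i = len(p), len(module)
    inside = sum(p[a] for a in module)
    # First term: teleport from the module to one of the n - n_i outside nodes.
    teleport = tau * (n - n_i) / n * inside
    # Second term: no teleportation, follow a link that leaves the module.
    leave = (1 - tau) * sum(
        p[a] * weight
        for a in module
        for b, weight in w.get(a, {}).items()
        if b not in module
    )
    return teleport + leave

# Hypothetical four-node example with two modules {a, b} and {c, d}.
p = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
w = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5},
     "c": {"d": 1.0}, "d": {"c": 0.5, "b": 0.5}}
q_exit = [exit_probability(m, p, w, tau=0.15) for m in ({"a", "b"}, {"c", "d"})]
q_switch = sum(q_exit)  # Eq. (3.8)
```

With these numbers each module has exit probability 0.0375 + 0.10625 = 0.14375, where the first part comes from teleportation and the second from the single link leaving the module.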

3.3.3 Summary

We have now derived expressions both for the probability of movements between nodes in modules and the probability of movements between the modules themselves, and hence it is possible to compute the entropies for the respective movements. Before applying Shannon's source coding theorem to the derived probabilities there is however one more thing to consider: the exit codewords. In order to also adjust the length of the exit codewords to the frequency of their use, these codewords are encoded together with the within-module codewords. Since the exit codewords are necessary to separate movements within modules from movements between modules, the exit probability of module i, $q_{i \curvearrowright}$, is included in the probability of the within-module movements. By doing this we can compute the total within-module movement probability of module i, $p^{i}_{\circlearrowright}$, which is the probability of exiting the module, $q_{i \curvearrowright}$, according to Eq. (3.7), plus the module visit frequency of module i,

$$p^{i}_{\circlearrowright} = q_{i \curvearrowright} + \sum_{\alpha \in i} p_\alpha. \tag{3.9}$$

Normalizing the exit probability and the node visit frequencies by this total and calculating the entropy according to Eq. (3.3) gives $H(P^i)$, the entropy of movements within module i, which by Shannon's source coding theorem is the lower limit of the average length of a codeword in the codebook of module i:

$$\begin{aligned} H(P^i) = &- \frac{q_{i \curvearrowright}}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta} \log\left(\frac{q_{i \curvearrowright}}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta}\right) \\ &- \sum_{\alpha \in i} \frac{p_\alpha}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta} \log\left(\frac{p_\alpha}{q_{i \curvearrowright} + \sum_{\beta \in i} p_\beta}\right). \end{aligned} \tag{3.10}$$

In the same way H(Q), the entropy for the movements between modules, can be calculated:

$$H(Q) = -\sum_{i=1}^{m} \frac{q_{i \curvearrowright}}{\sum_{j=1}^{m} q_{j \curvearrowright}} \log\left(\frac{q_{i \curvearrowright}}{\sum_{j=1}^{m} q_{j \curvearrowright}}\right), \tag{3.11}$$

which is the lower limit of the average length of a codeword used to name a module. In analogy with within-module movements we have here used Shannon's source coding theorem and treated the modules as m states of a random variable X that occur with frequencies $q_{i \curvearrowright} / \sum_{j=1}^{m} q_{j \curvearrowright}$.

3.4 Map equation

We have now derived the expressions needed to present the core of the InfoMap algorithm: the map equation. For a network partition M of n nodes into m modules the map equation, L(M), gives the average number of bits per step that it takes to describe an infinite random walk on the network. By collecting the terms from both the within- and between-module movements the map equation reads

$$L(M) = q_{\curvearrowright} H(Q) + \sum_{i=1}^{m} p^{i}_{\circlearrowright} H(P^i), \tag{3.12}$$

where the first term gives the average number of bits necessary to describe movement between modules, and the second term gives the average number of bits necessary to describe movements within modules. In the first term $q_{\curvearrowright}$ is the probability that the random walker switches modules on any given step and H(Q) is the entropy of the module names. In the second term $H(P^i)$ is the entropy of the within-module movements including the exit code and $p^{i}_{\circlearrowright}$ is the fraction of within-module movements that occur in module i plus the probability of exiting module i. To find the network partition that minimizes the map equation InfoMap uses a combination of two methods⁴.
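Given the exit probability of each module and the visit frequencies of its nodes, the map equation is a short computation. The sketch below is a minimal illustration and not InfoMap's implementation; the function names and the numerical values are hypothetical, and the entropies use base 2 so that the result is in bits per step.

```python
import math

def normalized_entropy(weights):
    """Entropy (bits) of the distribution obtained by normalizing `weights`."""
    total = sum(weights)
    return sum((x / total) * math.log2(total / x) for x in weights if x > 0)

def map_equation(exit_probs, module_node_probs):
    """Two-level description length L(M) of Eq. (3.12), in bits per step.

    exit_probs[i]        : exit probability of module i
    module_node_probs[i] : visit frequencies p_alpha of the nodes in module i
    """
    q_switch = sum(exit_probs)  # Eq. (3.8)
    # Index codebook term, Eq. (3.11); zero when the walker never switches modules.
    length = q_switch * normalized_entropy(exit_probs) if q_switch > 0 else 0.0
    for q_i, node_probs in zip(exit_probs, module_node_probs):
        p_within = q_i + sum(node_probs)  # Eq. (3.9)
        # Module codebook: exit codeword plus one codeword per node, Eq. (3.10).
        length += p_within * normalized_entropy([q_i] + list(node_probs))
    return length

# Hypothetical values: four equally visited nodes, split into two modules or kept as one.
two_modules = map_equation([0.14375, 0.14375], [[0.25, 0.25], [0.25, 0.25]])
one_module = map_equation([0.0], [[0.25, 0.25, 0.25, 0.25]])
```

With a single module the description length reduces to the entropy of the node visit frequencies, here 2 bits per step. With these illustrative exit probabilities the two-module partition is more expensive, since the walker leaves its module often and the exit and index codewords are used frequently.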

3.4.1 Hierarchical map equation

The map equation finds community structure by considering a two-level description of the flow on the network. The organization of real-world networks is however rarely limited to only two levels; social and biological systems are for example often characterized by hierarchical organization [29]. In order to account for these hierarchical structures in networks, a generalized coding structure based on the two-level map equation has been developed, called the hierarchical map equation, see Ref. [35] for a detailed description. In the hierarchical map equation the constraint of a two-level description is relaxed and an arbitrary number of submodule levels is permitted.

⁴ Greedy search and simulated annealing.

Chapter 4

The Swedish stock market

Financial markets are examples of complex systems, i.e. they generally consist of a number of so called agents (traders), interacting in such a way that their collective behaviour is not a simple combination of their individual behaviour [22]. Although every agent acts with the aim of realizing the highest possible profit, which they try to achieve by interacting with other agents through the selling and buying of financial assets at different times, the response of the market is often not predictable. This chapter presents a review of the financial market concerned in this thesis: the Swedish stock market system.

4.1 Stock companies and shares

A joint stock company, referred to as a stock company hereinafter, is a business entity where the owners themselves are not financially responsible for the company. Swedish stock companies can be divided into private stock companies and public stock companies¹. A private stock company must have at least 50 000 SEK in equity and a public company at least 500 000 SEK. It shall be stated in the stock company's statutes whether the company is public; having more than 500 000 SEK in capital is therefore not in itself sufficient for a company to be public. In Sweden there are about one thousand public stock companies, and these are entitled to offer shares to the public and to be listed on the stock exchange [4]. In order to identify shares on the market, each company's share is assigned a unique ISIN code (International Securities Identification Number). Only public stock companies may trade shares on a Swedish or foreign stock exchange or another organized marketplace.

In Sweden and other Scandinavian countries, there is a regime with different voting rights on shares, called A- and B-shares, where an A-share entitles the holder to more² votes in the company than a B-share. Trading in the two share types is conducted separately and hence both have their own ISIN code. Other share types, such as preference shares³, also exist but not nearly to the same extent as A- and B-shares.

All stock companies have an obligation to keep a share register which shows who the owners of the shares in the company are. Keeping a register of shareholders in a listed company can mean considerable work since the changes that occur in the share register are sometimes extensive. On average, approximately 80 percent of all shares change owner during one year, but the changes among the major owners are usually smaller [42]. In the register, shares are generally registered in the owner's name. The only exception is for nominee registered shares⁴, which are registered in the name of the nominee. The share register is public and anyone who requests it is, provided payment of administrative costs, entitled to a printout of the current register of shareholders or parts of it. This printout shall include the shareholders with more than 500 shares in the company [4].

¹ Also called publicly traded companies.
² A B-share often carries one-tenth as many votes as an A-share (1/10), but an older ratio of one thousandth (1/1000) also occurs.
³ A share type that may have priority over other shares when it comes to dividend and liquidation.

4.1.1 Securities institutions and nominees

A securities institution⁵, also called a stockbroker, can be either a bank or a firm (such as an Internet broker) that has a license from the Swedish Financial Supervisory Authority to conduct trading on behalf of customers [6]. A securities institution acts as an intermediary between buyers and sellers, i.e. it trades securities in its own name on behalf of clients. There is nothing that prevents individuals and businesses from buying and selling shares with each other without the intervention of a stockbroker. The difficulty, however, is often to find a counterpart on the stock market, which securities institutions are equipped to do.

Another type of exchange trading on behalf of a customer is conducted through funds. A fund is a major investment in shares, bonds, options or other securities. Instead of trading in individual stocks, there is an option to purchase fund shares from a fund nominee. The fund nominee invests the fund's money in shares of different classes and manages the administration of the fund, which means that the nominee is registered as shareholder in the company's share register. On the Swedish market there are a little over 80 fund management companies which together with foreign fund management companies offer savings in more than 5 000 funds.

4.1.2 Outsourcing share registration

A stock company can outsource the share registration by leaving the management of the share register to a so-called central securities depository. There is currently only one central securities depository in Sweden, namely Euroclear Sweden AB [31]. It is primarily public companies, such as listed companies and other companies that already have or plan to have many shareholders, that have opted to be Euroclear-registered companies. For the companies that are not Euroclear-registered, it is the board that is responsible for managing the share register.

4.2 Euroclear

Euroclear Sweden AB is a central securities depository which keeps a record of the vast majority of equity and debt securities traded on the financial markets in Sweden. The company also carries out clearing and settlement of transactions in Swedish shares and interest-bearing securities, and the business is based entirely on automatic processing. The company is a member of the Euroclear Group, the world's largest provider of domestic and cross-border transactions of shares, bonds, derivatives and funds [39].

⁴ Förvaltarregistrerad aktie in Swedish.
⁵ Värdepappersinstitut in Swedish.

Each private shareholder in Sweden has a personal account at Euroclear, and when there has been a change in the account, i.e. when shares are purchased or sold, the shareholder receives a compilation of the holdings on their securities account [42]. A central part of Euroclear's work with the registration process is to manage the share registers of the registered companies. To become a Euroclear-registered company, the company has to reach an agreement with Euroclear and a so called depository agent institute, i.e. a specially approved bank or brokerage [1]. Euroclear charges for its services and the registered companies must also in some cases provide guarantees to Euroclear. In December 2011, the number of Euroclear-registered companies in Sweden was around one thousand [39].

4.2.1 Brief history

The Swedish Securities Register Centre, Värdepapperscentralen (VPC), was founded in 1971 with the task of handling the share registers of Swedish companies, executing instructions on dividends and issuing stock certificates. In 1989 the processing of securities changed significantly when physical share certificates ceased to exist in Sweden. VPC was therefore instead given the responsibility for the new account-based system of securities and settlement of security transactions. In 2008 the Belgian Euroclear Group acquired all shares in NCSD (Nordic Central Securities Depository, Scandinavia's securities depository), which in turn owned all shares in VPC. The Swedish central securities depository had thus acquired a new, foreign owner, and in 2009 VPC's name was changed to Euroclear Sweden. The company continues to be a Swedish-registered company governed by Swedish law and thereby under the supervision of Finansinspektionen, the Swedish Financial Supervisory Authority [31].

4.3 Trading and the stock exchange

Stock companies can list their shares on a stock exchange to make them tradeable on a regulated market. A stock exchange is a company which is licensed by Finansinspektionen to run one or more so called regulated markets for securities trading. In order to be listed on a stock exchange, stock companies often must complete a comprehensive review. In Sweden, stocks are traded on the regulated marketplaces at NASDAQ OMX Stockholm AB (NASDAQ OMX) and Nordic Growth Market NGM AB (NGM). Most of the trading, both in number of trades and in turnover, takes place on NASDAQ OMX, where most of the companies are listed [3].

4.3.1 Stockholm Stock Exchange

Nasdaq OMX Nordic Stockholm, often called the Stockholm Stock Exchange, is a marketplace for trading securities. In addition to shares in various Swedish companies, other types of securities, including bonds, warrants and options, are also traded. All listed companies sign an agreement with the Stockholm Stock Exchange and thereby agree to follow certain rules regarding for example accounting and information. There are currently more than 500 Swedish companies whose shares are listed on the stock market and about half of these are listed on the Stockholm Stock Exchange [30].

4.3.2 Other listings

Companies can seek new share capital without this being arranged through a listing on the Stockholm Stock Exchange. These companies may want to seek equity or venture capital through other channels. There is therefore a need to enable trade in the shares of these companies even if they do not meet the requirements of the Stockholm Stock Exchange [42]. For this reason an exchange company or a securities company may be authorized to operate a so called trading platform. The companies whose shares are traded on a trading platform have simplified regulations to follow, and thus smaller companies can also be included. In Sweden, share trading is conducted on the trading platforms First North⁶, Nordic MTF⁷, Burgundy and AktieTorget. Swedish shares are also traded on other European trading platforms [3].

⁶ Operated by NASDAQ OMX.
⁷ Powered by NGM.

Chapter 5

Method and dataset

This chapter presents the structure of the data considered in this thesis. The chapter also explains the method that is used to analyze the dataset.

5.1 Dataset

The dataset consists primarily of the share amounts for the owners in all the Euroclear-registered companies in Sweden. Thus, the dataset describes the ownership on the Swedish stock market, and the owners may be individuals, legal persons or corporations from Sweden as well as other countries. The dataset is provided by Euroclear Sweden AB and extracted from the quarterly share register reports between 2009 and 2011, with a total of 13 dates. Data about the share amounts of companies and their corresponding share ISIN codes are given separately for all dates. The ownership dataset is formatted as shown in table 5.1.

Table 5.1: Example of the ownership dataset appearance.

98602188552D190337SESE0000123456000000000001230
89906222188D252169SESE0000215736000000000000020
00202205876F467534SESE0000198675000000034248978
...

The meaning of the digits on each row in the data is explained below:

Value              Field
9                  Class figure
8602188552         Personal ID number
D                  Share registration type
1                  Account type
90337SE            Zip code & country ID
SE0000123456       Share ISIN number
000000000001230    Number of shares

The class figure designates if the holder is a company, e.g. a stockbroker, or another type of corporation, and in these cases the personal identification number is replaced by a corporate identification number. Owners lacking a Swedish personal or corporate identification number can also be differentiated by the century figure.
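Since every row has a fixed width, the fields can be extracted by character position. The sketch below is an illustration only: the field widths are inferred from the annotated example row above, the split of the zip code field into a five-digit zip code and a two-letter country code is an assumption, and the names `OwnershipRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple

class OwnershipRecord(NamedTuple):
    class_figure: str
    personal_id: str
    registration_type: str
    account_type: str
    zip_code: str
    country: str
    isin: str
    shares: int

# (start, end) character offsets, inferred from the annotated example row.
_FIELDS = [(0, 1), (1, 11), (11, 12), (12, 13), (13, 18), (18, 20), (20, 32), (32, 47)]

def parse_record(line: str) -> OwnershipRecord:
    """Split one 47-character row of the ownership dataset into its fields."""
    line = line.strip()
    if len(line) != 47:
        raise ValueError(f"expected 47 characters, got {len(line)}")
    parts = [line[a:b] for a, b in _FIELDS]
    # The last field is a zero-padded share count.
    return OwnershipRecord(*parts[:-1], shares=int(parts[-1]))

record = parse_record("98602188552D190337SESE0000123456000000000001230")
```

For the first row of table 5.1 this gives the ISIN code SE0000123456 and a holding of 1230 shares.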

5.1.1 Share prices

Additional data include a list of share prices for companies listed on the Stockholm stock exchange, from the second and last quarter of 2011. This dataset was obtained from the Swedish Central Statistics Office, Statistiska Centralbyrån (SCB). The list states the share prices at closing time, i.e. the price of the latest sold share on the last trading day¹ of the quarter. Altogether the list contains share prices for over 500 companies.

5.1.2 Data for analysis

In this thesis we will consider the holdings of individuals having a Swedish personal identification number². The reason is that this group of holders, unlike corporations, is hard to divide and categorize into natural groups, although in number it constitutes a major part of the market. Most of these individuals hold small share amounts and are therefore not of individual interest to the companies, and the sheer number of individuals makes the group hard to understand and to extract useful information from. Furthermore, we will only consider the holdings in companies with price information. This means that it is primarily the companies listed on the Stockholm stock exchange that are included. This is desirable since more information is obtained, and also because only a minor part of the holders are included among the rest of the companies. Overall this means that the data concerned in the analysis is taken from the second and last quarter of 2011, at both dates containing around 1.7 million individuals holding shares in roughly 500 different ISIN codes³.

5.2 Aim of analysis

The primary aim of the data analysis is to investigate whether it is possible to cluster owners based on their holdings. The clustering result should be used to summarize and highlight patterns in the overall share ownership structure. A desirable feature of the clustering is that it should be stable over time, i.e. a cluster occurring at one date should not disappear at the following date unless there are strong reasons for this. Moreover, the clustering should be displayed in a clear and accessible way and thus not be too extensive. The secondary aim of the analysis, arising if the primary clustering is possible, is to find out how the obtained clusters change their holdings in companies over time, in order to understand their trading behaviour.

5.3 A network approach

To analyze the dataset we want to model it as a network with individuals as nodes. In this section we present three different network approaches and how the links are created in each case. For all approaches, as a starting point we will look at the collection of share holdings for each individual; these holdings will be referred to as the individual's portfolio. In all approaches we will consider the portfolio of an individual as a vector where a nonzero element at index i represents the individual's possession value⁴ in share i. Each share thereby corresponds to a dimension in the resulting vector space. For instance, looking at a total of 4 shares, the portfolio vector p of an individual having 5 shares in share 1 and 10 shares in share 3 can be expressed as p = (5, 0, 10, 0).

The portfolio vectors are used to compute a cosine similarity measure between individuals, i.e. the cosine of the angle between their portfolio vectors. If the similarity value of two individuals is nonzero, an undirected link with weight equal to the similarity value is established between them. By the properties of the cosine similarity measure, if two individuals do not have a common holding in at least one share, the similarity value becomes zero, independent of possession type, and in these cases no link is established between them. If two individuals each hold only a single share, and it is the same share, their similarity value becomes one, independent of possession type. In all other cases the similarity measure is affected by the possession values of the portfolios, and these possession values are created differently for the three approaches.

¹ Nasdaq OMX Nordic Stockholm is usually open during business hours and closed on holidays.
² This corresponds to individuals having century figure 8, 9 or 0.
³ Note that a company can have more than one ISIN code associated with it. This is due to the fact that A- and B-shares, and other potential share types in a company, must have different ISIN codes.
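The behaviour of the cosine similarity in the cases discussed above can be illustrated with a few small portfolio vectors. This is a minimal sketch; apart from p = (5, 0, 10, 0), which is the example from the text, the vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two portfolio vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty portfolio shares nothing with anyone
    return dot / (norm_u * norm_v)

p = (5, 0, 10, 0)  # the portfolio from the text
q = (0, 3, 0, 7)   # no share in common with p
r = (2, 0, 0, 0)   # overlaps with p in share 1 only
s = (9, 0, 0, 0)   # holds only share 1, like r

no_overlap = cosine_similarity(p, q)      # no common holding: no link
partial = cosine_similarity(p, r)         # depends on the possession values
single_common = cosine_similarity(r, s)   # same single share: maximum weight
```

The first value is zero and the last is one regardless of the holding sizes, while the middle value, here 1/√5, changes with the possession values.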

5.3.1 Approach 1: Binary ownership

In this first approach the portfolio vector is considered binary, meaning that a holding in share i is represented by a value of 1 at element i, regardless of the holding size. A zero at element i in the vector indicates that the individual has no possession in share i.

5.3.2 Approach 2: Percentage of private shares

In this approach a nonzero element in the portfolio vector at index i represents the individual's percentage of private shares in share i. If an individual for example holds 50 shares in share i with a total of 100 private shares, element i of the portfolio vector will be 0.5. Possession values in the vector will hence range between 0 and 1, where 0 at element i means that the individual holds no private shares in share i, and 1 means that the individual owns all the private shares in share i.

5.3.3 Approach 3: Holding value

In the third approach the elements of the portfolio vector represent the holding values of the individuals, i.e. the number of shares times the price of one share. This means that the possession values for different elements can vary depending on share amount and share price. As before, a zero at element i indicates that the individual holds no shares in share i.
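The three ways of forming possession values can be put side by side for a single individual. The sketch below uses hypothetical share counts, totals and prices, and the function name `portfolio_vectors` is chosen for illustration only.

```python
def portfolio_vectors(holdings, private_totals, prices):
    """Build the binary, percentage and value vectors for one individual.

    holdings[i]       : number of shares the individual holds in share i
    private_totals[i] : total number of privately held shares in share i
    prices[i]         : price of one share i
    """
    binary = [1 if h > 0 else 0 for h in holdings]             # approach 1
    percentage = [h / t if t > 0 else 0.0
                  for h, t in zip(holdings, private_totals)]   # approach 2
    value = [h * price for h, price in zip(holdings, prices)]  # approach 3
    return binary, percentage, value

# Hypothetical numbers: 50 of 100 private shares in share 1, 10 of 1000 in share 3.
binary, percentage, value = portfolio_vectors(
    holdings=[50, 0, 10, 0],
    private_totals=[100, 400, 1000, 250],
    prices=[20.0, 75.0, 150.0, 8.0],
)
```

The same holdings thus give (1, 0, 1, 0), (0.5, 0, 0.01, 0) and (1000, 0, 1500, 0) under the three approaches, so the relative importance of share 1 and share 3 differs between them.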

5.4 Analysis execution

In order to analyze the network structure and search for communities we use the InfoMap algorithm, described in Chapter 3. This method was chosen because of its strong results in a comparative analysis of different clustering algorithms, where it was the method recommended by the authors, see Ref. [20]. To be able to find the best representation of the network structure we use the algorithm with the hierarchical method. The hierarchical method compares the result with the ordinary two-level method and computes the gain in compression, if a hierarchical representation with a smaller code length exists.

Using the whole network in the analysis would result in too much data to handle and analyze effectively. Therefore a subset containing 100 000 randomly chosen individuals is used. For these individuals only the links with the highest value, i.e. the strongest links, will be used. Considering only the N strongest links is partly a matter of efficiency, and partly a way to avoid that too many individuals become connected, since a link is created between all individuals holding shares in the same company. Another reason for accounting only for the strongest links is that these links are the ones holding the most important information regarding the holding properties of the individuals.

To sum up, the analysis for all approaches is performed with 100 000 randomly chosen individuals and their strongest links, taken from the most up-to-date dataset, from the last quarter of 2011. The same 100 000 individuals are used for all approaches in order to be able to compare the results. In the analysis the statistics of clusters at the top level of the hierarchy, the so called top-level clusters, will primarily be examined.

⁴ This can for example be the number of shares, the percentage in the company, etc.

5.4.1 Number of strongest links

To find an appropriate value for the number of strongest links to use in the analysis, clustering results for individuals holding shares in only one specific company are examined. In all approaches these individuals are all connected to each other by links, and we would therefore like them to be clustered together even when only the strongest links are considered. When all individuals hold shares in only one company, all links have the maximum value one in all approaches. In this case of multiple equally strong links, the N links are chosen randomly for each individual. The condition that individuals holding shares in only one company are to be clustered together creates a lower-bound guideline for the number of strongest links to use. If too few links are used, there is a risk that the individuals are separated, which would not be representative of the network properties. To find a proper value of N, InfoMap is run on various networks of individuals holding shares in only one company, generated with different values of the number of strongest links. The clustering results can then be used to find an approximate minimum number of links for the final analysis.
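One way to pick the N strongest links of an individual with random tie-breaking is sketched below. This is an assumption about how the selection could be implemented, not the thesis code; `strongest_links` and the example weights are hypothetical.

```python
import random

def strongest_links(similarities, n_links, rng):
    """Return the n_links strongest (neighbour, weight) pairs of one individual.

    similarities -- dict mapping neighbour id to link weight
    rng          -- random.Random instance used to break ties at random
    """
    items = list(similarities.items())
    rng.shuffle(items)  # random order first ...
    # ... then a stable sort, so equally strong links keep their random order
    items.sort(key=lambda kv: kv[1], reverse=True)
    return items[:n_links]

# Hypothetical neighbours: three links of maximum weight one and a weaker one.
sims = {"a": 1.0, "b": 1.0, "c": 0.4, "d": 1.0}
print(strongest_links(sims, 2, random.Random(0)))
```

With `n_links = 2` the weaker link to `c` is never selected, while which two of the three maximum-weight links survive depends on the random seed.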

5.4.2 Cut off

A lower-limit cut off is introduced in order to tune the data and disregard information of low importance. The cut off is implemented for approaches 2 and 3 in two different ways. In approach 2 the cut off corresponds to ignoring the smallest share amounts in each company. In approach 3 the cut off corresponds to ignoring the smallest holding values globally. In the analysis, cut off values of 10% and 20% are used for both approaches.
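The difference between the per-company and the global cut off can be sketched as follows. The sketch assumes that "the 10% smallest" means the lowest 10% of holdings by rank; the text does not spell this out, and the function names and data are hypothetical.

```python
from collections import defaultdict

def drop_smallest(holdings, fraction):
    """Drop the given fraction of holdings with the smallest amounts.

    holdings -- list of (person, share, amount) tuples
    """
    ranked = sorted(holdings, key=lambda h: h[2])
    n_drop = int(len(ranked) * fraction)
    return ranked[n_drop:]

def cutoff_per_company(holdings, fraction):
    """Approach 2 style: apply the cut off separately within each share."""
    by_share = defaultdict(list)
    for h in holdings:
        by_share[h[1]].append(h)
    kept = []
    for group in by_share.values():
        kept.extend(drop_smallest(group, fraction))
    return kept

def cutoff_global(holdings, fraction):
    """Approach 3 style: apply the cut off over all holdings at once."""
    return drop_smallest(holdings, fraction)

# Hypothetical data: ten small holdings in share X, ten large ones in share Y.
holdings = ([("p%d" % i, "X", i + 1) for i in range(10)]
            + [("q%d" % i, "Y", 100 * (i + 1)) for i in range(10)])

per_company = cutoff_per_company(holdings, 0.10)
global_cut = cutoff_global(holdings, 0.10)
print(len(per_company), len(global_cut))
```

Both variants remove two holdings here, but the per-company cut off removes one from each share, whereas the global cut off removes both from share X, whose amounts are smallest overall.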

5.4.3 Changes between dates

A goal of the analysis is that the clustering result should be possible to follow between different dates. An analysis of the network structure will therefore also be performed at the first date with company price info, 2011-06-30, with the approach showing the most satisfying result.

5.4.4 Partition measures

As a summary measure of the typical size of a module, the quantity C is used. For a partition of a total number of N nodes into m modules, C is calculated as

$$C = \frac{1}{N} \sum_{i=1}^{m} n_i \cdot n_i \qquad (5.1)$$

where n_i is the number of nodes in module i. In the clustering result the measure will be used to find the typical module size at the top level of the hierarchical structure.

The clustering result from InfoMap will display the ergodic node visit frequencies of the nodes in the analyzed network. The ergodic node visit frequencies of an infinite random walk are in fact the PageRank values [27] of the nodes. Normally, when working on networks with physical links, the PageRank corresponds to something concrete that in many cases can be interpreted as a quantity. For our network the links are measures of how similar nodes are, so instead we interpret the quantity as a measure of shared information, where a high PageRank value means that the node shares information with many other nodes.
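Eq. (5.1) can be illustrated with a minimal sketch. The function name `typical_module_size` and the module sizes are hypothetical, and the sketch assumes that N equals the sum of the module sizes.

```python
def typical_module_size(module_sizes):
    """Compute C = (1/N) * sum_i n_i * n_i, with N taken as the sum of n_i."""
    total_nodes = sum(module_sizes)
    return sum(n * n for n in module_sizes) / total_nodes

# Hypothetical partition of 100 nodes into three modules.
print(typical_module_size([50, 30, 20]))  # (2500 + 900 + 400) / 100 = 38.0
```

Since each module is weighted by its own size, C is the module size that a randomly chosen node experiences on average, so a few large modules give a large C even when many tiny modules exist.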

Chapter 6

Results

In this chapter the results are presented. The chapter starts with some descriptive statistics of the data, followed by the result for the number of strongest links. The results from the three approaches are then presented along with the cut off results, a detailed examination of the results from the binary approach and a comparison between dates. In the results all share names have been coded.

6.1 Descriptive statistics of complete dataset

In order to gain more knowledge about the data, descriptive statistics of the whole dataset were computed. The results presented here are extracted from the data of date 2011-12-30. The holdings in companies on the Stockholm stock exchange for individuals having a Swedish personal ID number are considered. Summarized statistics about the ten largest shares concerning number of holders can be seen in Table 6.1b.

6.1.1 Degree distribution

The distribution for the number of holdings of individuals, i.e. the number of ISIN codes that individuals hold shares in, can be seen in Table 6.1a. The ownership degree distribution, plotted as the logarithm of the number of companies per owner versus the logarithm of the number of owners, can be seen in Figure 6.1. Among the around 1.7 million individuals, the maximum number of holdings of an individual is 657.
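The degree distribution described above amounts to counting distinct ISIN codes per individual and then counting how many individuals share each count. A minimal sketch, assuming the data is available as (person, ISIN) pairs; the function name and the identifiers are hypothetical:

```python
from collections import Counter

def holdings_distribution(holdings):
    """Map number of holdings -> number of individuals with that many holdings.

    holdings -- iterable of (person_id, isin) pairs; duplicates are ignored
    """
    per_person = Counter()
    for person, _isin in set(holdings):
        per_person[person] += 1
    return Counter(per_person.values())

# Hypothetical records; the last pair duplicates the first one.
records = [("a", "SE1"), ("a", "SE2"), ("b", "SE1"),
           ("c", "SE1"), ("c", "SE2"), ("c", "SE3"), ("a", "SE1")]
print(sorted(holdings_distribution(records).items()))
```

Plotting the resulting counts on logarithmic axes gives the kind of curve shown in Figure 6.1.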

6.2 Number of strongest links

For the complete dataset, about 45% of the individuals hold shares in only one company, and the largest group of individuals who all hold shares in only one and the same company contains about 180 000 individuals. For a set of 100 000 randomly chosen individuals we therefore expect the largest such group to contain approximately 10 000 individuals. In this case every one of the 10 000 individuals has a link with maximum weight one to all the others, by the properties of the similarity measure. When running the clustering algorithm we would like these individuals to form one significant cluster, even when we only consider a limited number of randomly chosen links. To find the minimum number of links fulfilling this condition, we construct the network while varying the number of links and examine how many clusters

[Figure 6.1: log-log plot; y-axis: Number of individuals (10^0 to 10^6), x-axis: Number of holdings (10^0 to 10^3).]

Figure 6.1: The number of share types per owner versus the number of owners, i.e. the ownership degree distribution.

Table 6.1: Descriptive statistics of the dataset

(a) Distribution of the number of holdings for individuals at date 2011-12-30.

Unique shares   Percentage
1               45%
2               19%
3               10%
4               6%
5               4%
6               3%
7               2%
8               2%
9               1%
≥ 10            8%

(b) The ten largest shares concerning number of holders at date 2011-12-30. The total number of holders is 1 685 372.

Share     Holders   Percentage
Share A   550 759   33%
Share B   450 667   27%
Share C   306 129   18%
Share D   262 362   16%
Share E   181 179   11%
Share F   177 024   11%
Share G   106 414   6%
Share H   105 432   6%
Share I   99 720    6%
Share J   98 176    6%

the network is partitioned into when analyzed with InfoMap. In Figure 6.2 the number of clusters obtained from using InfoMap with the two-level map equation is plotted versus the number of random links used to generate the network for 10 000 individuals holding shares in the same company. The plot shows relatively large error bars for 17 and 18 links. This is due to the fact that the network alternates between being partitioned into one single cluster and about the same number of clusters as for 16 links, depending on how the links are chosen.
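The size of roughly 10 000 individuals used for this test network follows from scaling the largest single-company group to the sample size. The short calculation below reproduces the estimate; it assumes that the 100 000 individuals are drawn uniformly from the 1 685 372 holders in Table 6.1b, and the variable names are hypothetical.

```python
total_holders = 1_685_372                # total number of holders, Table 6.1b
largest_single_company_group = 180_000   # "about 180 000" in the text
sample_size = 100_000                    # randomly chosen individuals

# Expected size of the same group within a uniform random sample.
expected_group_size = sample_size * largest_single_company_group / total_holders
print(round(expected_group_size))
```

The result lies slightly above 10 000, in line with the rounded figure used in the text.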

[Figure 6.2: plot with error bars; y-axis: Number of modules (0 to 400), x-axis: Number of links (16 to 22).]

Figure 6.2: The number of modules obtained with InfoMap versus the number of random links (averaged over 10 independent simulations) for a network of 10 000 individuals, all holding shares in the same company. The error bars indicate standard errors for the simulations.

6.3 Clustering

The three network approaches are analyzed with InfoMap using five partition attempts¹ for each network. The clustering result obtained from InfoMap consists of the network partition that minimizes the code length, with the cluster classification and the PageRank value of each node. In order to be able to compare the different approaches, the same 100 000 randomly chosen individuals are analyzed in each approach.

6.3.1 Approach 1: Binary ownership

A summary of the results when using the binary ownership approach can be seen in Table 6.2. The following network and clustering statistics were obtained:

• Links: 1 807 204.
• Gain over two-level code: 0.72%.
• C = 13 620.
• Top-level clusters: 23.

Table 6.2: Results from the hierarchical clustering with InfoMap using the binary approach for 100 000 randomly chosen individuals.

Cluster   PageRank   Individuals   Subclusters
1         0.24       23 924        357
2         0.22       19 379        147
3         0.17       15 290        92
4         0.09       8 016         70
5         0.05       5 643         83
6         0.05       6 171         105
7         0.04       4 362         71
Total:    0.86       82 785

6.3.2 Approach 2: Percentage of private shares

A summary of the results when using approach 2 with percentage of private shares can be seen in Table 6.3. The following network and clustering statistics were obtained:

• Links: 1 747 421.
• Gain over two-level code: 1.82%.
• C = 4 541.
• Top-level clusters: 105.

¹ This number is optional when running InfoMap and corresponds to the number of partitioning attempts by the algorithm.

Table 6.3: Results from the hierarchical clustering with InfoMap using approach 2 with percentage of private shares for 100 000 randomly chosen individuals.

Cluster   PageRank   Individuals   Subclusters
1         0.13       11 479        25
2         0.13       10 927        29
3         0.09       7 651         41
4         0.07       7 632         13
5         0.05       4 336         15
6         0.03       3 278         54
7         0.03       2 990         34
Total:    0.53       48 293

6.3.3 Approach 3: Holding value

A summary of the results when using approach 3 with holding value can be seen in Table 6.4. The following network and clustering statistics were obtained:

• Links: 1 747 406.
• Gain over two-level code: 2.21%.
• C = 5 240.
• Top-level clusters: 98.

Table 6.4: Results from the hierarchical clustering with InfoMap using approach 3 with holding value for 100 000 randomly chosen individuals.

Cluster   PageRank   Individuals   Subclusters
1         0.15       12 885        119
2         0.14       12 166        61
3         0.09       7 741         40
4         0.07       6 367         61
5         0.07       7 013         106
6         0.03       2 549         38
7         0.02       2 175         37
Total:    0.57       50 896

6.4 Cut off

When using a cut off in approaches 2 and 3, the smallest link values are removed. As the results show, a larger cut off value means that more links and individuals are excluded from the analysis.

6.4.1 Approach 2: Percentage of private shares

In this section the results obtained when using approach 2 with percentage of private shares and a cut off are presented.

Cut off 10%

A summary of the results when using approach 2 with percentage of private shares and cut off 10% can be seen in Table 6.5. In this analysis the 10% smallest share amounts of each ISIN code were excluded. The following network and clustering statistics were obtained:

• Links: 1 640 875.
• Individuals: 93 537.
• Gain over two-level code: 1.68%.
• C = 63 112.
• Top-level clusters: 11.

Table 6.5: The result from the hierarchical clustering with InfoMap using approach 2 with percentage of private shares and 100 000 randomly chosen individuals with a cut off at 10%.

Cluster   PageRank   Individuals   Subclusters
1         0.86       77 959        58
2         0.14       15 283        356
3         <0.01      68            3
Total:    1.00       93 310

Cut off 20%

A summary of the results when using approach 2 with percentage of private shares and cut off 20% can be seen in Table 6.6. In this analysis the 20% smallest share amounts of each ISIN code were excluded. The following network and clustering statistics were obtained:

• Links: 1 518 862.
• Individuals: 86 268.
• Gain over two-level code: 1.77%.
• C = 3 697.
• Top-level clusters: 94.

Table 6.6: The result from the hierarchical clustering with InfoMap using approach 2 with percentage of private shares for 100 000 randomly chosen individuals using a cut off at 20%.

Cluster   PageRank   Individuals   Subclusters
1         0.16       11 736        5
2         0.12       9 311         22
3         0.08       6 077         33
4         0.07       6 708         9
5         0.04       3 295         13
6         0.03       2 859         42
7         0.03       2 363         33
Total:    0.53       42 349

6.4.2 Approach 3: Holding value

In this section the results obtained when using approach 3 with holding value and a cut off are presented.

Cut off 10%

A summary of the results when using approach 3 with holding value and a cut off at 10% can be seen in Table 6.7. In this analysis the 10% smallest holdings globally, i.e. the smallest holdings among all 1.7 million individuals, were excluded. The following network and clustering statistics were obtained:

• Links: 1 661 759.
• Individuals: 94 639.
• Gain over two-level code: 2.11%.
• C = 5 145.
• Top-level clusters: 79.

Table 6.7: The result from the hierarchical clustering with InfoMap using approach 3 with holding value for 100 000 randomly chosen individuals and a cut off at 10%.

Cluster   PageRank   Individuals   Subclusters
1         0.16       12 829        100
2         0.15       12 581        63
3         0.08       6 836         39
4         0.07       6 785         97
5         0.06       4 865         31
6         0.04       4 754         179
7         0.03       2 404         38
Total:    0.59       51 054

Cut off 20%

A summary of the results when using approach 3 with holding value and a cut off at 20% can be seen in Table 6.8. In this analysis the 20% smallest holdings globally, i.e. the smallest holdings among all 1.7 million individuals, were excluded. The following network and clustering statistics were obtained:

• Links: 1 546 570.
• Individuals: 87 564.
• Gain over two-level code: 2.14%.
• C = 4 753.
• Top-level clusters: 67.

Table 6.8: The result from the hierarchical clustering with InfoMap using approach 3 with holding value and 100 000 randomly chosen individuals and a cut off at 20%.

Cluster   PageRank   Individuals   Subclusters
1         0.17       12 586        61
2         0.15       12 011        108
3         0.08       6 123         51
4         0.07       6 956         88
5         0.07       5 310         35
6         0.03       2 435         32
7         0.03       2 454         34
Total:    0.60       47 875

6.5 Detailed results of binary approach

In this section the share content and holding distribution for the seven top-level clusters with the highest PageRank values from the binary approach are presented. The results are presented in Tables 6.9 to 6.15, where subtable (a) displays the largest shares in the cluster in terms of number of holders, and subtable (b) shows the distribution of the number of holdings of the individuals in the cluster. Only the shares with the most holders are displayed in the tables, and the total number of shares in the cluster is given at the bottom of subtable (a). For example, in Table 6.9, 2 holdings means that 2 766 individuals hold 2 out of the 527 shares included in the cluster. At the end of this section, the share contents and holding distributions of the two subclusters with the largest PageRank from the first top-level cluster are shown.

Table 6.9: Cluster 1.

(a) Number of holders for the largest shares in cluster 1 containing 23 924 individuals.

Holders   Share
10 666    Share A
445       Share K
379       Share L
358       Share M
342       Share N
334       Share O
317       Share P
306       Share Q
304       Share R
Shares: 527

(b) Distribution of number of holdings for cluster 1 containing 23 924 individuals.

Holdings   Frequency   Proportion
1          18 378      0.77
2          2 766       0.12
3          1 171       0.05
4          525         0.02
5          318         0.01
6          188         0.01
7          134         <0.01
8          111         <0.01
9          56          <0.01
Others     277         0.01

Table 6.10: Cluster 2.

(a) Number of holders for the largest shares in cluster 2 containing 19 379 individuals.

Holders   Share
16 758    Share B
5 140     Share A
2 017     Share H
316       Share D
308       Share S
307       Share C
Shares: 482

(b) Distribution of number of holdings for cluster 2 containing 19 379 individuals.

Holdings   Frequency   Proportion
1          11 702      0.60
2          4 076       0.21
3          1 669       0.09
4          759         0.04
5          405         0.02
6          232         0.01
Others     536         0.03

Table 6.11: Cluster 3.

(a) Number of holders for the largest shares in cluster 3 containing 15 290 individuals.

Holders   Share
14 705    Share C
5 106     Share J
2 967     Share B
2 249     Share A
1 577     Share D
1 166     Share H
489       Share E
475       Share U
451       Share W
441       Share V
438       Share F
Shares: 478

(b) Distribution of number of holdings for cluster 3 containing 15 290 individuals.

Holdings   Frequency   Proportion
1          6 338       0.41
2          4 358       0.29
3          1 661       0.11
4          941         0.06
5          549         0.04
6          373         0.02
7          274         0.02
8          176         0.01
9          158         0.01
10         103         <0.01
11         96          <0.01
Others     263         0.02

Table 6.12: Cluster 4.

(a) Number of holders for the largest shares in cluster 4 containing 8 016 individuals.

Holders   Share
7 778     Share D
1 190     Share A
1 077     Share V
899       Share B
340       Share H
239       Share U
200       Share P
Shares: 423

(b) Distribution of number of holdings for cluster 4 containing 8 016 individuals.

Holdings   Frequency   Proportion
1          3 985       0.50
2          2 289       0.29
3          748         0.09
4          335         0.04
5          223         0.03
6          139         0.02
7          85          0.01
Others     221         0.02

Table 6.13: Cluster 5.

(a) Number of holders for the largest shares in cluster 5 containing 5 643 individuals.

Holders   Share
5 450     Share F
2 419     Share A
1 083     Share B
872       Share D
680       Share G
476       Share U
428       Share Z
401       Share W
397       Share Y
380       Share I
Shares: 479

(b) Distribution of number of holdings for cluster 5 containing 5 643 individuals.

Holdings   Frequency   Proportion
1          894         0.16
2          1 122       0.20
3          935         0.17
4          667         0.12
5          521         0.09
6          375         0.07
7          287         0.05
8          208         0.04
9          141         0.02
10         115         0.02
Others     378         0.07

Table 6.14: Cluster 6.

(a) Number of holders for the largest shares in cluster 6 containing 6 171 individuals.

Holders   Share
4 982     Share E
2 643     Share F
2 489     Share A
1 991     Share I
1 491     Share G
1 399     Share Y
1 315     Share D
1 252     Share U
1 249     Share B
1 225     Share BC
1 087     Share C
1 082     Share W
1 049     Share R
Shares: 511

(b) Distribution of number of holdings for cluster 6 containing 6 171 individuals.

Holdings   Frequency   Proportion
1          576         0.09
2          617         0.10
3          616         0.10
4          504         0.08
5          437         0.07
6          371         0.06
7          315         0.05
8          272         0.04
9          268         0.04
10         212         0.03
11         185         0.03
12         159         0.03
13         151         0.02
Others     1 488       0.26

Table 6.15: Cluster 7.

(a) Number of holders for the largest shares in cluster 7 containing 4 362 individuals.

Holders   Share
3 224     Share E
2 771     Share CD
1 987     Share T
1 696     Share A
1 555     Share EF
1 072     Share B
896       Share D
864       Share G
511       Share FG
433       Share GH
415       Share H
Shares: 423

(b) Distribution of number of holdings for cluster 7 containing 4 362 individuals.

Holdings   Frequency   Proportion
1          16          <0.01
2          837         0.19
3          758         0.17
4          777         0.18
5          541         0.12
6          353         0.08
7          263         0.06
8          185         0.04
9          132         0.03
10         126         0.03
11         76          0.02
Others     298         0.08

6.5.1 Example of subclusters

The share content and the holding distribution of the two subclusters with the largest PageRank from the first top-level cluster in the binary approach are shown in Tables 6.16 and 6.17. The first subcluster contains 9 908 individuals and the second subcluster contains 263 individuals. In total the first top-level cluster contains 357 subclusters.

Table 6.16: Subcluster 1.

(a) Number of holders for the largest shares in subcluster 1 from the first top-level cluster in the binary approach.

Holders   Share
9 908     Share A
7         Share HI
5         Share IJ
5         Share JK
Shares: 114

(b) Distribution of number of holdings for subcluster 1 from the first top-level cluster in the binary approach.

Holdings   Frequency   Proportion
1          9 759       0.98
2          134         0.01
3          14          <0.01
4          1           <0.01
Total      9 908       1.00

Table 6.17: Subcluster 2.

(a) Number of holders for the largest shares in subcluster 2 from the first top-level cluster in the binary approach.

Holders   Share
263       Share KL
3         Share LM
3         Share N
3         Share NO
2         Share PQ
Shares: 46

(b) Distribution of number of holdings for subcluster 2 from the first top-level cluster in the binary approach.

Holdings   Frequency   Proportion
1          226         0.86
2          19          0.07
3          13          0.05
4          4           0.02
5          1           <0.01
Total      263         1.00

6.6 Changes between dates

In this section the result when looking at the binary approach for date 2011-06-30 is presented along with a comparison to the clustering for date 2011-12-30. Notice that the 100 000 randomly chosen individuals are chosen at the second date, and that 98 084 of these are present at the ﬁrst date, making the network around two percent smaller.

6.6.1 Binary approach date 2011-06-30

A summary of the results when using the binary ownership approach at date 2011-06-30 can be seen in Table 6.18. The following network statistics were obtained:

• Links: 1 775 491.
• Gain over two-level code: 0.70%.
• C = 12 990.
• Top-level clusters: 25.

Table 6.18: The result from the hierarchical clustering with InfoMap using the binary approach and 98 084 randomly chosen individuals for date 2011-06-30.

Cluster   PageRank   Individuals   Subclusters
1         0.25       24 784        347
2         0.19       16 467        125
3         0.17       14 989        94
4         0.08       6 951         53
5         0.06       8 658         137
6         0.04       4 229         62
7         0.04       3 812         43
Total:    0.83       79 890

6.6.2 Comparing clustering between dates

The transition in the classifications of nodes between the two dates can be seen in Table 6.19. A visualization of the structural transitions in the clustering flow between the dates can be seen in Figure 6.3. Note that the clusters are named only according to their PageRank at the respective dates.

Table 6.19: Clustering changes of individuals between dates 2011-06-30 and 2011-12-30 for the binary approach. The first row, From, shows the cluster classification at the first date. The second row, To, shows the cluster at the second date to which most of the individuals from the cluster above have transferred. The last row, Holders, shows the share of holders that follow the given transition between the dates.

From      1     2     3     4     5     6     7     Others
To        1     2     3     4     6     5     2     Others
Holders   87%   97%   93%   96%   41%   83%   51%   61%
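A table of this kind can be derived from two cluster assignments by counting, for each cluster at the first date, where its individuals end up at the second date. The sketch below is an illustration under assumptions: the function `main_transitions` and the toy assignments are hypothetical, and the share is computed only among individuals present at both dates.

```python
from collections import Counter, defaultdict

def main_transitions(first, second):
    """For each cluster at the first date, find the most common destination.

    first, second -- dicts mapping person id to cluster label at each date
    Returns a dict: source cluster -> (destination cluster, share of holders).
    """
    flows = defaultdict(Counter)
    for person, source in first.items():
        if person in second:  # skip individuals missing at the second date
            flows[source][second[person]] += 1
    result = {}
    for source, counter in flows.items():
        destination, count = counter.most_common(1)[0]
        result[source] = (destination, count / sum(counter.values()))
    return result

# Hypothetical assignments; person "f" is absent at the second date.
first_date = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}
second_date = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2}
print(main_transitions(first_date, second_date))
```

Because clusters are labelled by PageRank rank at each date, a swap such as 5 to 6 and 6 to 5 in the table can reflect a change of rank rather than a change of membership.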

[Figure 6.3: alluvial diagram. Percentage labels of the clusters at the two dates:

Cluster     2011-06-30   2011-12-30
Cluster 1   25%          24%
Cluster 2   19%          22%
Cluster 3   17%          17%
Cluster 4   8%           9%
Cluster 5   6%           5%
Cluster 6   4%           5%
Cluster 7   4%           4%
Others      16%          15%]

Figure 6.3: A visualization of the structural flow transitions in the clustering between dates 2011-06-30 and 2011-12-30 using the binary approach. The visualization has been made with the Map Generator software package [13].

Chapter 7

Discussion

This chapter provides a discussion of the results presented in the thesis. The chapter starts with a discussion of the clustering results and ends with a discussion of alternative approaches to analyze the dataset.

7.1 Clustering

The three main approaches show different results when analyzed, and in order to evaluate which one is best, we have to check which approach best meets the objectives of the work. Since the goal of the clustering is that the results should be easy to display and not too excessive, we have chosen to look at the seven largest top-level clusters and primarily evaluate the approaches from the structure and information of these. As an overall observation, the relatively large structural impact of Share A and Share B is due to the fact that these are the largest shares in the data in terms of number of holders, as seen in Table 6.1b. Also notice that all analyzed approaches show a gain in compression when comparing the hierarchical result to the result obtained from the ordinary two-level method. This is an indication that the data possess a hierarchical structure. When looking at the results presented in section 6.3, we can see that in the binary approach the seven largest top-level clusters include almost 83% of the individuals and account for 86% of the PageRank, or the shared information in the network as we could interpret it. This outcome is much better than for the other two approaches, which both give considerably more top-level clusters and account for less than 60% of the total PageRank in the seven largest top-level clusters. As a result, the typical cluster size at the top level is also smaller than in the binary case, and only about half of the individuals in the network are included in the seven largest clusters. The results thus show that the binary method is the one that best meets the goals, as it accounts for both the most individuals and the largest amount of shared information in the network. For this reason, the binary approach is the method that is examined further.

7.1.1 Cut off

The approaches with a cut off show results similar to the ordinary case for all cases except the one obtained with approach 2 with percentage of private shares and a cut off at 10%. In this case the individuals are mainly divided into two clusters that account for almost all of the PageRank. The result is however not a satisfying clustering, since a partition of almost all individuals into one cluster does not provide adequate information about the structure and the patterns in the ownership data. Despite this, an interesting feature of the result is that the subclusters of the largest top-level cluster also contain subclusters, making the hierarchical structure deeper.

7.1.2 Detailed examination of binary results

The binary approach showed the most satisfying results according to the clustering goals, and therefore the results of the binary approach were examined further. The results including the share content and the holding distribution for the seven top-level clusters from the binary approach can be seen in Tables 6.9 to 6.15 in section 6.5. The seven top-level clusters are discussed below:

• The first cluster of the binary approach is the largest cluster in terms of number of individuals and it primarily contains holders with holdings in only one share, where Share A is by far the most frequent. The first cluster includes possessions in 527 shares, and Share A is expected to be the share connecting the individuals in this cluster.
• The second cluster contains many individuals that hold shares in Share B, but the holding distribution in this cluster is broader than in the first, as fewer individuals have holdings in only one share. Share B seems to be the share mainly connecting the individuals, and many of these also hold shares in Share A and Share H.
• The third cluster is represented by holdings in two shares from the same company, namely Share C and Share J, and over half of the individuals in the cluster have holdings in more than one share.
• The fourth cluster is much smaller than the previous clusters in terms of individuals, and in this cluster two shares from the same company, Share D and Share V, along with Share A and Share B are mainly represented. About half of the individuals have holdings in only one share.
• The fifth cluster shows a very broad distribution in the number of holdings, with Share F being the most frequent share. This cluster mainly contains individuals with holdings in many shares.
• The sixth cluster shows an even broader holding distribution, with 26% of the individuals having holdings in more than 13 shares. For this reason a large number of companies are represented in this cluster.
• The seventh and last cluster also shows a relatively broad holding distribution, with two shares from the same company, Share E and Share DE and Share CD being the most represented shares.

The two subclusters from the first top-level cluster, presented in Tables 6.16 and 6.17, contain more information about the structure of the network. The first subcluster represents the individuals only holding shares in Share A and the second subcluster includes the individuals primarily holding shares in Share KL. The reason that the first subcluster belongs to the first top-level cluster is clear because of the shares of Share A it contains. The second subcluster most likely includes individuals with secondary links to individuals holding shares in Share A, and thus the subcluster is included in the first top-level cluster.

7.1.3 Changes between dates

The fact that there is a larger number of individuals at the second date could have an effect when comparing the results between the dates. New individuals at the second date should however not have a very large impact unless these individuals hold shares in many different companies, creating many links. The clustering result from date 2011-06-30 shows a partition of the network similar to the result from date 2011-12-30. According to Tables 6.18 and 6.2, the seven largest top-level clusters account for 83% of the PageRank at the first date, compared to 86% at the later date, and they include around 80 000 individuals at the first date compared to around 83 000 at the later date. A difference is however the PageRank of individual clusters, which differs for a majority of the seven clusters. Note however that the clusters are named only according to their PageRank at the respective dates, and this should be taken into account before analyzing the PageRank values.

7.1.4 Comparing clustering between dates

A visualization of the clustering at the two dates and the structural flow changes between the dates can be seen in Figure 6.3. Together with Table 6.19, which shows the cluster changes of individuals, some interesting features can be observed. Clusters 1, 2, 3 and 4 all seem to contain roughly the same individuals and the same amount of shared information, or PageRank, at both dates. Cluster 6 at the first date seems to be almost the same cluster as cluster 5 at the second date, while a large part of cluster 5 at the first date passes to cluster 6, meaning that these clusters probably represent the same individuals. Cluster 7 is not at all sustainable between the dates, since it is divided primarily into clusters 2 and 3. At the second date cluster 7 is instead created from the individuals of the cluster Others. This behaviour of the smallest cluster is probably due to the fact that there is more uncertainty in the choice of the last cluster, as the PageRank values do not differ much among the smaller clusters.

For cluster 1 it is possible to see that a part of the cluster, including 2 513 individuals, transfers to the cluster Others at the second date. An explanation of this behaviour can be obtained by looking at the holdings of the individuals within this group. Between the two dates about 400 new holdings occur that create many new links, and as a result the individuals in the group are categorized in a different cluster.

7.2 Other suggested approaches

The approaches analyzed in this thesis are certainly not exhaustive, and there are a number of different approaches that could be tested when creating the ownership network. The data could for instance be considered as a bipartite network, where a one-mode projection is performed by connecting owners that hold shares in the same company. A drawback of this approach is however that the one-mode projection disregards important information within the network. If the limitation of only looking at the dates with company price info were dropped, a possible approach could be to look at share changes between dates, and create a vector for each individual containing nonzero elements for companies where the person changed their share amount. One drawback of this approach is however that companies may perform share splits between dates, increasing an individual’s share amount even though no shares have been purchased. Another approach could be to look at the overall behaviour of individuals between dates and create a vector according to portfolio trend.
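The bipartite alternative mentioned above can be sketched as a one-mode projection onto owners. This is only an illustration of the idea; the function name, the choice of weighting a link by the number of shared companies and the example data are assumptions, not part of the thesis.

```python
from collections import defaultdict
from itertools import combinations

def one_mode_projection(holdings):
    """Project a bipartite owner-company network onto the owners.

    holdings -- iterable of (owner, company) pairs
    Returns a dict mapping (owner_a, owner_b) to the number of shared companies.
    """
    owners_by_company = defaultdict(set)
    for owner, company in holdings:
        owners_by_company[company].add(owner)
    weights = defaultdict(int)
    for owners in owners_by_company.values():
        # every pair of owners of the same company gets one unit of weight
        for a, b in combinations(sorted(owners), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical holdings of three owners in two companies.
pairs = [("anna", "X"), ("bo", "X"), ("cia", "X"), ("anna", "Y"), ("bo", "Y")]
print(one_mode_projection(pairs))
```

The projection keeps only how many companies two owners share, which illustrates the drawback noted above: which companies they share, and how large the holdings are, is no longer visible in the projected network.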

Chapter 8

Conclusions

In this thesis we have investigated the possibility of finding a way to summarize and cluster share ownership data from the Swedish stock market. We have looked at the holdings of individuals having a Swedish personal identification number, and in order to analyze these we have used a network approach. In the network approach individuals have been considered as nodes, and links between nodes have been created in different ways by using the share portfolios of the individuals. The analysis of the networks has been performed with the community detection algorithm InfoMap. The algorithm turns the problem of finding the best cluster structure into the problem of optimally compressing the flow of information on the structure of the network. The results of the analysis indicate that it is possible to find significant patterns in the ownership data when looking at the holdings of individuals using a binary approach. By using the seven clusters with the largest information flow from this approach, a majority of the analyzed individuals can be categorized, and thus the results can be summarized without being too excessive. The clusters all capture different properties of the portfolios of the included individuals, and one way of highlighting these properties is by looking at the overall holdings of the clusters. The clustering is performed for two different dates, and the result shows that the binary approach seems to be sustainable over time, since most individuals are categorized into the same cluster when comparing the results at the two dates. By using alluvial diagrams, the clustering results can be visualized in a clear and accessible way, and the visualization also helps to display changes that occur in the structure between dates. The next step in the analysis of the data would be to also analyze the clusters from a more economic perspective, interpreting the holding structure of each cluster and examining how they behave in relation to the stock market.

Bibliography

[1] Bolagsverket. http://www.bolagsverket.se. Accessed June 21, 2012.

[2] Fédération Internationale de Football Association (FIFA). http://www.FIFA.com. Accessed June 21, 2012.

[3] Finansinspektionen. http://www.fi.se. Accessed June 21, 2012.

[4] SFS 2005:551. Aktiebolagslag. Justitiedepartementet.

[5] L.A.N. Amaral and J.M. Ottino. Complex networks: Augmenting the framework for the study of complex systems. European Physical Journal B, 38(2):147–162, 2004.

[6] J. Bernhardsson. Tradingguiden. Bokförlaget Fischer & Co, Stockholm, Sweden, second edition, 1996.

[7] V. Boginski, S. Butenko, and P.M. Pardalos. Mining market data: A network approach. Computers & Operations Research, 33(11):3171–3184, 2006.

[8] M. Buchanan and G. Caldarelli. A networked world. Physics World, 23:22–24, 2010.

[9] T. Carter. An introduction to information theory and entropy. Complex Systems Summer School, 2011.

[10] T.M. Cover and J.A. Thomas. Elements of information theory. John Wiley & Sons, 2006.

[11] N. Du, B. Wang, and B. Wu. Community detection in complex networks. Journal of Computer Science and Technology, 23(4):672–683, 2008.

[12] N. Du, B. Wang, B. Wu, and Y. Wang. Overlapping community detection in bipartite networks. In Web Intelligence and Intelligent Agent Technology, volume 1, pages 176–179. IEEE, 2008.

[13] D. Edler and M. Rosvall. The map generator software package. http://www.mapequation.org. Accessed June 21, 2012.

[14] P. Erdős and A. Rényi. On random graphs. Publ. Math. Debrecen, 6:290–297, 1959.

[15] S. Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75–174, 2010.

[16] M. Girvan and M.E.J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

[17] I. Gribkovskaia, Ø. Halskau Sr, and G. Laporte. The bridges of Königsberg—a historical perspective. Networks, 49(3):199–203, 2007.

[18] P.D. Grünwald, I.J. Myung, and M.A. Pitt. Advances in minimum description length: Theory and applications. MIT Press, 2005.

[19] D.A. Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098–1101, 1952.

[20] A. Lancichinetti and S. Fortunato. Community detection algorithms: A comparative analysis. Physical Review E, 80(5):056117, 2009.

[21] S. Lehmann, M. Schwartz, and L.K. Hansen. Biclique communities. Physical Review E, 78(1):016108, 2008.

[22] M.E.J. Newman. The structure and function of networks. Computer Physics Communications, 147(1):40–45, 2002.

[23] M.E.J. Newman. The structure and function of complex networks. SIAM Review, pages 167–256, 2003.

[24] M.E.J. Newman. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5):323–351, 2005.

[25] M.E.J. Newman. Networks: An Introduction. Oxford University Press, Oxford, UK, 2010.

[26] M.E.J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.

[27] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. 1999.

[28] F. Pettersson, H. Helgesson, and F. Hård af Segerstad. 30 år med fonder. Fondbolagens förening, 2009.

[29] E. Ravasz and A.L. Barabási. Hierarchical organization in complex networks. Physical Review E, 67(2):026112, 2003.

[30] Sveriges Riksbank. Den svenska finansmarknaden 2011. Printfabriken, Stockholm, Sweden, 2011.

[31] Sveriges Riksbank. Utvärdering av värdepappersavveckling i Sverige 2010. 2010.

[32] M. Rosvall. Information horizons in a complex world. PhD thesis, 2006.

[33] M. Rosvall, D. Axelsson, and C.T. Bergstrom. The map equation. The European Physical Journal-Special Topics, 178(1):13–23, 2009.

[34] M. Rosvall and C.T. Bergstrom. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences, 105(4):1118, 2008.

[35] M. Rosvall and C.T. Bergstrom. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE, 6(4):e18209, 2011.

[36] C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 & 623–656, 1948.

[37] Svenska statistiska centralbyrån. Aktieägarstatistik. Serie FM—Finansmarknad 20 SM 1201, Stockholm, Sweden. ISSN 1654-319X.

[38] A. Strehl, J. Ghosh, and R. Mooney. Impact of similarity measures on web-page clustering. In Workshop on Artificial Intelligence for Web Search (AAAI 2000), pages 58–64, 2000.

[39] Euroclear Sweden AB. http://www.ncsd.eu/362_SVE_ST.htm. Accessed June 21, 2012.

[40] D.J. Watts. Networks, dynamics, and the small-world phenomenon. American Journal of Sociology, 105(2):493–527, 1999.

[41] D.J. Watts and S.H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.

[42] B. Wilke. Aktie- och fondhandboken. Aktiespararna Kunskap, Stockholm, Sweden, first edition, 2011.