Supporting Information Modha and Singh SI Text Transpose of Themselves, Brain Regions with Different Acronyms but the Largest Previous Network
Total Page:16
File Type:pdf, Size:1020Kb
Supporting Information Modha and Singh SI Text transpose of themselves, brain regions with different acronyms but The Largest Previous Network. The largest previous network of the with the same full name, typographical errors, incorrect case, miss- macaque brain consists of 95 vertices and 2,402 edges [1], where the ing special characters, existence of brain regions without mapping network models brain regions as vertices and the presence of long dis- or connectivity relations, incorrect L and I relations, self-loops in L tance connections as directed edges between them. This network is relation, missing relations, and so on. displayed in Figure S1 and should be compared to our network in Fig- ure 1 of the main text. It can be seen that the largest previous network Constructing a Network and a Hierarchical Brain Map. Let us rep- completely misses corticosubcortical and subcortico-subcortical long resent mapping relations I, L, O, E, C and the connectivity rela- distance connections and has significant gaps even amongst cortico- tion CNZ as two-dimensional binary matrices. The rows as well as cortical long distance connections. Two-thirds of our data has never columns correspond to various brain regions. The (i; j)-th entry of even been previously compiled, let alone analyzed and understood. I relation is 1 if there exists an identity relation between the brain regions corresponding to row i and column j, and zero otherwise. CoCoMac Database. We briefly summarize the CoCoMac database, Similarly, for L, O, E, C matrices the (i; j)-th entry is 1 if there ex- and refer the reader to the original publications for more details ists the associated relation between the brain regions corresponding [2, 3, 4, 5, 6]. CoCoMac consists of three primary databases: lit- to row i and column j. For CNZ matrix, 1 denotes the presence of a erature, mapping, and connectivity. connection between brain regions of corresponding row and column. From a matrix-theoretic view point, merging a pair of brain regions • Each included literature study is assigned a unique identifier, i and j amounts to, for each of the six matrices, computing a logi- LitID. For example, the paper by Pandya and Seltzer in 1982 [7] cal OR of their respective rows and columns, updating the respective is referred to as PS82. rows and columns by the ORed value, and removing one each of the • A BrainMap is a parcellation or mapping scheme used in a par- rows and columns, thus reducing the numbers of rows and columns ticular study. A BrainMap is also referred to using LitID.A by one. At any time, only a binary (zero or non-zero) entry is main- particular study may define and use a new BrainMap or it may tained for each of the matrices. use a previously defined map by another study. Thus, there are We now give a brief description of our data processing pipeline, a fewer BrainMaps than LitIDs. series of manual and algorithmic data transformations that transform A BrainMap is a set of BrainRegions, where a brain region initial matrices of size 6,877 × 6,877, into compact final matrices of refers to cortical and subcortical subdivisions (area, region, nu- size 383 × 383. At the end, I, E, and C become identity matrices, L cleus, etc.) as well as to combinations of such subdivisions into becomes the adjacency matrix of our hierarchical brain map, and the sulci, gyri, and other large ensembles1. A brain region is uniquely final CNZ has 6,602 non-zero entries – the edges of our network. identified as a concatenation of the BrainMap that it belongs to In the initial matrices, the connectivity information is scattered and its Acronym, that is, as LitID-Acronym. For example, V1 across brain regions, literature studies, and parcellation schemes. in FV91 [8] is uniquely referred to as FV91-V1 whereas V1 in This information has to be pieced together because: RD96 [9] is uniquely referred to as RD96-V1. BrainRegion Further, various s are related to each other via six 1. Different studies performed at different times by different groups identity I sub-structure S logical mapping relations, namely, ( ), ( ), by using different techniques on different animals and at differ- supra-structure L overlapping structure O expanded ( ), and ( ), ent resolutions, inevitably lead to a wide variety of nomenclature lamina (E), and collapsed lamina (C). • and parcellation schemes. For example, connectivity information Connectivity database consists of a set of records representing for primary visual area is scattered across 42 different reports that BrainRegion directed long distance connections from a source study V1, 15 different reports that study brain region 17, and 6 BrainRegion to a target . Further, each connection is annotated different reports that study StriateC. by experimentally determined strength: 1 for weak/sparse, 2 for 2. Without aggregating connectivity information of equivalent brain moderate, 3 for strong/heavy, X for unspecified density, and 0 for regions, one would only see disconnected path fragments. This absence of tested projection. point is illustrated in Figure S2 that shows how establishing equiv- alences between identical brain regions and then merging them Statistics of Downloaded Databases. When downloading the is essential to uncovering reciprocal connections or to uncover a database, we have ignored the fields PDC, Ref.text, and Ref.fig. short circular loop. We now enumerate gross statistics of the downloaded database. 3. Merging of equivalent brain regions is essential for network anal- • 410 unique LitIDs (literature studies) ysis. For example, the shortest path between two regions captures • 379 unique BrainMaps (parcellation schemes) the degree of coupling between them. The longest shortest path in • 6,877 unique BrainRegions the network captures the extremum over all pairs, and is known as • 1,880 unique Acronyms the diameter. Without merging, the brain’s connectivity appears • 10,681 unique, directed connections from one BrainRegion to significantly sparser than it actually is, for example, the diam- another with density labels “1”, “2”, “3”, or “X” eter appears dramatically larger, namely, 13, than it actually is, • 13,498 unique records showing tested but not found connections namely, 6. between a pair of BrainRegions with density label “0” • 16,712 unique records interrelating BrainRegions to each other: Consequently, in order to obtain a complete and correct picture of 9,134 I relations, 3,185 L relations, 3,185 S relations, 662 O rela- the brain’s connectivity, it is imperative that brain regions that are tions, 272 E relations, and 272 C relations. identical are merged into a single region. We now formalize the key difficulty in merging. Define conflict The downloaded database had a number of errors that had to be cor- relation as the intersection, logical conjunction, of I and L. In other rected, for example, incorrect associations between acronyms and words, conflict happens when two regions are identified both via the brain regions, lack of acronym field in the connectivity database, the supra-structure relation L not being the symmetric transpose of sub- structure relation S, the relations I and O not being the symmetric 1CoCoMac refers to BrainRegion as BrainSite. 1 www.pnas.org/cgi/doi/10.1073/pnas.1008054107 Modha & Singh identity and superstructure relations. For example, the database con- children. In both of these cases, the resolution is not useful for con- tains statements: FEF and 8A are identical and that FEF is a super- nectivity studies, and is sacrificed for the sake of conciseness, and structure of 8A. The relations I and L are transitive [4]. So, if A is the regions are merged. Further, from the view point of connectiv- identical to B, and B is identical to C, then A is implied to be iden- ity analysis, the resulting parcellation has the pleasing property that tical to C. The space of all implied identity and superstructure rela- every leaf brain region has two-way connectivity. Throughout, we tions are captured by their respective transitive closures I+ and L+. preserved every single connectivity datum with care except that we In matrix-theoretic parlance: I+ = I _ I2 _ :::, where _ represents removed all direct self-loops and all indirect self-loops in the form OR or logical disjunction. It should be noted that I+ and L+ contain of connections between a brain region and its ascendants or descen- a combinatorial large number of entries. We now extend the defi- dants (where ascendants and descendants are defined with respect to nition of conflict relation to mean the intersection between I+ and the above hierarchical brain map). L+. Conflicts arising because of transitivity are far too numerous, For overlap relation O, note that there are 9,134 I relations and are inherently insidious, and are extremely difficult to track and to 3,185 L relations, but only 662 O relations in CoCoMac. The sym- eliminate (see Figure S3). metry implies that there actually only 331 O relations. The impact of We highlight an interesting sub-routine inspired by ORT[3, Ap- overlap is mitigated due to the following reasons, pendix E] that we have repeatedly used for merging. The idea is to + + 1. Thinking of I, L, O as binary set relations, overlap is a weaker computes a submatrix J of I such that intersection of J and L is relationship than subset which in turn is weaker than identity. As empty. Then, J is a conflict-free identity relation, and can be safely a result, 112 O relations are subsumed by I and 26 relations are used for merging. This is pictorially described in Figure S4 2. Our subsumed by L. goal is to derive a CNZ for network analysis, with emphasis being on 2.