brain sciences

Article

Massive Data Management and Sharing Module for Connectome Reconstruction

Jingbin Yuan 1, Jing Zhang 1, Lijun Shen 2,*, Dandan Zhang 2, Wenhuan Yu 2 and Hua Han 3,4,5,*

1 School of Automation, Harbin University of Science and Technology, Harbin 150080, China; [email protected] (J.Y.); [email protected] (J.Z.)
2 Research Center for Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; [email protected] (D.Z.); [email protected] (W.Y.)
3 The National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
4 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
5 The School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China
* Correspondence: [email protected] (L.S.); [email protected] (H.H.); Tel.: +86-010-82544385 (L.S.); +86-010-82544710 (H.H.)

Received: 30 April 2020; Accepted: 20 May 2020; Published: 22 May 2020

Abstract: Recently, with the rapid development of electron microscopy (EM) technology and the increasing demand for circuit reconstruction, the scale of reconstruction data has grown significantly. This brings many challenges, one of which is how to effectively manage large-scale data so that researchers can mine valuable information. For this purpose, we developed a data management module equipped with two parts: a storage and retrieval module on the server-side and an image cache module on the client-side. On the server-side, Hadoop and HBase are introduced to resolve massive data storage and retrieval. The pyramid model is adopted to store electron microscope images, representing multiresolution data of the image. A block storage method is proposed to store volume segmentation results. We design a spatial location-based retrieval method that rapidly obtains images and segments by layer, achieving constant time complexity. On the client-side, a three-level image cache module is designed to reduce latency when acquiring data. Through theoretical analysis and practical tests, our tool shows excellent real-time performance when handling large-scale data. Additionally, the server-side can be used as a backend of other similar software or as a public database to manage shared datasets, showing strong scalability.

Keywords: connectome; massive data management; distributed storage and retrieval; electron microscope image; segmentation result; image cache

1. Introduction

The brain is the organ of thought, and as the most advanced part of the nervous system, it dominates all the activities of the body. To explore the brain, scientists have done much research and have a basic understanding of its functions. For example, MRI technology is adopted to study the connections between different gray matter areas of the brain on a macro scale (millimeter scale) [1]. Based on optical technology, the projection path of a single neuron axon and its connections with upstream and downstream neurons are studied at the mesoscopic scale (micron scale) [2,3]. However, how signals are transmitted and how the enormous number of neurons are connected so that all parts of the brain work together to drive the body's activities is still a mystery. The significant number of neurons and the complex connections between them make the structure and function of the brain extremely complicated. One of the ways to explore the mechanism is to use the electron microscope (EM) to image sequence slices of the brain tissue and then draw the connectome from them [4–6]. One advantage of this imaging mode is that its spatial resolution is high enough (nanoscale) to clearly distinguish the connections among the delicate structures of the neurons, such as the connections between synapses.

The reconstruction of the connectome usually requires a series of steps, such as sample preparation, slicing, EM imaging, image registration, automatic segmentation, and proofreading. A schematic diagram of obtaining the sequence images by the electron microscope from a brain tissue block is shown in Figure 1a,b.

Brain Sci. 2020, 10, 314; doi:10.3390/brainsci10050314; www.mdpi.com/journal/brainsci

(a) (b) (c) (d)

Figure 1. A schematic diagram of the sequential electron microscope images and the segmentation data that make up a neuron. (a) A schematic diagram of the brain tissue block; (b) sequential electron microscope (EM) images obtained from the brain tissue block; (c) different colors represent different segments, where each segment represents a segmentation region of a neuron in the layer; (d) the segments on each layer form a neuron.

With the development of advanced technology and the expectation of reconstructing a complete connectome, the volume of electron microscope images grows significantly, and the difficulty of EM image management is increasing. For one thing, to ensure the continuity between sequence images under the limitation of the actual physical size of the neural structure, the slices should be thin, generally 30–50 nm or even 5 nm. For another, in order to observe more delicate structures, the spatial resolution of imaging is also required to be higher. With the development of EM imaging technology, it is possible to fulfill these requirements with techniques such as serial block-face scanning electron microscopy (SBF-SEM) [15], focused-ion-beam scanning electron microscopy (FIB-SEM) [16,17], automated tape-collecting ultramicrotome scanning electron microscopy (ATUM-SEM) [18,19], TEMCA [20], and the Zeiss MSEM [21,22]. As a consequence, the imaging speed is significantly improved, and more image data can be obtained in an acceptable time. Thus, the volume of the image data obtained by the EM becomes massive and can be up to terabytes (TB) and even petabytes (PB). Table 1 shows the reconstruction results over the years. The volume of reconstruction is increasing year by year, and the state of reconstruction gradually changes from sparse to dense. Therefore, large-scale image datasets will be more common in the future, with excellent value for understanding the neuronal connections and the mechanism of signal transmission in our brain.

Table 1. Reconstruction results over the years.

Year  Sample                      Volume (µm³)                  Reconstruction State
2011  Visual cortex of mice [7]   120 × 80 × 130                Sparse ¹
2013  Mouse inner reticulum [8]   132 × 114 × 80                Sparse
2015  Rat cortex/retina [9]       44 × 60 × 141 / 93 × 60 × 93  Sparse
2016  Rat visual thalamus [10]    400 × 600 × 280               Sparse
2017  Songbird X Area [11]        98 × 96 × 115                 Sparse
2018  Drosophila brain [12]       750 × 369 × 286               Sparse
2018  Songbird X Area [13]        98 × 96 × 115                 Dense ²
2019  Drosophila brain [14]       995 × 537 × 283               Dense

¹ The sparse state means to reconstruct a few neurons of interest. ² The dense state means to reconstruct most of the neurons in a tissue block.

The large-scale EM images and the application of high-accuracy automatic segmentation algorithms, such as Convolutional Neural Networks (CNNs) [23,24], often produce a large number of reconstruction results (usually segmentation data). The difficulty of managing segmentation results lies in their number and irregular shape. According to the physical size of neurons and fractal theory, the number of segments (as shown in Figure 1c) in a volume of 1 mm³ of brain tissue may be billions or even more. The irregular shape makes storage and retrieval difficult. Therefore, managing such big data during the reconstruction process to explore the mechanism of our brain is still a challenge to be faced. For this purpose, it is necessary to develop a big data management tool for researchers.

2. Related Work

In the field of connectomics, many excellent tools have been developed recently, such as CATMAID [25,26], which is inspired by online mapping applications. Knossos [15] and webKnossos [27] are for 3D image visualization and annotation. TrakEM2 [28] is an image-based program for morphological data mining and 3D modeling. EyeWire [29,30] is a game to crowdsource neural reconstruction. Mojo and Dojo [31] are interactive proofreading tools for connectomics. Vast [32] is a volume annotation and segmentation tool. NeuTu [33], with a distributed versioned image-oriented data service (DVID) [34], is for collaborative, large-scale, segmentation-based connectome reconstruction. Here, a management module is designed specifically for managing the large-scale data of neural structures in the field of connectomics, so that researchers can mine valuable information during the proofreading process.

Firstly, the most critical and essential problem is the large-scale volume of data. In this paper, a distributed architecture with Hadoop [35] and HBase [36] is adopted on the server-side as the underlying support for large-scale data; it can be easily deployed in clusters and is scalable. (Hadoop is an open-source distributed framework under the Apache foundation, which allows for the distributed processing of large datasets across clusters of computers using simple programming models. When the existing service cluster cannot meet the storage requirements, new server nodes can be added seamlessly while the cluster stays stable. HBase is an open-source distributed database under the Apache foundation: the Hadoop database, a versioned, scalable, nonrelational database modeled for storing big data, which achieves random, real-time read-write access.) Secondly, a standard interface is designed to encapsulate the underlying structure. In this case, the client can interact conveniently with the data in the distributed framework.
Based on the standard interface, the server-side of the module can be used as the back end of other similar software based on client/server (C/S) architecture, or as a public database to manage shared datasets. Thirdly, for rapidly retrieving the required data, different retrieval methods are designed for EM images and segmentation data, combined with the characteristics of HBase. Finally, to further reduce the time delay of data acquisition in large-volume data, a three-level cache module is put forward on the client-side, combined with the characteristics of the data. This paper only introduces the leading strategies and methods of the data management module. For specific implementation details and demos, please refer to Appendix A.

3. Materials and Methods

In order to support the storage of large-volume data and meet the requirements for browsing at the client-side, this section introduces the storage and retrieval strategies of the EM image and segmentation data at the server-side in detail. At the client-side, a three-level image cache module is designed to reduce the delay when acquiring data.

3.1. Image Data Model Design

In order to understand the brain more comprehensively, with the improvement of advanced technology, researchers hope to reconstruct larger brain tissue blocks or even the whole brain. As the number of brain tissue slices and the imaging resolution of the electron microscope increase, the volume of data grows massive as well. Terabyte (TB)- and even petabyte (PB)-level EM image data will become more common. However, the resolution of the user's monitor is not so large (for example, 1920 × 1080 pixels). When browsing images, the size of the image to be displayed is usually determined by the monitor resolution. Therefore, it is only necessary to get an image of the same size as the monitor resolution instead of the whole image. If the entire image were read into the computer's memory for display, it would be time-consuming, and as the image size increases, the memory size becomes a limitation. Therefore, in this paper, the image is divided into fixed-size blocks, as shown in Figure 2a,b.

For the sake of observing neurons at different scales, such as observing the fine structure of neurons at high resolution and browsing the overall trend at low resolution, a multiresolution model is used to store EM images, called the image pyramid model, as shown in Figure 2d,e. The pyramid model in Figure 2d is generated by downsampling Method I in Figure 2c; the pyramid model in Figure 2e is generated by downsampling Method II in Figure 2c. With Method I, after downsampling, the number of image blocks stays the same, but the image block size is reduced by a factor of four. With Method II, after downsampling, the image block size remains unchanged, but the number of image blocks is reduced. The number of image blocks that need to be acquired with Method II is smaller than with Method I when acquiring low-resolution images, and it is significantly reduced.
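The difference between the two methods can be made concrete with a short count of blocks per level; a minimal sketch (the function name and the 16 × 16 example grid are ours, not the paper's):

```python
import math

def blocks_at_level(rows, cols, level, method):
    """Number of stored image blocks at a pyramid level.

    rows, cols: block rows/columns of the full-resolution image.
    Method I (mipmap): the block grid is unchanged, only block size shrinks.
    Method II: block size is fixed, so each level halves the grid dimensions.
    """
    if method == "I":
        return rows * cols
    return math.ceil(rows / 2 ** level) * math.ceil(cols / 2 ** level)

# A 16 x 16 block grid, three levels down the pyramid:
print(blocks_at_level(16, 16, 3, "I"))   # 256 blocks under Method I
print(blocks_at_level(16, 16, 3, "II"))  # 4 blocks under Method II
```

This is why Method II needs far fewer block reads for low-resolution (high-level) views.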

Therefore, downsampling Method II is chosen to generate the image pyramid model. The maximum downsampling level is calculated as follows:

(a) (b) (c)

(d) (e)

Figure 2. The designed image data model. (a) Suppose that the image in the viewport (the red box in the figure) is needed; before the image is divided into blocks, the whole image needs to be obtained. (b) After the image is divided, only six small images need to be acquired; the block size is fixed, and undersized image blocks are filled with the black area. (c) Two downsampling methods. Method I: mipmap [37]; before and after downsampling, the number of image blocks is the same, but the image block size is reduced by a factor of four. Method II: after downsampling, the image block size remains unchanged, but the number of image blocks is reduced. (d) The image pyramid generated by Method I, used to obtain images at different resolutions. (e) The image pyramid generated by Method II; compared with Method I, the number of image blocks that need to be acquired is smaller when acquiring low-resolution (high-level) images.

with Method I,Therefore, the number in this of paper, image the blocks image is needed divided tointo be fixed-size acquired blocks, in thisas shown method in Figure is less 2a,b. when For acquiring low-resolutionthe sake (high-level) of the needs images. of observing neurons at different scales, such as observing the fine structure of neurons at high resolution and browsing the overall trend at low resolution, a multiresolution model is used to store EM images, called the image pyramid model, as shown in Figure 2d,e. The pyramid model, as shown in Figure 2d is generated by using the downsampling method, Method I, in Figure 2c. The pyramid model, as shown in Figure 2e, is generated by using the downsampling method, Method II in Figure 2c. As for the downsampling method in Figure 2c, after downsampling, the number of the image blocks is the same, but the image block size is reduced by four times than before. As for the downsampling method in Figure 2c, after downsampling, the image size remains unchanged, but the Brain Sci. 2020, 10, x FOR PEER REVIEW 5 of 15 number of images is reduced. The number of image blocks needed to be acquired in Method II is less than that in Method I when acquiring low-resolution images, and it is significantly reduced. Therefore, the downsampling Method II is chosen to generate the image pyramid model. The maximum downsampling level is calculated as follows:

maxLevel = ⌈log2(max(rows, cols))⌉, (1)

where max(a, b) indicates taking the maximum value of a and b, and rows and cols mean the total numbers of rows and columns of image blocks.
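Equation (1) can be evaluated directly; a small sketch (the function name is ours):

```python
import math

def max_level(rows, cols):
    """Maximum downsampling level, Equation (1): ceil(log2(max(rows, cols)))."""
    return math.ceil(math.log2(max(rows, cols)))

print(max_level(12, 5))      # ceil(log2(12)) = 4
print(max_level(1024, 512))  # ceil(log2(1024)) = 10
```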

3.2. Image Data Retrieval

To obtain the image data to be displayed, the image blocks need to be indexed in the image pyramid. A key-value pair index is established for retrieval according to the spatial position of the image block and the level of the image pyramid. In addition, HBase is used as the database to store images: the key of the index is the row key of the data in the database, named "layer_row_col_level", and the value of the index is the image block data at the corresponding position. For example, as shown in Figure 3, suppose "w" and "h" represent the width and height of the image block, and (x1, y1, w1, h1) represents the viewport, which is displayed at scale "magnitude" in layer "layer". The transformation between magnitude and level is shown in Table 2. The start-row, end-row, start-column, and end-column of the image blocks that need to be read are calculated as follows:

start-row = ⌊y1/h⌋ = r1, (2)
end-row = ⌊(y1 + h1)/h⌋ = r2, (3)
start-column = ⌊x1/w⌋ = c1, (4)
end-column = ⌊(x1 + w1)/w⌋ = c2, (5)
row ∈ [r1, r2], (6)
col ∈ [c1, c2], (7)

Then, through permutation and combination, the indexes of the image blocks can be composed as "layer_row_col_level". As the image volume increases, the retrieval speed will not be affected by the retrieval strategy: the key-value pair retrieval achieves constant time complexity, O(1).
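Equations (2)–(7) translate a viewport into HBase row keys; a minimal sketch (the function name and the 1024 × 1024 default block size are assumptions, not values from the paper):

```python
def block_keys(layer, x1, y1, w1, h1, level, w=1024, h=1024):
    """Row keys "layer_row_col_level" for all blocks covering a viewport,
    following Equations (2)-(7); // is Python's floor division."""
    r1, r2 = y1 // h, (y1 + h1) // h       # start-row (2), end-row (3)
    c1, c2 = x1 // w, (x1 + w1) // w       # start-column (4), end-column (5)
    return [f"{layer}_{row}_{col}_{level}"
            for row in range(r1, r2 + 1)   # row in [r1, r2] (6)
            for col in range(c1, c2 + 1)]  # col in [c1, c2] (7)

# A 1920 x 1080 viewport at (2000, 500) in layer 3, level 0 touches six blocks:
print(block_keys(3, 2000, 500, 1920, 1080, 0))
# ['3_0_1_0', '3_0_2_0', '3_0_3_0', '3_1_1_0', '3_1_2_0', '3_1_3_0']
```

Each key is then a single point lookup in HBase, which is what gives the constant per-block retrieval time.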

Figure 3. The flow of image retrieval. w and h represent the width and height of the image block, which has a fixed size. (x1, y1) is the upper-left coordinate of the viewport (the red box in the figure); w1 and h1 are its sizes. w, h, and (x1, y1, w1, h1) are employed to calculate the row and column numbers of the image blocks (row, col). Magnitude is the scale of the image, and it is used to calculate the level of the image pyramid. The layer is the number of the sequence image. The level, row, col, and layer construct the indexes of image blocks for retrieval.

Table 2. The transformation between magnitude and level.

Magnitude                  Level
mag ¹ > 50%                0
50% ≥ mag > 25%            1
25% ≥ mag > 12.5%          2
12.5% ≥ mag > 6.25%        3
2^(−n+1) ≥ mag > 2^(−n)    n − 1
2^(−n) ≥ mag               n (if n is the maximum level)

¹ Abbreviation of magnitude.


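The mapping in Table 2 can also be computed rather than looked up: the level L is the one with 2^(−L) ≥ mag > 2^(−L−1), capped at the maximum level n. A sketch (the function name is ours):

```python
import math

def magnitude_to_level(mag, n_max):
    """Pyramid level for a display scale mag (0 < mag <= 1), per Table 2:
    choose L with 2**(-L) >= mag > 2**(-L - 1), capped at n_max."""
    level = math.floor(-math.log2(mag))
    return min(level, n_max)

print(magnitude_to_level(0.6, 5))  # 0  (mag > 50%)
print(magnitude_to_level(0.3, 5))  # 1  (50% >= mag > 25%)
print(magnitude_to_level(0.2, 5))  # 2  (25% >= mag > 12.5%)
```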

3.3. Image Cache

In order to reduce the delay in acquiring images at the client-side and give users a smoother image browsing experience, it is necessary to preload the images near the current browsing view of the client. Based on our image storage and retrieval methods, as well as the requirements for browsing EM images, a three-level cache module on the client is designed here, as shown in Figure 4a.

(a) (b)

(c)

Figure 4. The image cache mechanism and operation process. (a) The architecture of the image cache module. The image cache falls into three levels. (b) The operation process of the image cache. The requested images must exist in the remote cache. (c) The schematic diagram of image cache priority. The black area represents the current view when browsing. In addition to the black area, the box with the deeper color is of high priority, whereas, on the contrary, the lighter color is of low priority.

As shown in Figure 4b, when browsing, the client generates the indexes of the images required by the current viewport. Firstly, the client sends a request to the memory cache. If the requested images exist in the memory cache, they are directly returned to the client; if not, the request is sent to the disk cache. Similarly, if the data exists there, the images are directly returned and put into the memory cache; otherwise, a request is sent to the remote cache. The client gets the images from the remote cache and puts them into the memory and disk caches. During browsing, images far away from the current browsing viewport will be deleted from the cache to avoid occupying too much memory or disk space.

It is beneficial to set up a cache priority for images, so that the images in the browsing direction near the current viewport are cached first. In Figure 4c, suppose the black area is the currently displayed viewport of the client and the user browses the image from left to right; then, the images located on the right side of the current area have a higher priority for caching, and images closer to the current area also possess higher priority. The cache area is a square, whose side length is chosen as the longest side of the current monitor screen, as the smallest unit for caching. In addition, due to bandwidth limitations, the cache area cannot be too large. A multithreading approach is taken to improve cache efficiency.
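The memory → disk → remote lookup order, with promotion of hits into the faster levels, can be sketched as follows. This is our minimal in-process illustration: the class name, the dict-based disk and remote stand-ins, and the LRU eviction are assumptions, not the module's actual implementation, which also applies the priority and prefetching rules described above.

```python
from collections import OrderedDict

class ThreeLevelCache:
    """Sketch of the client-side three-level image cache (Figure 4a,b)."""

    def __init__(self, remote, capacity=256):
        self.memory = OrderedDict()  # level 1: in-memory, LRU eviction
        self.disk = {}               # level 2: stand-in for the disk cache
        self.remote = remote         # level 3: stand-in for the server side
        self.capacity = capacity

    def get(self, key):
        if key in self.memory:            # 1. memory cache hit
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.disk:              # 2. disk hit: promote to memory
            self._put_memory(key, self.disk[key])
            return self.disk[key]
        value = self.remote[key]          # 3. remote (must hold the image)
        self.disk[key] = value            # promote to disk and memory
        self._put_memory(key, value)
        return value

    def _put_memory(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # evict least recently used

remote = {"3_0_1_0": b"...image bytes..."}
cache = ThreeLevelCache(remote)
cache.get("3_0_1_0")  # fetched from remote, now cached at both levels
print("3_0_1_0" in cache.memory and "3_0_1_0" in cache.disk)  # True
```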

3.4.the current Segmentation browsing Data viewport Model Design in the cache will be deleted to avoid occupying too much memory or disk Thespace. segmentation data from the EM image is obtained by human annotation or the automatic segmentationIt is beneficial algorithms. to set up Whether the cache the priority segmentation of images, data for represents firstly caching a neuron, the images an organelle in the browsing (such as mitochondria),direction near the or othercurrent cellular viewport structures. In Figure (such 4b, as suppose synapse), the the black essence area is is a the region currently (set of displayed points or aviewport point) with of the some client, attributes. and the Besides,user browses because the of image the considerable from left to numberright, the of image the segments, located on the the way right to storeside of and the retrieve current segmentation area has a higher data priority also needs for tocaching, be resolved. and the image closer to the current area also possessesIn this higher paper, priority. the contour The cache is used area to is represent a square, eachwhose segmentation length is chosen region by tothe save longest storage side space.of the Dicurrentfferent monitor cell structures screen, are as distinguished the smallest byunit the for type caching. attached In to addition the contour. to that, Additionally, due to bandwidth it is taken aslimitations, a unified the data cache type area called cannot primitives, be too large. and theThe correspondingmultithreading attributesapproach (suchis taken as to object improve id,object cache type,efficiency. region centroid/object node, color, connection relationship between nodes within the neuron, and synaptic connection between neurons) are added to it. Sometimes, it may be necessary to represent two3.4 Segmentation or more objects Data at Model the same Design point. 
For example, two cell structures have overlapping regions (suchThe as asegmentation mitochondrion data in from a neuron), the EM which image is is the obtained first advantage by human of usingannotation contours or the to automatic represent segmentationsegmentation dataalgorithms. instead ofWhether using voxels,the segmentation which are represented data represents by all a pixels neuron, of thean object.organelle The (such second as advantagemitochondria), is that or it other is helpful cellular for structures 2D display (such and 3Das rendering.synapse), the The essence contour is cana region be directly (set of used points for or the a displaypoint) with of plane some figures attributes. and 3DBeside rendering.s, because If it of is the represented considerable by voxels, number the of contour the segments, must be the extracted, way to orstore a model and retrieve must be segmentation built through da voxelsta also firstly needs before to be resolved. displaying or rendering. The third advantage is that theIn this contour-based paper, the storagecontour methodis used doesto represent not have each to store segmentation multiscale region data. Unliketo save EM storage image space. data, thisDifferent saves cell storage structures space are further. distinguished by the type attached to the contour. Additionally, it is taken as a unifiedA preprocessing data type called tool is primitives, provided forand extracting the corres primitivesponding attributes in large-scale (such segmentationas object id, object data type, and obtainingregion centroid/object the connection node, relationships color, connection between relationship primitives. Seebetween “Appendix nodesA ”within for specific the neuron, details. and synaptic connection between neurons) are added to it. Sometimes, it may be necessary to represent two Blockor more Storage objects at the same point. 
When browsing primitives, there is a problem similar to browsing images: only the primitives in the current viewport of the client should be displayed. The number of primitives is large and their shapes are irregular, which increases the difficulty of spatial retrieval. Traversing all primitives to get the currently required ones is an extremely time-consuming process, and reading all primitives is also limited by the memory size. Learning from the method of image storage, the primitives are stored in blocks: all primitives on the corresponding image block are stored as a separate primitive information file. If a primitive crosses multiple image blocks (for example, the primitive indicated by the red outline in Figure 5a is distributed over the four image blocks in Figure 5b), it is stored entirely in each block.
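As a minimal sketch of this block-replication rule (class, method, and key names are illustrative, not the module's actual API; the real module stores one primitive information file per block), the image blocks overlapped by a contour's bounding box can be enumerated as follows:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a primitive (contour) is stored once per image block that its
// bounding box overlaps. Names and the block size are illustrative.
public class BlockStorage {
    static final int BLOCK = 2048; // image block edge length in pixels

    // Return the "row_col" parts of the keys of all blocks overlapped by
    // the contour's bounding box [minX, maxX] x [minY, maxY].
    static List<String> blocksFor(int minX, int minY, int maxX, int maxY) {
        List<String> keys = new ArrayList<>();
        for (int row = minY / BLOCK; row <= maxY / BLOCK; row++)
            for (int col = minX / BLOCK; col <= maxX / BLOCK; col++)
                keys.add(row + "_" + col);
        return keys;
    }

    public static void main(String[] args) {
        // A contour crossing the corner of four blocks (as in Figure 5b)
        // is replicated into all four of them.
        System.out.println(blocksFor(2000, 2000, 2100, 2100)); // [0_0, 0_1, 1_0, 1_1]
    }
}
```

Replicating the whole contour into each overlapped block trades a small amount of storage for retrieval that never has to stitch a primitive back together from partial pieces.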


Figure 5. The designed segmentation data model. (a) The red line is the contour of the neuron, extracted from segmented data. (b) The contour in (a) is across four image blocks, so the whole contour data should be stored in each block.

Brain Sci. 2020, 10, 314 8 of 15

3.5. Segmentation Data Retrieval

Due to the large number of primitives, the retrieval of segmentation data is more complicated than that of image data. When browsing segmentation data, two retrieval methods are needed: 2D retrieval for browsing the 2D view and 3D retrieval for getting a whole-cell structure, as shown in Figure 6. HBase is used to store primitives, and the row key of a primitive in HBase is named "layer_row_col_id", according to the block storage method described previously. The prefix "layer_row_col" is the index of the image block. A UUID is employed as the id part of the key to ensure that the id of each primitive is globally unique.
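The "layer_row_col_id" row-key scheme described above can be sketched as follows (a minimal illustration with hypothetical names; the real keys live in HBase and the prefix is used with HBase's prefix filter):

```java
import java.util.UUID;

// Sketch of the row-key scheme: the prefix "layer_row_col" indexes the
// image block, and a UUID suffix makes each primitive's key globally
// unique. Class and method names are illustrative.
public class PrimitiveKey {
    // Full row key for one primitive stored in block (layer, row, col).
    static String rowKey(int layer, int row, int col, String id) {
        return layer + "_" + row + "_" + col + "_" + id;
    }

    // Prefix used to fetch every primitive of one block ("layer_row_col_*").
    static String blockPrefix(int layer, int row, int col) {
        return layer + "_" + row + "_" + col + "_";
    }

    public static void main(String[] args) {
        String id = UUID.randomUUID().toString(); // globally unique primitive id
        String key = rowKey(12, 3, 7, id);
        System.out.println(key.startsWith(blockPrefix(12, 3, 7))); // true
    }
}
```

Because HBase sorts rows lexicographically by key, all primitives of one block are stored contiguously, so a prefix scan over "layer_row_col_" touches exactly one key range.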


Figure 6. The flow of segmentation retrieval for 2D retrieval and 3D retrieval. (a) The flow of 2D retrieval. The meaning of the parameters (w, h, x1, y1, w1, h1, layer) is the same as in Figure 3. As shown above, the advantage of block storage is that fewer primitives need to be read, especially for large-scale images, where tens of thousands of primitives may exist; only the primitives in the current view need to be read. (b) The flow of 3D retrieval. A whole neuron consists of multiple contours in the sequence layers. A neuron has a unique id (the object id in the figure), which is the parent id of every contour. The whole neuron can be obtained with the neuron id and the single-column value filter of HBase. (c) The HBase table structure for storing primitives. The id in the row key (layer_row_col_id) is the primitive's id, which is unique in the table. The object id is the parent id of the primitives that compose the object.


3.5.1. 2D Retrieval

As shown in Figure 6a,c, when browsing primitives in the 2D view, the indexes (layer_row_col) of the image blocks intersecting the client viewport are generated in the same way as in the section "Image Data Retrieval". Because primitives do not need to be downsampled, there is no "level" item in the index. The prefix filter of HBase is then used to get all the primitives whose keys match "layer_row_col_*" (the symbol "*" stands for the primitive id). After that, the primitives stored in the corresponding blocks are obtained, with duplicates removed.
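The 2D retrieval flow can be sketched as follows. This is a self-contained illustration: an in-memory map stands in for the HBase table and its prefix scan, and all names are hypothetical. It shows the two essential steps, enumerating one "layer_row_col_" prefix per viewport-intersecting block, and deduplicating primitives that were replicated into several blocks:

```java
import java.util.*;

// Sketch of 2D retrieval: viewport -> block prefixes -> prefix scan ->
// duplicate removal. The "store" map stands in for the HBase table.
public class Retrieval2D {
    static final int BLOCK = 2048; // image block edge length in pixels

    // One "layer_row_col_" prefix per block intersecting the viewport
    // at (x, y) with size (w, h). No "level" item: primitives are not downsampled.
    static List<String> viewportPrefixes(int layer, int x, int y, int w, int h) {
        List<String> prefixes = new ArrayList<>();
        for (int row = y / BLOCK; row <= (y + h - 1) / BLOCK; row++)
            for (int col = x / BLOCK; col <= (x + w - 1) / BLOCK; col++)
                prefixes.add(layer + "_" + row + "_" + col + "_");
        return prefixes;
    }

    // Stand-in for the prefix scan: row keys map to primitive ids, and the
    // set removes copies of primitives replicated into several blocks.
    static Set<String> fetch(Map<String, String> store, List<String> prefixes) {
        Set<String> ids = new LinkedHashSet<>();
        for (String p : prefixes)
            for (Map.Entry<String, String> e : store.entrySet())
                if (e.getKey().startsWith(p)) ids.add(e.getValue());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("0_0_0_a", "contourA"); // contourA replicated into two blocks
        store.put("0_0_1_a", "contourA");
        store.put("0_0_1_b", "contourB");
        System.out.println(fetch(store, viewportPrefixes(0, 0, 0, 4096, 100))); // [contourA, contourB]
    }
}
```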

3.5.2. 3D Retrieval

Three-dimensional (3D) objects are composed of several 2D primitives. When a whole-cell structure (such as a whole neuron) needs to be displayed, all the 2D primitives that make up the object need to be acquired, as shown in Figure 6b. Traversing all the ids in the row keys (Figure 6c) to get the needed primitives is inefficient. To improve retrieval efficiency, the single-column value filter of HBase is adopted to get the primitives by the object id. All the acquired 2D primitives are then stacked in sequence to form the 3D object.
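The 3D retrieval step can be sketched as follows. A plain list of (layer, object id) pairs stands in for HBase's single-column value filter, and all names are illustrative: collecting every primitive carrying one object id and sorting by layer yields the stacking order of the whole structure:

```java
import java.util.*;

// Sketch of 3D retrieval: select all 2D primitives whose object id matches
// the wanted cell, then stack them in sequence-layer order.
public class Retrieval3D {
    static class Primitive {
        final int layer; final String objectId;
        Primitive(int layer, String objectId) { this.layer = layer; this.objectId = objectId; }
    }

    // Return the layers of all primitives of one object, in stacking order.
    static List<Integer> layersOf(List<Primitive> all, String objectId) {
        List<Integer> layers = new ArrayList<>();
        for (Primitive p : all)                        // stand-in for the column filter
            if (p.objectId.equals(objectId)) layers.add(p.layer);
        Collections.sort(layers);                      // stack contours layer by layer
        return layers;
    }

    public static void main(String[] args) {
        List<Primitive> all = Arrays.asList(
            new Primitive(2, "neuron-1"), new Primitive(1, "neuron-1"),
            new Primitive(1, "mito-7"),   new Primitive(3, "neuron-1"));
        System.out.println(layersOf(all, "neuron-1")); // [1, 2, 3]
    }
}
```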

4. Results

Based on the strategy proposed in Section 3, we developed a data management module; this section introduces its main architecture and features. We also tested the data management module, and it shows excellent real-time performance when handling large-scale data.

4.1. Outline of the Data Management Module

The architecture of the data management module is shown in Figure 7. The data management module mainly includes a data storage and retrieval module at the server and an image cache module at the client. On the server-side, the underlying file system is HDFS (Hadoop distributed file system), and the database is HBase, used for storing EM images, segmentation data, and other intermediate results. Furthermore, several methods are proposed based on the data characteristics to retrieve the required data rapidly and accurately. A data interface is designed to exchange data between clients and servers. The image cache at the client is a three-level cache module, in which the data at the server is considered the third level (remote cache). By establishing the first-level and second-level caches in the memory and on the local disk of the client, the time delay for the client to obtain image data is reduced. The client and the server communicate with each other through the network communication engine gRPC (a high-performance, open-source universal RPC framework) and protocol buffers (Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data).
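As a hypothetical sketch of the kind of gRPC/protocol-buffers interface such a client-server exchange implies (message, field, and service names here are illustrative only; the module's actual .proto definitions are linked in "Appendix A"):

```proto
// Illustrative sketch only; not the module's real interface definition.
syntax = "proto3";

message ImageBlockRequest {
  int32 layer = 1;  // slice index in the sequence
  int32 row = 2;    // block row within the pyramid level
  int32 col = 3;    // block column within the pyramid level
  int32 level = 4;  // pyramid (resolution) level
}

message ImageBlockReply {
  bytes jpeg_data = 1;  // JPEG-compressed block, decompressed at the client
}

service ImageServer {
  rpc GetImageBlock (ImageBlockRequest) returns (ImageBlockReply);
}
```

Shipping blocks as JPEG bytes keeps the reply small on the wire, matching the compression strategy described later in Section 4.3.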

Figure 7. The architecture of the data management module. The data management module consists of two parts: an image cache module and a data storage and retrieval module. The image cache module has three levels: the memory cache, the disk cache, and the remote cache (the data at the server is considered as the remote cache). Data is stored in the distributed file system Hadoop, with the database HBase. The communication tools between clients and servers are gRPC and protocol buffers.



The data management module is used to manage large-scale neural structure data in the field of connectomics, mainly EM images and their segmentation results. The data storage and retrieval module on the server-side is written in Java 8 and can run on both Microsoft Windows (64 bit) and Linux, thanks to the cross-platform nature of Java. For convenient use, the storage and retrieval module is packaged into a jar package. The image cache module on the client-side is written in C++11; it has been thoroughly tested on Microsoft Windows (64 bit) and can also run on Linux with a simple modification.

4.2. Large-Scale Data Support, Scalability, and Data Security

At present, managing large-scale neural structure data is one of the bottlenecks in reconstructing the connectome of the brain. We developed this tool to manage large-scale data effectively so that researchers can mine valuable information. The electron microscope image data of the full adult fly brain released by Google and Janelia Research Campus et al. is still 26 TB after compression [22], and this is only the EM images; segmentation results and other intermediate data are also generated during the reconstruction process. With the advancement of technology and the needs of scientific research, the volume of data will grow even more significantly in the future (PBs or even EBs). To this end, the mature and widely used distributed file system Hadoop and distributed database HBase are employed to store and retrieve massive data. Another advantage of the distributed architecture built with Hadoop and HBase is its excellent scalability. At the same time, redundant copies of the data are kept in Hadoop clusters to prevent data from being lost or corrupted during storage or processing, so the security and integrity of the data are guaranteed to a great extent.

4.3. Real-Time Data Access

Meeting the storage needs of large-scale data is only the first step in data management. How to retrieve the required data from a large-scale database and achieve real-time access is another problem that needs to be solved. HBase integrates many convenient and fast retrieval interfaces, which can retrieve data in multiple ways; it uses key-value pairs for retrieving, storing, and accessing data in a columnar approach. Based on the nature of the data, combined with the multiple retrieval interfaces of HBase and its sort mode, fast retrieval is achieved by designing a reasonable key and a data access strategy for each kind of data. Even with fast retrieval, reading the data and transmitting it from the server to the client still take some time. To reduce the delay of data acquisition at the client and finally achieve real-time access, a three-level cache module was developed on the client-side to cache EM images. When browsing an area of interest in the images, the image data near the area (at different scales and in the sequence images) is cached simultaneously. Additionally, a cache priority is set for the image data: data closer to the browsing area and data in the browsing direction have higher priority. Through such a caching strategy, the required data is cached for subsequent browsing, which makes data acquisition more purposeful. At the same time, in order to reduce the time consumption of data transmission and save storage space, JPEG compression is used to store and transmit data; the data is decompressed at the client to further meet the needs of real-time access. Video S1 shows browsing EM images in real time and is available in "Supplementary Materials"; please refer to "Appendix A" for specific information about the data in the video.
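The three-level lookup described above can be sketched as follows (a minimal illustration with hypothetical names; plain maps stand in for the real memory cache, disk cache, and remote server):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the three-level cache: memory first, then local disk, then the
// remote server, filling the lower levels on the way back. Illustrative only.
public class ThreeLevelCache {
    final Map<String, byte[]> memory = new HashMap<>(); // level 1
    final Map<String, byte[]> disk = new HashMap<>();   // level 2
    final Map<String, byte[]> remote;                   // level 3: the server
    int remoteHits = 0;

    ThreeLevelCache(Map<String, byte[]> remote) { this.remote = remote; }

    byte[] get(String blockKey) {
        byte[] data = memory.get(blockKey);          // fastest: memory
        if (data == null) data = disk.get(blockKey); // next: local disk
        if (data == null) {                          // slowest: remote server
            data = remote.get(blockKey);
            remoteHits++;
            disk.put(blockKey, data);                // backfill the disk cache
        }
        memory.put(blockKey, data);                  // backfill the memory cache
        return data;
    }

    public static void main(String[] args) {
        Map<String, byte[]> server = new HashMap<>();
        server.put("0_1_1_0", new byte[]{42});
        ThreeLevelCache cache = new ThreeLevelCache(server);
        cache.get("0_1_1_0");                 // first access goes to the server
        cache.get("0_1_1_0");                 // second access is a memory hit
        System.out.println(cache.remoteHits); // 1
    }
}
```

A production version would additionally evict blocks outside the cache area and prefetch in the browsing direction, as described in Section 3; this sketch shows only the lookup order.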


4.4. Data Sharing

A standard interface is designed for the client to access data from the server, combined with gRPC and protocol buffers. Based on the interface, developers can design client applications and use the server-side of the module as a backend to manage massive data. By adding other public databases, the server-side of the module can manage the public databases in the field of connectomics, mainly for EM images. Users acquire shared datasets through the standard interface. Users can also make datasets by using our tools and store them in a public database, or in the distributed storage module designed by us, to share their data. For specific information on interface definition and implementation, please refer to "Appendix A".

4.5. Making Datasets and Testing

Before using our tools, Hadoop and HBase need to be deployed on distributed cluster servers. A tool is provided to generate the dataset, and it is integrated with the data storage and retrieval module; for the format of the dataset, please refer to the section "Image Data Model Design" in the Methods. A testing dataset was made to evaluate the performance of our tool. The tool that generates datasets is integrated into the data management module with a graphical user interface (GUI); for specific details and a tutorial, please refer to "Appendix A". The sequential image data used to make the dataset has 50 layers of 8-bit grayscale images. The size of each layer is 470,000 × 425,000 pixels, stitched from 50 × 50 images of 9400 × 8500 pixels each. The resolution of the image is 5 nm × 5 nm × 50 nm, and the size of the whole test volume is 2.35 mm × 2.125 mm × 2.5 µm. The uncompressed serial image has a data volume of 9.08 TB. Seven servers running Linux are configured into a distributed cluster. To prevent data loss or damage, the data redundancy in Hadoop is set to three. It took about 165.9 h (6.9 days) to generate the dataset on one computer, and multiple computers can be used to speed up the process. Finally, the total volume of the test dataset is 24.27 TB (containing multiresolution EM images and three copies in Hadoop) in JPEG compression format. The performance of our tool was tested in two ways: one is to measure the time for acquiring images under different browsing modes and calculate the average access time, and the other is to test the concurrent access performance of the tool.

The first test measures the average access time of image data (image block size 2048 × 2048 pixels) in different browsing modes. The data access time refers to the time elapsed from when the server receives the index of the image block and performs a retrieval operation until the image block data is read into the memory of the server. The test results are shown in Figure 8, and the unit of the Y-axis is milliseconds. Figure 8a shows the average access time calculated by browsing data between different scaling factors of a specified layer. Figure 8b shows the average access time calculated by browsing different areas of a layer under a specified scaling factor. Figure 8c shows the average access time calculated by browsing different layers of the sequence images at a specified scaling factor. From Figure 8, it can be found that most data access times are within 90 ms; it can also be concluded that browsing between different scaling factors at a particular layer takes more time to acquire images, whereas browsing different layers at a particular scaling factor takes less time.
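The dataset figures quoted in this section check out arithmetically; a short sketch of the calculation (assuming 1 byte per 8-bit pixel, and reading the quoted "9.08 TB" as binary tebibytes):

```java
// Check of the test-dataset figures: 50 layers of 470,000 x 425,000
// 8-bit pixels, at 5 nm x 5 nm x 50 nm per voxel.
public class DatasetSize {
    public static void main(String[] args) {
        long w = 470_000, h = 425_000, layers = 50;   // pixels, pixels, slices
        long bytes = w * h * layers;                  // 1 byte per 8-bit pixel
        double tib = bytes / Math.pow(1024, 4);       // uncompressed volume, binary TB
        double mmX = w * 5e-6, mmY = h * 5e-6;        // 5 nm pixel pitch -> mm
        double umZ = layers * 50e-3;                  // 50 nm slice thickness -> um
        // Prints roughly: 9.08 TiB, 2.35 mm x 2.125 mm x 2.5 um
        System.out.printf(java.util.Locale.ROOT,
            "%.2f TiB, %.2f mm x %.3f mm x %.1f um%n", tib, mmX, mmY, umZ);
    }
}
```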

Figure 8. The average access time of image data. The unit of the Y-axis is in milliseconds. (A–C) represent the average access time for the client to obtain image data under different browsing situations. During testing, extreme cases of brute force browsing were ruled out.

The second test evaluates the concurrent access performance of the data management module when 50 people browse the same dataset at the same time. Under gigabit bandwidth, the network transmission rate for each user at the client-side is about 2 MB/s, while the rate at the server-side reaches about 100 MB/s, the fastest transmission rate achievable under gigabit bandwidth. With the compressed data, there is no delay when users browse images, which completely meets the needs of users under normal browsing conditions. Besides, our data management method has also been used in a recent study of mitochondria for decoding brain activities [38].

5. Discussion

Mining valuable information from data in the field of connectomics will help us explore the brain. However, as the volume of tissue blocks to be reconstructed increases, the resolution of electron microscopes increases, and the thickness of sequence slices decreases, the volume of data grows ever more massive. Facing such large-scale data, providing useful management tools for researchers is the basis of this work. The purpose of developing this management module is to solve the problem of managing large-scale neural structure data, so as to help researchers mine valuable information. Based on the standard application program interface (API), the server-side of the module can also be taken as the backend of other similar software or of a public database to manage shared datasets.

The storage models and retrieval methods are mainly designed for EM images and segmentation results. With the distributed frameworks (Hadoop and HBase), while the storage and retrieval problems of such big data are solved, the scalability of the module is also improved: as data increases, the storage pressure can be eased by adding server nodes, which is simple and easy to do. In addition, the redundant copies are conducive to ensuring the security of the data. At the same time, combined with the filters of HBase, a standard data interface and retrieval methods are designed to achieve fast retrieval. The retrieval method has constant time complexity O(1), so the retrieval speed is not affected as the data volume increases. At the client-side, to further reduce the delay of data acquisition, a three-level cache module is designed for EM images, which improves the performance of our software tools. With the fast retrieval method and the image cache module, our tool features excellent real-time performance when accessing data.
The design of the storage structure, the retrieval method, and the cache module copes with the increasing volume of data and makes our tool universal: it is not affected by the volume of data. The universality of our tool is also shown in its applicability to other data, such as optical microscope data and MRI data. Of course, the tool will continue to be refined. In the future, in order to further improve the user experience, reduce the pressure of server requests, and promote data sharing between users, we intend to apply a P2P network structure [39–41] to our tools. Moreover, we expect that through the continuous improvement of the software tools, the reuse of data in the field of connectomics and sharing between users can be promoted, so that the potential value of the data can be maximally excavated. We also hope to contribute to the development of connectomics together with the majority of researchers.

6. Conclusions

The reconstruction of the connectome will help researchers to explore the brain. As the scale of the data generated during the reconstruction process grows, managing such a large volume of data becomes a challenge. In this paper, a massive data management tool is developed based on a distributed architecture. Firstly, on the server-side, combining Hadoop and HBase, several data management strategies are put forward to solve the problem of large-scale data storage and retrieval. Secondly, to further reduce the time delay of data acquisition, a three-level image cache module is proposed on the client-side. Then, a standard interface is designed for communication between the server and the client. Finally, tests on the data management tool show excellent real-time performance when handling large-scale data. Additionally, the server-side can be used as a backend of other similar software or a public database to manage shared datasets, showing strong scalability. However, with the increasing demand of neural circuit reconstruction, the pressure of data management grows in many ways, such as the increasing types of data to be managed, further improving the user experience, and strengthening the concurrent processing ability of the server; these will be the main directions of our future work.

Supplementary Materials: The following are available online at https://www.micro-visions.org/git/MiRA-Open/ DataManager/src/branch/master/videos. Video S1: Showing browsing EM images in real time. Author Contributions: Conceptualization, L.S., H.H., and J.Y.; software, investigation, data curation, L.S., D.Z., J.Y., and W.Y.; writing, review and editing, L.S., H.H., J.Y., and J.Z.; supervision, H.H., L.S. and J.Z.; funding acquisition, H.H. All authors have read and agreed to the published version of the manuscript. Funding: This research was funded by the National Science Foundation of China under grant number NO. 61673381, NO. 61201050, NO. 61306070, NO. 61701497, NO. 11771130, NO. 61871177, the Special Program of Beijing Municipal Science and Technology Commission under grant number NO. Z181100000118002, the Strategic Priority Research Program of the Chinese Academy of Science under grant number No. XDB32030200, and the Scientific Research Instrument and Equipment Development Project of the Chinese Academy of Sciences under grant number YZ201671. Acknowledgments: We thank Jie Hu for technical help. We extend our thanks to all investigators and students who participate in tool testing and give suggestions. Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

The full source code of the data management module and a demo shown in MiRA (a software developed by ourselves) are available at https://www.micro-visions.org/git/MiRA-Open/DataManager.git, with a tutorial on how to use the module and generate datasets in README.md. The definition and implementation of the standard interfaces are available at https://www.micro-visions.org/git/MiRA-Open/DataManager/src/branch/master/DataManager/src/main/proto and https://www.micro-visions.org/git/MiRA-Open/DataManager/src/branch/master/DataManager/src/main/java/org/casia/mira/server. To add another data source in the field of connectomics, the public database should first provide an application program interface (API) for getting data, preferably written in Java. Then, according to the usage of the API, modify the function "getImages" in https://www.micro-visions.org/git/MiRA-Open/DataManager/src/branch/master/DataManager/src/main/java/org/casia/mira/server/ImageServerIO.java to add the new data source. The data shown in Video S1 is available at https://neurodata.io/data/kasthuri15/.

References

1. Jbabdi, S.; Sotiropoulos, S.N.; Haber, J.E.; Van Essen, D.C.; Behrens, T.E. Measuring macroscopic brain connections in vivo. Nat. Neurosci. 2015, 18, 1546–1555. [CrossRef] [PubMed]
2. Li, A.; Gong, H.; Zhang, B.; Wang, Q.; Yan, C.; Wu, J.; Liu, Q.; Zeng, S.; Luo, Q. Micro-optical sectioning tomography to obtain a high-resolution atlas of the mouse brain. Science 2010, 330, 1404–1408. [CrossRef]
3. Förster, D.; Maschio, M.D.; Laurell, E.; Baier, H. An optogenetic toolbox for unbiased discovery of functionally connected cells in neural circuits. Nat. Commun. 2017, 8, 116. [CrossRef] [PubMed]
4. Wanner, A.; Genoud, C.; Masudi, T.; Siksou, L.; Friedrich, R.W. Dense EM-based reconstruction of the interglomerular projectome in the zebrafish olfactory bulb. Nat. Neurosci. 2016, 19, 816–825. [CrossRef] [PubMed]
5. Briggman, K.L.; Helmstaedter, M.; Denk, W. Wiring specificity in the direction-selectivity circuit of the retina. Nature 2011, 471, 183–188. [CrossRef]
6. Eichler, K.; Li, F.; Litwin-Kumar, A.; Park, Y.; Andrade, I.; Schneider-Mizell, C.M.; Saumweber, T.; Huser, A.; Eschbach, C.; Gerber, B.; et al. The complete connectome of a learning and memory centre in an insect brain. Nature 2017, 548, 175–182. [CrossRef]
7. Denk, W.; Horstmann, H. Serial block-face scanning electron microscopy to reconstruct three-dimensional tissue nanostructure. PLoS Biol. 2004, 2, e329. [CrossRef]

8. Knott, G.; Marchman, H.; Wall, D.; Lich, B. Serial section scanning electron microscopy of adult brain tissue using focused ion beam milling. J. Neurosci. 2008, 28, 2959–2964. [CrossRef]
9. Hayworth, K.J.; Xu, C.S.; Lu, Z.; Knott, G.; Fetter, R.D.; Tapia, J.C.; Lichtman, J.W.; Hess, H.F. Ultrastructurally smooth thick partitioning and volume stitching for large-scale connectomics. Nat. Methods 2015, 12, 319–322. [CrossRef]
10. Hayworth, K.J.; Kasthuri, N.; Schalek, R.; Lichtman, J.W. Automating the collection of ultrathin serial sections for large volume TEM reconstructions. Microsc. Microanal. 2006, 12, 86–87. [CrossRef]
11. Joesch, M.; Mankus, D.; Yamagata, M.; Shahbazi, A.; Schalek, R.; Suissa-Peleg, A.; Meister, M.; Lichtman, J.W.; Scheirer, W.J.; Sanes, J.R. Reconstruction of genetically identified neurons imaged by serial-section electron microscopy. eLife 2016, 5, e15015. [CrossRef]
12. Bock, D.D.; Lee, W.-C.A.; Kerlin, A.M.; Andermann, M.L.; Hood, G.; Wetzel, A.W.; Yurgenson, S.; Soucy, E.R.; Kim, H.S.; Reid, R.C. Network anatomy and in vivo physiology of visual cortical neurons. Nature 2011, 471, 177–182. [CrossRef]
13. Eberle, A.L.; Mikula, S.; Schalek, R.; Lichtman, J.; Tate, M.L.K.; Zeidler, D. High-throughput imaging with a multibeam scanning electron microscope. J. Microsc. 2015, 259, 114–120. [CrossRef][PubMed]
14. Eberle, A.L.; Zeidler, D. Multi-beam scanning electron microscopy for high-throughput imaging in connectomics research. Front. Neuroanat. 2018, 12, 112. [CrossRef][PubMed]
15. Helmstaedter, M.; Briggman, K.L.; Denk, W. High-accuracy neurite reconstruction for high-throughput neuroanatomy. Nat. Neurosci. 2011, 14, 1081–1088. [CrossRef]
16. Helmstaedter, M.; Briggman, K.; Turaga, S.; Jain, V.; Seung, H.S.; Denk, W. Corrigendum: Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature 2014, 514, 394. [CrossRef]
17. Berning, M.; Boergens, K.M.; Helmstaedter, M. SegEM: Efficient image analysis for high-resolution connectomics. Neuron 2015, 87, 1193–1206. [CrossRef]
18. Morgan, J.L.; Berger, D.R.; Wetzel, A.W.; Lichtman, J.W. The fuzzy logic of network connectivity in mouse visual thalamus. Cell 2016, 165, 192–206. [CrossRef]
19. Dorkenwald, S.; Schubert, P.J.; Killinger, M.F.; Urban, G.; Mikula, S.; Svara, F.; Kornfeld, J.M. Automated synaptic connectivity inference for volume electron microscopy. Nat. Methods 2017, 14, 435–442. [CrossRef][PubMed]
20. Zheng, Z.; Lauritzen, J.S.; Perlman, E.; Robinson, C.G.; Nichols, M.; Milkie, D.E.; Torrens, O.; Price, J.; Fisher, C.B.; Sharifi, N.; et al. A complete electron microscopy volume of the brain of adult Drosophila melanogaster. Cell 2018, 174, 730–743. [CrossRef][PubMed]
21. Januszewski, M.; Kornfeld, J.; Li, P.H.; Pope, A.; Blakely, T.; Lindsey, L.; Maitin-Shepard, J.; Tyka, M.; Denk, W.; Jain, V. High-precision automated reconstruction of neurons with flood-filling networks. Nat. Methods 2018, 15, 605–610. [CrossRef]
22. Li, P.H.; Lindsey, L.F.; Januszewski, M.; Tyka, M.; Maitin-Shepard, J.; Blakely, T.; Jain, V. Automated reconstruction of a serial-section EM Drosophila brain with flood-filling networks and local realignment. bioRxiv 2019, 605634. [CrossRef]
23. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the 19th International Conference on Medical Image Computing and Computer Assisted Intervention, Athens, Greece, 17–21 October 2016; Springer: Berlin/Heidelberg, Germany, 2016.
24. Xiao, C.; Liu, J.; Chen, X.; Han, H.; Shu, C.; Xie, Q. Deep contextual residual network for electron microscopy image segmentation in connectomics. In Proceedings of the 2018 IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC, USA, 4–7 April 2018.
25. Schneider-Mizell, C.M.; Gerhard, S.; Longair, M.; Kazimiers, T.; Li, F.; Zwart, M.; Champion, A.; Midgley, F.M.; Fetter, R.D.; Saalfeld, S.; et al. Quantitative neuroanatomy for connectomics in Drosophila. eLife 2016, 5, e12059. [CrossRef][PubMed]
26. Saalfeld, S.; Cardona, A.; Hartenstein, V.; Tomancak, P. CATMAID: Collaborative annotation toolkit for massive amounts of image data. Bioinformatics 2009, 25, 1984–1986. [CrossRef]
27. Boergens, K.M.; Berning, M.; Bocklisch, T.; Bräunlein, D.; Drawitsch, F.; Frohnhofen, J.; Herold, T.; Otto, P.; Rzepka, N.; Werkmeister, T.; et al. webKnossos: Efficient online 3D data annotation for connectomics. Nat. Methods 2017, 14, 691–694. [CrossRef]

28. Cardona, A.; Saalfeld, S.; Schindelin, J.; Arganda-Carreras, I.; Preibisch, S.; Longair, M.; Tomancak, P.; Hartenstein, V.; Douglas, R.J. TrakEM2 software for neural circuit reconstruction. PLoS ONE 2012, 7, e38011. [CrossRef]
29. Marx, V. Neuroscience waves to the crowd. Nat. Methods 2013, 10, 1069–1074. [CrossRef]
30. Bae, J.A.; Mu, S.; Kim, J.S.; Turner, N.L.; Tartavull, I.; Kemnitz, N.; Jordan, C.S.; Norton, A.D.; Silversmith, W.; Prentki, R.; et al. Digital museum of retinal ganglion cells with dense anatomy and physiology. Cell 2018, 173, 1293–1306. [CrossRef]
31. Haehn, D.; Knowles-Barley, S.; Roberts, M.; Beyer, J.; Kasthuri, N.; Lichtman, J.W.; Pfister, H. Design and evaluation of interactive proofreading tools for connectomics. IEEE Trans. Vis. Comput. Graph. 2014, 20, 2466–2475. [CrossRef]
32. Berger, D.R.; Seung, H.S.; Lichtman, J.W. VAST (Volume Annotation and Segmentation Tool): Efficient manual and semi-automatic labeling of large 3D image stacks. Front. Neural Circuits 2018, 12, 88. [CrossRef]
33. Zhao, T.; Olbris, D.J.; Yu, Y.; Plaza, S.M. NeuTu: Software for collaborative, large-scale, segmentation-based connectome reconstruction. Front. Neural Circuits 2018, 12, 101. [CrossRef]
34. Katz, W.T.; Plaza, S.M. DVID: Distributed Versioned Image-Oriented Dataservice. Front. Neural Circuits 2019, 13, 5. [CrossRef][PubMed]
35. Apache Hadoop. Available online: https://hadoop.apache.org/ (accessed on 30 April 2020).
36. Apache HBase–Apache HBase™ Home. Available online: https://hbase.apache.org/ (accessed on 30 April 2020).
37. Williams, L. Pyramidal parametrics. In Proceedings of the 10th Annual Conference on Computer Graphics and Interactive Techniques, Detroit, MI, USA, 25–29 July 1983; Association for Computing Machinery: New York, NY, USA, 1983.
38. Li, P.H.; Lindsey, L.F.; Januszewski, M.; Tyka, M.; Maitin-Shepard, J.; Blakely, T.; Jain, V. Brain activity regulates loose coupling between mitochondrial and cytosolic Ca2+ transients. Nat. Commun. 2019, 10, 5277.
39. Schollmeier, R. A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications. In Proceedings of the First International Conference on Peer-to-Peer Computing, Linkoping, Sweden, 27–29 August 2001.
40. Barkai, D. Technologies for sharing and collaborating on the Net. In Proceedings of the First International Conference on Peer-to-Peer Computing, Linkoping, Sweden, 27–29 August 2001.
41. Bandara, H.M.N.D.; Jayasumana, A.P. Collaborative applications over peer-to-peer systems–challenges and solutions. Peer-to-Peer Netw. Appl. 2013, 6, 257–276. [CrossRef]

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).