Instituto De Pesquisas Tecnológicas Do Estado De São Paulo ANDERSON TADEU MILOCHI Grids De Dados: Implementação E Avaliaçã
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
Comparison of Kernel and User Space File Systems
Comparison of kernel and user space file systems. Bachelor Thesis. Arbeitsbereich Wissenschaftliches Rechnen, Fachbereich Informatik, Fakultät für Mathematik, Informatik und Naturwissenschaften, Universität Hamburg. Submitted by: Kira Isabel Duwe. E-mail address: [email protected]. Matriculation number: 6225091. Degree programme: Informatik. First reviewer: Professor Dr. Thomas Ludwig. Second reviewer: Professor Dr. Norbert Ritter. Supervisor: Michael Kuhn. Hamburg, 28 August 2014.
Abstract: A file system is part of the operating system and defines an interface between the OS and the computer's storage devices. It controls how the computer names, stores and organises files and directories. Because of many different requirements, such as efficient use of storage, a wide variety of approaches arose. The most important ones run in the kernel, as this was the only option for a long time. In 1994, developers came up with an idea that would allow mounting a file system in user space. The FUSE (Filesystem in Userspace) project was started in 2004 and implemented in the Linux kernel by 2005. It gives users the opportunity to write their own file system without editing the kernel code and thereby to avoid licence problems. Additionally, FUSE offers a stable library interface. It was originally implemented as a loadable kernel module. Due to its design, all operations have to pass through the kernel multiple times. The additional data transfer and the context switches cause overhead, which is analysed in this thesis. The thesis therefore gives a basic overview of how exactly a file system operation takes place and which mount options for a FUSE-based system result in better performance.
Accelerating Big Data Analytics on Traditional High-Performance Computing Systems Using Two-Level Storage
Clemson University TigerPrints, All Dissertations, December 2016. Pengfei Xuan, Clemson University, [email protected]. Recommended Citation: Xuan, Pengfei, "Accelerating Big Data Analytics on Traditional High-Performance Computing Systems Using Two-Level Storage" (2016). All Dissertations. 2318. https://tigerprints.clemson.edu/all_dissertations/2318
A Dissertation Presented to the Graduate School of Clemson University in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy, Computer Science, by Pengfei Xuan, December 2016. Accepted by: Dr. Feng Luo, Committee Chair; Dr. Pradip Srimani; Dr. Rong Ge; Dr. Jim Martin.
Abstract: High-performance computing (HPC) clusters, which consist of a large number of compute nodes, have traditionally been widely employed in industry and academia to run diverse compute-intensive applications. In recent years, the revolution in data-driven science has produced large volumes of data, often terabytes or petabytes in size, and has driven exponential growth in data-intensive applications. Data-intensive computing presents new challenges to HPC clusters because of its different workload characteristics and optimization objectives. One of those challenges is how to efficiently integrate software frameworks developed for big data analytics, such as Hadoop and Spark, with traditional HPC systems to support both data-intensive and compute-intensive workloads.
Optimizing Local File Accesses for FUSE-Based Distributed Storage
Optimizing Local File Accesses for FUSE-Based Distributed Storage. Shun Ishiguro∗, Jun Murakami∗, Yoshihiro Oyama∗‡, Osamu Tatebe†‡. ∗Department of Informatics, The University of Electro-Communications. Email: fshun,[email protected], [email protected]. †Faculty of Engineering, Information and Systems, University of Tsukuba. Email: [email protected]. ‡Japan Science and Technology Agency, CREST.
Abstract: Modern distributed file systems can store huge amounts of information while retaining the benefits of high reliability and performance. Many of these systems are prototyped with FUSE, a popular framework for implementing user-level file systems. Unfortunately, when these systems are mounted on a client that uses FUSE, they suffer from I/O overhead caused by extra memory copies and context switches during local file access. Overhead imposed by FUSE on distributed file systems is not small and may significantly degrade the performance of data-intensive applications. In this paper, we propose a mechanism that achieves rapid local file access in FUSE-based distributed file systems by reducing the number of memory copies and context switches.
Excerpt from the introduction: [...] these communications between the kernel module and the userland daemon involve frequent memory copies and context switches, they introduce significant runtime overhead. The framework forces applications to access data in the mounted file system via the userland daemon, even when the data is stored locally and could be accessed directly. The memory copies also increase memory consumption because redundant data is stored in different page cache. In this paper, we propose a mechanism that allows applications to access local storage directly via the FUSE kernel
Design of Store-And-Forward Servers for Digital Media Distribution
Design of store-and-forward servers for digital media distribution. University of Amsterdam, Master of Science in System and Network Engineering, Class of 2006-2007. Daniël Sánchez ([email protected]), 27th August 2007.
Abstract: Production of high-quality digital media is increasing in both the commercial and the academic world. This content needs to be distributed to end users on demand and efficiently. Initiatives like CineGrid [1] push the limits by looking at the creation of content distribution centres connected through dedicated optical circuits. The research question of this project is the following: "What is the optimal architecture for the (CineGrid) storage systems that store and forward content files of a size of hundreds of GBs?" First I made an overview of the current situation. At the moment the Rembrandt cluster nodes [16] are used in the storage architecture. All data has to be transferred manually to the nodes via FTP. This is not preferred, because administration is difficult. Therefore a list of criteria was drawn up for the new storage architecture. Important criteria are bandwidth (6.4 Gb/s) and space (31.2 TB a year and expandable). I compared open source distributed parallel file systems against these criteria. Lustre and GlusterFS turned out to be the best of these file systems according to the criteria. After that I proposed two architectures which use these file systems. The first architecture contains only cluster nodes and the second architecture contains cluster nodes and a SAN. In the end it is recommended to install GlusterFS in the first architecture on the existing DAS-3 nodes [15] with Ethernet as the interconnect network.
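To give a feel for the two sizing criteria quoted in this abstract (6.4 Gb/s of bandwidth and 31.2 TB of new data per year), the following is a minimal back-of-the-envelope sketch. It is not taken from the thesis. It assumes decimal units (1 GB = 10^9 bytes, 1 Gb = 10^9 bits) and an idealised link with no protocol overhead; the 500 GB file size and the helper names are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope sizing sketch (not from the thesis).
# Assumptions: decimal units and an idealised link without protocol overhead.

BITS_PER_BYTE = 8
GB = 10**9   # bytes, decimal gigabyte (assumption)
TB = 10**12  # bytes, decimal terabyte (assumption)

# Criteria quoted in the abstract.
REQUIRED_RATE_BITS_PER_S = 6_400_000_000      # 6.4 Gb/s
YEARLY_GROWTH_BYTES = 31_200_000_000_000      # 31.2 TB per year


def transfer_seconds(size_bytes: int, rate_bits_per_s: int) -> float:
    """Time to move size_bytes over a link sustaining rate_bits_per_s."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * BITS_PER_BYTE / rate_bits_per_s


def capacity_after_years(yearly_growth_bytes: int, years: int) -> int:
    """Total bytes accumulated after a number of years of constant growth."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return yearly_growth_bytes * years


if __name__ == "__main__":
    example_file = 500 * GB  # hypothetical content file of "hundreds of GBs"
    seconds = transfer_seconds(example_file, REQUIRED_RATE_BITS_PER_S)
    print(f"500 GB at 6.4 Gb/s: {seconds:.0f} s ({seconds / 60:.1f} min)")

    total = capacity_after_years(YEARLY_GROWTH_BYTES, 3)
    print(f"Capacity needed after 3 years: {total / TB:.1f} TB")
```

Under these assumptions a single 500 GB file takes roughly ten minutes to move at the full target rate, which illustrates why sustained bandwidth, and not only capacity, appears among the criteria.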