WORKSHOP PROCEEDINGS First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters


COSET-1
June 26th, 2004, Saint-Malo, France
Held in conjunction with the 2004 ACM International Conference on Supercomputing (ICS '04)
Photos: Manuel CLAUZIER, Ville de Saint-Malo, Service Communication

Clusters are not only the most widely used general high-performance computing platform for scientific computing; according to recent results on the top500.org site, they have also become the dominant platform for high-performance computing today. While the cluster architecture is attractive with respect to price/performance, there is still great potential for efficiency improvements at the software level. System software requires improvements to better exploit the cluster hardware resources. Programming environments need to be developed with both cluster efficiency and human programmer efficiency in mind. Administrative processes need refinement, both for efficiency and for effectiveness, when dealing with numerous cluster nodes.

The goal of this one-day workshop is to bring together a diverse community of researchers and developers from industry and academia to facilitate the exchange of ideas and to discuss the difficulties and successes in this area. A further goal is to discuss recent innovative results in the development of cluster-based operating systems and programming environments, as well as management tools for the administration of high-performance computing clusters.

COSET-1 Workshop co-chairs

Stephen L. Scott
Oak Ridge National Laboratory
P. O. Box 2008, Bldg. 5600, MS-6016
Oak Ridge, TN 37831-6016
email: [email protected]

Christine A. Morin
IRISA/INRIA
Campus universitaire de Beaulieu
35042 Rennes cedex, France
email: [email protected]

Program Committee

Ramamurthy Badrinath, HP, India
Amnon Barak, Hebrew University, Israël
Jean-Yves Berthou, EDF R&D, France
Brett Bode, Ames Lab, USA
Ron Brightwell, SNL, USA
Emmanuel Cecchet, INRIA, France
Toni Cortès, UPC, Spain
Narayan Desai, ANL, USA
Christian Engleman, ORNL, USA
Graham Fagg, University of Tennessee, USA
Paul Farrell, Kent State University, USA
Andrzej Goscinski, Deakin University, Australia
Liviu Iftode, Rutgers University, USA
Chokchai Leangsuksun, Louisiana Tech University, USA
Laurent Lefèvre, INRIA, France
John Mugler, ORNL, USA
Raymond Namyst, Université de Bordeaux 1, France
Thomas Naughton, ORNL, USA
Hong Ong, University of Portsmouth, UK
Rolf Riesen, SNL, USA
Michael Schoettner, University of Ulm, Germany
Assaf Schuster, Technion, Israël

COSET-1 Program

9:00-9:05 Opening

9:05-10:05 Session 1: Cluster Operating System Services
Session chair: Christine Morin, INRIA
  Parallel File System for Networks of Windows Workstations
    Jose Maria Perez, Jesus Carretero, Felix Garcia, Jose Daniel Garcia, Alejandro Calderon, Universidad Carlos III de Madrid, Spain
  An application-oriented Communication System for Clusters of Workstations
    Thiago Robert C. Santos and Antonio Augusto Frohlich, LISHA, Federal University of Santa Catarina (UFSC), Brazil

10:05-10:35 Session 2: Application Management
Session chair: Christine Morin, INRIA
  A first step toward autonomous clustered J2EE applications management
    Slim Ben Atallah, Daniel Hagimont, Sébastien Jean and Noël de Palma, INRIA Rhône-Alpes, France

10:35-11:00 Coffee break

11:00-12:30 Session 3: Highly Available Systems for Clusters
Session chair: Stephen Scott, ORNL
  Highly Configurable Operating Systems for Ultrascale Systems
    Arthur B. Maccabe and Patrick G. Bridges, The University of New Mexico, USA; Ron Brightwell and Rolf Riesen, Sandia National Laboratories, USA; Trammell Hudson, Operating Systems Research, Inc., USA
  Cluster Operating System Support for Parallel Autonomic Computing
    A. Goscinski, J. Silcock, M. Hobbs, Deakin University, Australia
  Type-Safe Object Exchange Between Applications and a DSM kernel
    R. Goeckelmann, M. Schoettner, S. Frenz and P. Schulthess, University of Ulm, Germany

12:30-14:30 Lunch

14:30-16:00 Session 4: Cluster Single System Image Operating Systems
Session chair: Christine Morin, INRIA
  SGI's Altix 3700, a 512p SSI system. Architecture and Software environment.
    Jean-Pierre Panziera, SGI
  OpenSSI
    speaker TBA
  SSI-OSCAR
    Geoffroy Vallée, INRIA

16:00-16:20 Coffee break

16:20-17:20 Session 4 (continued): Cluster Single System Image Operating Systems
Session chair: Geoffroy Vallée, INRIA
  Millipede Virtual Parallel Machine for NT/PC clusters
    Assaf Schuster, Technion
  Genesis cluster operating system
    Andrzej Goscinski, Deakin University

17:20-18:00 Panel: SSI: Software versus Hardware Approaches
Moderator: Stephen Scott, ORNL

A Parallel File System for Networks of Windows Workstations

José María Pérez
Computer Science Department, Universidad Carlos III de Madrid
Av. de la Universidad, 30, Leganes 28911, Madrid, Spain
+34 91 624 91 04
[email protected]

Jesús Carretero
Computer Science Department, Universidad Carlos III de Madrid
Av. de la Universidad, 30, Leganes 28911, Madrid, Spain
+34 91 624 94 58
[email protected]

José Daniel García
Computer Science Department, Universidad Carlos III de Madrid
Av. de la Universidad Carlos III, 22, Colmenarejo 28270, Madrid, Spain
+34 91 856 13 16
[email protected]

ABSTRACT

The use of parallelism in file systems makes high-performance I/O achievable in clusters and networks of workstations. Traditionally this kind of solution was only available for UNIX systems and required special servers and special APIs, which forced existing applications to be modified and/or recompiled. This paper presents the first prototype of a parallel file system for the Windows platform, called WinPFS. It is implemented as a new Windows file system and is integrated within the Windows kernel components, so applications need no modification or recompilation to take advantage of parallel I/O. WinPFS uses shared folders (through the CIFS/SMB protocol) to access remote data in parallel. The proposed prototype has been developed under the Windows XP platform and has been tested with a cluster of Windows XP nodes and a Windows 2003 Server node.

Categories and Subject Descriptors

D.4.3 [Operating Systems]: File Systems Management – distributed file systems. C.2.4 [Computer-Communications Networks]: Distributed Systems – network operating systems.

Keywords

Parallel I/O, Cluster, Windows.

1. INTRODUCTION

In recent years, the need for high-performance data storage has grown along with disk capacity and application requirements [1][2][3]. One approach to overcoming the bottleneck that characterizes typical I/O systems is parallel I/O [1]. This technique allows the creation of large storage systems by joining several storage resources, which increases the scalability and performance of the I/O system and provides load balancing.

The use of parallelism in file systems relies on the fact that a distributed and parallel system consists of several nodes with storage devices. Performance and bandwidth can be increased if data accesses are carried out in parallel. Parallelism in file systems is obtained by using several independent server nodes, each one supporting one or more secondary storage devices. Data are striped among those nodes and devices to allow parallel accesses to different files and parallel accesses to the same file. Initially, this idea was used in RAID [4] (Redundant Array of Inexpensive Disks). However, when a RAID is used in a traditional file server, the I/O bandwidth is limited by the server memory bandwidth. If several servers are used in parallel instead, performance can be increased in two ways:

1. Allowing parallel access to different files by using several disks and servers.
2. Striping data using distributed partitions, allowing parallel access to the data of the same file.

However, current parallel file systems and parallel I/O libraries lack generality and flexibility for general-purpose distributed environments. Furthermore, these parallel file systems do not use standard servers, which makes them very difficult to integrate in existing networks of workstations: new servers must be installed that are not easy to use and that are only available for specific platforms (usually some UNIX flavour). Moreover, those systems are implemented outside the operating system, so a new I/O API is needed to take advantage of parallel I/O, and existing applications must be modified.

Most of the software related to high-performance I/O is only available for UNIX environments, or has been created as UNIX middleware. The work presented in this paper tries to fill the gap left by the lack of this kind of system in the Windows environment, presenting a [...]

[...] describes what she wants and the system tries to optimize the I/O requests by applying optimization techniques. This approach is used in ViPIOS [10].

The main problem with parallel I/O software architectures and parallel I/O techniques is that they often lack generality and flexibility, because they create only tailor-made software for specific problems. On the other hand, parallel file systems are specially conceived for multiprocessors and multicomputers, and do not integrate appropriately in general-purpose distributed environments such as clusters of workstations.

In recent years, some file [...]
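The striping idea described in the introduction (data of one file spread over several server nodes so that a single request can be served in parallel) can be illustrated with a minimal sketch. It assumes a plain round-robin layout with a fixed stripe unit; the names locate, split_request, stripe_size and n_servers are hypothetical and are not taken from WinPFS, whose actual layout is not described in this part of the paper.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, n_servers: int) -> Tuple[int, int]:
    """Map a byte offset of the logical file to (server index, offset in
    that server's local file), assuming round-robin striping (an assumption,
    not the documented WinPFS layout)."""
    if offset < 0 or stripe_size <= 0 or n_servers <= 0:
        raise ValueError("offset must be >= 0; stripe_size and n_servers > 0")
    stripe = offset // stripe_size        # index of the stripe unit in the logical file
    server = stripe % n_servers           # stripe units are dealt out in turn
    local_stripe = stripe // n_servers    # how many units this server already holds
    return server, local_stripe * stripe_size + offset % stripe_size


def split_request(offset: int, length: int, stripe_size: int,
                  n_servers: int) -> List[Tuple[int, int, int]]:
    """Split a logical request [offset, offset + length) into per-server
    pieces (server, local_offset, size) that could be issued in parallel."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe_size, n_servers)
        # A piece never crosses a stripe-unit boundary.
        size = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local, size))
        offset += size
    return pieces


if __name__ == "__main__":
    # An 8-byte read starting at offset 2, with 4-byte stripe units on 3 servers.
    for piece in split_request(2, 8, stripe_size=4, n_servers=3):
        print(piece)
```

In this sketch a request that spans several stripe units touches several servers, which is the second of the two ways listed in the introduction; requests for different files would land on different servers in the same manner if each file started on a different node.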