Niagara: a 32-Way Multithreaded Sparc Processor

Poonacha Kongetira, Kathirgamar Aingaran, Kunle Olukotun
Sun Microsystems

IEEE Micro, Hot Chips 16. 0272-1732/05/$20.00 © 2005 IEEE. Published by the IEEE Computer Society.
Authorized licensed use limited to: Carnegie Mellon Libraries. Downloaded on January 28, 2010 at 22:25 from IEEE Xplore. Restrictions apply.

The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications. The hardware supports 32 threads with a memory subsystem consisting of an on-board crossbar, level-2 cache, and memory controllers for a highly integrated design that exploits the thread-level parallelism inherent to server applications, while targeting low levels of power consumption.

Over the past two decades, microprocessor designers have focused on improving the performance of a single thread in a desktop processing environment by increasing frequencies and exploiting instruction-level parallelism (ILP) using techniques such as multiple instruction issue, out-of-order issue, and aggressive branch prediction. The emphasis on single-thread performance has shown diminishing returns because of the limitations in terms of latency to main memory and the inherently low ILP of applications. This has led to an explosion in microprocessor design complexity and made power dissipation a major concern.

For these reasons, Sun Microsystems' Niagara processor takes a radically different approach to microprocessor design. Instead of focusing on the performance of single or dual threads, Sun optimized Niagara for multithreaded performance in a commercial server environment. This approach increases application performance by improving throughput, the total amount of work done across multiple threads of execution. This is especially effective in commercial server applications such as databases¹ and Web services,² which tend to have workloads with large amounts of thread-level parallelism (TLP).

In this article, we present the Niagara processor's architecture. This is an entirely new implementation of the Sparc V9 architectural specification, which exploits large amounts of on-chip parallelism to provide high throughput. Niagara supports 32 hardware threads by combining ideas from chip multiprocessors³ and fine-grained multithreading.⁴ Other studies⁵ have also indicated the significant performance gains possible using this approach on multithreaded workloads. The parallel execution of many threads effectively hides memory latency. However, having 32 threads places a heavy demand on the memory system to support high bandwidth. To provide this bandwidth, a crossbar interconnect scheme routes memory references to a banked on-chip level-2 cache that all threads share. Four independent on-chip memory controllers provide in excess of 20 Gbytes/s of bandwidth to memory.

Exploiting TLP also lets us improve performance significantly without pushing the envelope on CPU clock frequency. This and the sharing of CPU pipelines among multiple threads enable an area- and power-efficient design. Designers expect Niagara to dissipate about 60 W of power, making it very attractive for high compute density environments. In data centers, for example, power supply and air conditioning costs have become very significant. Data center racks often cannot hold a complete complement of servers because this would exceed the rack's power supply envelope.

We designed Niagara to run the Solaris operating system, and existing Solaris applications will run on Niagara systems without modification. To application software, a Niagara processor will appear as 32 discrete processors with the OS layer abstracting away the hardware sharing. Many multithreaded applications currently running on symmetric multiprocessor (SMP) systems should realize performance improvements. This is consistent with observations from previous multithreaded-processor development at Sun⁶,⁷ and from Niagara chips and systems, which are undergoing bring-up testing in the laboratory.

Recently, the movement of many retail and business processes to the Web has triggered the increasing use of commercial server applications (Table 1). These server applications exhibit large degrees of client request-level parallelism, which servers using multiple threads can exploit. The key performance metric for a server running these applications is the sustained throughput of client requests.

Table 1. Commercial server applications.

Benchmark  Application category          Instruction-level  Thread-level  Working  Data
                                         parallelism        parallelism   set      sharing
Web99      Web server                    Low                High          Large    Low
JBB        Java application server       Low                High          Large    Medium
TPC-C      Transaction processing        Low                High          Large    High
SAP-2T     Enterprise resource planning  Medium             High          Medium   Medium
SAP-3T     Enterprise resource planning  Low                High          Large    High
TPC-H      Decision support system       High               High          Large    Medium

Furthermore, the deployment of servers commonly takes place in high compute density installations such as data centers, where supplying power and dissipating server-generated heat are very significant factors in the center's cost of operating. Experience at Google shows a representative power density requirement of 400 to 700 W/sq. foot for racked server clusters.² This far exceeds the typical power densities of 70 to 150 W/sq. foot supported by commercial data centers. It is possible to reduce power consumption by simply running the ILP processors in server clusters at lower clock frequencies, but the proportional loss in performance makes this less desirable. This situation motivates the requirement for commercial servers to improve performance per watt. These requirements have not been efficiently met using machines optimized for single-thread performance.

Commercial server applications tend to have low ILP because they have large working sets and poor locality of reference on memory access; both contribute to high cache-miss rates. In addition, data-dependent branches are difficult to predict, so the processor must discard work done on the wrong path. Load-load dependencies are also present, and are not detectable in hardware at issue time, resulting in discarded work. The combination of low available ILP and high cache-miss rates causes memory access time to limit performance. Therefore, the performance advantage of using a complex ILP processor over a single-issue processor is not significant, while the ILP processor incurs the costs of high power and complexity, as Figure 1 shows.

[Figure 1: execution timelines of compute latency (C) and memory latency (M) for a single-issue processor, an ILP processor, and a TLP processor (threads on a shared single-issue pipeline), with the time saved marked.]

Figure 1. Behavior of processors optimized for TLP and ILP on commercial server workloads. In comparison to the single-issue machine, the ILP processor mainly reduces compute time, so memory access time dominates application performance. In the TLP case, multiple threads share a single-issue pipeline, and overlapped execution of these threads results in higher performance for a multithreaded application.

However, server applications tend to have large amounts of TLP. Therefore, shared-memory machines with discrete single-threaded processors and coherent interconnect have tended to perform well because they exploit TLP. However, the use of an SMP composed of multiple processors designed to exploit ILP is neither power-efficient nor cost-efficient. A more efficient approach is to build a machine using simple cores aggregated on a single die, with a shared on-chip cache and high bandwidth to large off-chip memory, thereby aggregating an SMP server on a chip. This has the added benefit of low-latency communication between the cores for efficient data sharing in commercial server applications.

Niagara overview

The Niagara approach to increasing throughput on commercial server applications involves a dramatic increase in the number of threads supported on the processor and a memory subsystem scaled for higher bandwidths. Niagara supports 32 threads of execution in hardware. The architecture organizes four threads into a thread group; the group shares a processing pipeline, referred to as the Sparc pipe. Niagara uses eight such thread groups, resulting in 32 threads on the CPU. Each Sparc pipe contains level-1 caches for instructions and data. The hardware hides memory and pipeline stalls on a given thread by scheduling the other threads in the group onto the Sparc pipe with a zero-cycle switch penalty. Figure 1 schematically shows how reusing the shared processing pipeline results in higher throughput.

The 32 threads share a 3-Mbyte level-2 cache. This cache is 4-way banked and pipelined for bandwidth; it is 12-way set-associative to minimize conflict misses from the many threads. Commercial server code has data sharing, which can lead to high coherence miss rates. In conventional SMP systems using discrete processors with coherent system interconnects, coherence misses go out over low-frequency off-chip buses or links, and can have high latencies.

[...] of bandwidth. A two-entry queue is available for each source-destination pair, and it can queue up to 96 transactions each way in the crossbar. The crossbar also provides a port for communication with the I/O subsystem. Arbitration for destination ports uses a simple age-based priority scheme that ensures fair scheduling across all requestors. The crossbar is also the point of memory ordering for the machine.

The memory interface is four channels of double data rate 2 (DDR2) DRAM, supporting a maximum bandwidth in excess of 20 Gbytes/s, and a capacity of up to 128 Gbytes. Figure 2 shows a block diagram of the Niagara processor.

Sparc pipeline

Here we describe the Sparc pipe implementation, which supports four threads. Each thread has a unique set of registers and instruction and store buffers. The thread group shares the L1 caches, translation look-aside buffers (TLBs), execution units, and most pipeline registers. We implemented a single-issue
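
The latency-hiding argument in the Niagara overview (four threads sharing one single-issue pipeline, with a zero-cycle switch penalty) can be illustrated with a toy model. The sketch below is not the Niagara thread-select logic: the round-robin policy, the function name pipeline_utilization, and the parameters compute_cycles and miss_latency are assumptions chosen only for illustration, and the cycle counts are made up rather than taken from the article.

```python
def pipeline_utilization(num_threads, compute_cycles, miss_latency, total_cycles):
    """Toy model of a single-issue pipeline shared by several threads.

    Hypothetical workload: every thread issues `compute_cycles` instructions,
    then stalls for `miss_latency` cycles on a memory access, and repeats.
    Each cycle the pipeline issues from the next ready thread in round-robin
    order; switching threads costs nothing (zero-cycle switch penalty).
    Returns the fraction of cycles in which an instruction was issued.
    """
    remaining = [compute_cycles] * num_threads  # instructions left before the next miss
    ready_at = [0] * num_threads                # first cycle each thread may issue again
    issued = 0
    next_thread = 0
    for cycle in range(total_cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                issued += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread misses: it is unavailable until the access returns.
                    ready_at[t] = cycle + 1 + miss_latency
                    remaining[t] = compute_cycles
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the pipeline simply idles this cycle.
    return issued / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4):
        u = pipeline_utilization(n, compute_cycles=10, miss_latency=30, total_cycles=4000)
        print(f"{n} thread(s): utilization {u:.2f}")
```

With these made-up numbers a single thread keeps the pipeline busy for 10 of every 40 cycles, and adding threads fills part of the stall time with useful work from other threads. The model ignores caches, pipeline hazards, and contention in the memory system, so it shows only the trend sketched in Figure 1, not Niagara's actual performance.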