High Volume Computing: Identifying and Characterizing Throughput Oriented Workloads in Data Centers


Jianfeng Zhan, Lixin Zhang, Ninghui Sun, Lei Wang, Zhen Jia, and Chunjie Luo
State Key Laboratory of Computer Architecture
Institute of Computing Technology
Chinese Academy of Sciences
Beijing, China
Email: fzhanjianfeng, zhanglixin, [email protected], fwl, jiazhen, [email protected]

Abstract-For the first time, this paper systematically identifies three categories of throughput-oriented workloads in data centers: services, data processing applications, and interactive real-time applications, whose targets are to increase the volume of throughput in terms of processed requests, processed data, or the maximum number of simultaneous subscribers supported, respectively. We coin a new term, high volume computing (HVC for short), to describe those workloads and the data center computer systems designed for them. We characterize and compare HVC with other computing paradigms, e.g., high throughput computing, warehouse-scale computing, and cloud computing, in terms of levels, workloads, metrics, coupling degree, data scales, and number of jobs or service instances. We also give a preliminary report on our ongoing work on the metrics and benchmarks for HVC systems, which is the foundation for designing innovative data center computer systems for HVC workloads.

Keywords-High volume computing; Throughput-oriented workloads; Data center computer systems; Metrics; Benchmarks

I. INTRODUCTION

In the past decade, there have been three trends in computing domains. First, more and more services involving a large amount of data are deployed in data centers to serve the masses, e.g., the Google search engine and Google Maps. Second, massive data are produced, stored, and analyzed in real time or offline. According to the annual survey of the global digital output by IDC, the total amount of global data passed 1.2 zettabytes in 2010. In this paper, we call applications that produce, store, and analyze massive data "data processing applications", which are also referred to as big data applications. Third, many users use streaming media or VoIP for entertainment or communication. Unlike an ordinary Web server, a VoIP application maintains a long user session (e.g., more than five minutes) while guaranteeing real-time quality of service; we call this an interactive real-time application.

The workloads mentioned above consist of a large number of loosely coupled workloads instead of one big tightly coupled job. The nature of this class of workloads is throughput-oriented, and the target of data center computer systems designed for them is to increase the volume of throughput in terms of processed requests (for services), processed data (for data processing applications), or the maximum number of simultaneous subscribers (for interactive real-time applications) performed or supported in data centers. To draw attention to this class of workloads and the computer systems designed for them, in this paper we coin a new term, high volume computing (nature: throughput computing; target: high volume; HVC for short), to describe this class of workloads or the data center computer systems designed for them.

In this paper, we identify three categories of workloads in HVC: services, data processing applications, and interactive real-time applications, all of which are throughput-oriented workloads. A service is a group of applications that collaborate to receive user requests and return responses to end users. More and more emerging services are data-intensive, e.g., the Google search engine or Google Maps. Data processing applications produce, store, and analyze massive data, and we focus only on loosely coupled data processing applications, excluding tightly coupled data-intensive computing, e.g., applications written in MPI. Typical examples are MapReduce- or Dryad-based computing. We also include data stream applications, which process continuous unbounded streams of data in real time, in the second category of HVC workloads. Unlike an ordinary Web server, an interactive real-time application maintains a long user session while guaranteeing real-time quality of service. Typical interactive real-time applications include streaming media, desktop clouds [33], and Voice over IP (VoIP) applications. The details of the three categories of workloads can be found in Section II-A.

Although several computing paradigms are not formally or clearly defined, e.g., warehouse-scale computing (WSC) [4], data-intensive scalable computing (DISC) [28], and cloud computing, we compare HVC with several computing paradigms along six dimensions: levels, workloads, metrics, coupling degree, data scales, and number of jobs or service instances, as shown in Table I. Our definition of HVC targets data center computer systems, while many task computing [14] and high throughput computing [26] are defined in terms of runtime systems. In terms of workloads and respective metrics, both high throughput computing and high performance computing concern scientific computing centered on floating point operations, while most HVC applications have few floating point operations, as uncovered in our preliminary work [3]. Meanwhile, we also notice that many emerging workloads can be included in one or two categories of HVC workloads, e.g., WSC (in the first and second categories) and DISC (in the second category). As for (public) cloud computing, we believe it is basically a business model of renting computing or storage resources that relies heavily on virtualization technologies, whereas HVC is defined in terms of workloads. We think many well-known workloads in the cloud [5] can be included in HVC, but HPC in cloud workloads [1] [38] are excluded, since they are tightly coupled.

After widely investigating previous benchmarks, we found no systematic work on benchmarking data center computer systems in the context of our three identified categories of throughput-oriented workloads. We present our preliminary work on the metrics and benchmarks for HVC systems.

The remainder of the paper is organized as follows. Section II characterizes and compares different computing paradigms. Section III revisits previous benchmarks and reports our preliminary work on the HVC metrics and benchmarks. Section IV draws a conclusion.

II. CHARACTERIZING COMPUTING PARADIGMS

In this section, we give the definition of HVC and identify its distinguishing differences from other computing paradigms.

A. What is HVC?

HVC is a data center based computing paradigm focusing on throughput-oriented workloads. The target of a data center computer system designed for HVC workloads is to increase the volume of throughput in terms of requests, processed data, or the maximum number of simultaneous subscribers, which are performed or supported in a data center.

In Table I, we characterize HVC along six dimensions: levels, workloads, metrics, coupling degree, data scales, and number of jobs or service instances.

Table I: Characterizing different computing paradigms.

| Computing paradigm | Level | Workloads | Metrics | Coupling degree | Data scale | # jobs or service instances |
|---|---|---|---|---|---|---|
| High performance computing | Super computers | Scientific computing: heroic MPI applications | Floating point operations per second | Tight | n/a | Low |
| High performance throughput computing [27] | Processors | Traditional server workloads | Overall work performed over a fixed time period | Loose | n/a | Low |
| High throughput computing [26] | Distributed runtime systems | Scientific computing | Floating point operations per month | Loose | n/a | Medium |
| Many task computing [14] | Runtime systems | Scientific computing or data analysis: workflow jobs | Tasks per second | Tight or loose | n/a | Large |
| Data-intensive scalable computing [28] or data center computing [29] | Runtime systems | Data analysis: MapReduce-like jobs | n/a | Loose | Large | Large |
| Warehouse-scale computing [4] | Data centers for Internet services, belonging to a single organization | Very large Internet services | n/a | Loose | Large | Large |
| Cloud computing [34] [15] | Hosted data centers | SaaS + utility computing | n/a | Loose | n/a | Large |
| High volume computing (HVC) | Data centers | Services | Requests per minute and joule | Loose | Medium | Large |
| High volume computing (HVC) | Data centers | Data processing applications | Data processed per minute and joule | Loose | Large | Large |
| High volume computing (HVC) | Data centers | Interactive real-time applications | Maximum number of simultaneous subscribers and subscribers per watt | Loose | From medium to large | Large |

The HVC system is defined at the data center level. We identify three categories of workloads in HVC: services, data processing applications, and interactive real-time applications. Services belong to the first category of HVC workloads. A service is a group of applications that collaborate to receive user requests and return responses to end users. We call a group of applications that independently process requests a service instance. For a large Internet service, a large number of service instances are deployed, with requests distributed among them by load balancers. Since each request is independent, a service in itself is loosely coupled. For an ordinary Apache Web server, the data scale is lower, while for a search engine provided by Google, the data scale is large. More and more emerging services are data-intensive.

The second category of HVC workloads is data [...] data stream applications into this category of HVC workloads. For example, S4 is a platform that allows programmers to easily develop applications for processing continuous unbounded streams of data [37]. The third [...]