File Systems and Archives


DOE HPC Best Practices Workshop
File Systems and Archives
SEPTEMBER 26-27, 2011 • SAN FRANCISCO, CA

The Fifth Workshop on HPC Best Practices: File Systems and Archives
Held September 26–27, 2011, San Francisco

Workshop Report
December 2011

Compiled by Jason Hick, John Hules, and Andrew Uselton
Lawrence Berkeley National Laboratory

Workshop Steering Committee

John Bent (LANL); Jeff Broughton (LBNL/NERSC); Shane Canon (LBNL/NERSC); Susan Coghlan (ANL); David Cowley (PNNL); Mark Gary (LLNL); Gary Grider (LANL); Kevin Harms (ANL/ALCF); Paul Henning, DOE Co-Host (NNSA); Jason Hick, Workshop Chair (LBNL/NERSC); Ruth Klundt (SNL); Steve Monk (SNL); John Noe (SNL); Lucy Nowell (ASCR); Sarp Oral (ORNL); James Rogers (ORNL); Yukiko Sekine, DOE Co-Host (ASCR); Jerry Shoopman (LLNL); Aaron Torres (LANL); Andrew Uselton, Assistant Chair (LBNL/NERSC); Lee Ward (SNL); Dick Watson (LLNL).

Workshop Group Chairs

Shane Canon (LBNL); Susan Coghlan (ANL); David Cowley (PNNL); Mark Gary (LLNL); John Noe (SNL); Sarp Oral (ORNL); Jim Rogers (ORNL); Jerry Shoopman (LLNL).

Ernest Orlando Lawrence Berkeley National Laboratory
1 Cyclotron Road, Berkeley, CA 94720-8148

This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

DISCLAIMER

This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor The Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or The Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or The Regents of the University of California.

Contents

Executive Summary
Introduction
    Workshop Goals
    Workshop Format
    Workshop Breakout Topics
        The Business of Storage Systems
        The Administration of Storage Systems
        The Reliability and Availability of Storage Systems
        The Usability of Storage Systems
Workshop Findings
    Best Practices
    Gaps and Recommendations
Appendix A: Position Papers
Appendix B: Workshop Agenda
Appendix C: Workshop Attendees

Workshop on HPC Best Practices: File Systems and Archives

Executive Summary

In 2006, the United States Department of Energy (DOE) National Nuclear Security Administration (NNSA) and Office of Science (SC) identified the need for large supercomputer centers to learn from each other on what worked well and what didn’t in the areas of acquisition, installation, integration, testing and operation. Previous High Performance Computing (HPC) Best Practice Workshops focused on System Integration in 2007 (http://outreach.scidac.gov/pibp/), Risk Management in 2008 (http://rmtap.llnl.gov/), Software Lifecycles in 2009 (http://outreach.scidac.gov/swbp/), and Power Management in 2010 (http://outreach.scidac.gov/pmbp/).

The workshop on HPC Best Practices on File Systems and Archives was the fifth in the series. The workshop gathered technical and management experts for operations of HPC file systems and archives. Attendees identified and discussed best practices in use at their facilities, and documented findings for the DOE and HPC community in this report. The home page for this workshop is http://outreach.scidac.gov/fsbp/.
Prior to the workshop, board members identified four critical functional components of storage system operation: administration, business, reliability, and usability of storage. These components became the breakout session topics. During registration, attendees designated their primary interests for breakout sessions and submitted a position paper to help focus discussion on the proposals and ideas it presented. Participants submitted 27 position papers, included in Appendix A. This year the workshop had 64 attendees from 29 sites around the world.

Participants identified 16 best practices in use:

1. For critical data, multiple copies should be made on multiple species of hardware in geographically separated locations.
2. Build a multidisciplinary data center team.
3. Having dedicated maintenance personnel, vendor or internal staff, is important to increasing system availability.
4. Establish and maintain hot-spare, testbed, and pre-production environments.
5. Establish systematic and ongoing procedures to measure and monitor system behavior and performance.
6. Establish authoritative sources of information.
7. Manage users’ consumption of space.
8. Storage systems should function independently from compute systems.
9. Include middleware benchmarks and performance criteria in system procurement contracts.
10. Deploy and support tools to enable users to more easily achieve the full capabilities of file systems and archives.
11. Training, onsite or online, and meeting with the users directly improve the utilization and performance of the storage system.
12. Actively manage relationships with the storage industry and storage community.
13. Monitor and exploit emerging storage technologies.
14. Continue to re-evaluate the real characteristics of your I/O workload.
15. Use quantitative data to demonstrate the importance of storage to achieving scientific results.
16. Retain original logs generated by systems and software.

Many participants do not implement all the best practices, which means they are bringing several new ideas for improvement back to their sites.

In addition, attendees identified 15 gaps that require further attention. Gaps likely addressed by evolutionary solutions include:

1. Incomplete attention to end-to-end data integrity.
2. Storage system software is not resilient enough.
3. Redundant Array of Independent Tape (RAIT) does not exist today.
4. Provide monitoring and diagnostic tools that allow users to understand file system configuration and performance characteristics and troubleshoot poor I/O performance.
5. Communication between storage system users and storage system experts needs improvement.
6. Need tools and mechanisms to capture provenance and provide life-cycle management for data.
7. Ensure storage systems are prepared to support data-intensive computing and new workflows.
8. Measure and track trends in storage system usage to the same degree we measure and track flops and cycles on computational systems.
9. Tools to provide scalable performance in interacting with files in the storage system.

Gaps requiring revolutionary approaches include:

1. Need quality of service for users in a shared storage environment.
2. Need standard metadata for users to specify data importance and retention.
3. The diagnostic information available in today’s storage systems is woefully inadequate.
4. Metadata performance in storage systems already limits usability and won’t meet the needs of exascale.
5. POSIX isn’t expressive enough for the breadth of application I/O patterns.
6. There exists a storage technology gap for exascale.

Introduction

The U.S. Department of Energy (DOE)