Exploring Petabytes of Space-Time Data
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
What If What I Need Is Not in Powerai (Yet)? What You Need to Know to Build from Scratch?
IBM Systems What if what I need is not in PowerAI (yet)? What you need to know to build from scratch? Jean-Armand Broyelle June 2018 IBM Systems – Cognitive Era Things to consider when you have to rebuild a framework © 2017 International Business Machines Corporation 2 IBM Systems – Cognitive Era CUDA Downloads © 2017 International Business Machines Corporation 3 IBM Systems – Cognitive Era CUDA 8 – under Legacy Releases © 2017 International Business Machines Corporation 4 IBM Systems – Cognitive Era CUDA 8 Install Steps © 2017 International Business Machines Corporation 5 IBM Systems – Cognitive Era cuDNN and NVIDIA drivers © 2017 International Business Machines Corporation 6 IBM Systems – Cognitive Era cuDNN v6.0 for CUDA 8.0 © 2017 International Business Machines Corporation 7 IBM Systems – Cognitive Era cuDNN and NVIDIA drivers © 2017 International Business Machines Corporation 8 IBM Systems – Cognitive Era © 2017 International Business Machines Corporation 9 IBM Systems – Cognitive Era © 2017 International Business Machines Corporation 10 IBM Systems – Cognitive Era cuDNN and NVIDIA drivers © 2017 International Business Machines Corporation 11 IBM Systems – Cognitive Era Prepare your environment • When something goes wrong it’s better to Remove local anaconda installation $ cd ~; rm –rf anaconda2 .conda • Reinstall anaconda $ cd /tmp; wget https://repo.anaconda.com/archive/Anaconda2-5.1.0-Linux- ppc64le.sh $ bash /tmp/Anaconda2-5.1.0-Linux-ppc64le.sh • Activate PowerAI $ source /opt/DL/tensorflow/bin/tensorflow-activate • When you -
Open Source Copyrights
Kuri App - Open Source Copyrights: 001_talker_listener-master_2015-03-02 ===================================== Source Code can be found at: https://github.com/awesomebytes/python_profiling_tutorial_with_ros 001_talker_listener-master_2016-03-22 ===================================== Source Code can be found at: https://github.com/ashfaqfarooqui/ROSTutorials acl_2.2.52-1_amd64.deb ====================== Licensed under GPL 2.0 License terms can be found at: http://savannah.nongnu.org/projects/acl/ acl_2.2.52-1_i386.deb ===================== Licensed under LGPL 2.1 License terms can be found at: http://metadata.ftp- master.debian.org/changelogs/main/a/acl/acl_2.2.51-8_copyright actionlib-1.11.2 ================ Licensed under BSD Source Code can be found at: https://github.com/ros/actionlib License terms can be found at: http://wiki.ros.org/actionlib actionlib-common-1.5.4 ====================== Licensed under BSD Source Code can be found at: https://github.com/ros-windows/actionlib License terms can be found at: http://wiki.ros.org/actionlib adduser_3.113+nmu3ubuntu3_all.deb ================================= Licensed under GPL 2.0 License terms can be found at: http://mirrors.kernel.org/ubuntu/pool/main/a/adduser/adduser_3.113+nmu3ubuntu3_all. deb alsa-base_1.0.25+dfsg-0ubuntu4_all.deb ====================================== Licensed under GPL 2.0 License terms can be found at: http://mirrors.kernel.org/ubuntu/pool/main/a/alsa- driver/alsa-base_1.0.25+dfsg-0ubuntu4_all.deb alsa-utils_1.0.27.2-1ubuntu2_amd64.deb ====================================== -
Mochi-JCST-01-20.Pdf
Ross R, Amvrosiadis G, Carns P et al. Mochi: Composing data services for high-performance computing environments. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 35(1): 121–144 Jan. 2020. DOI 10.1007/s11390-020-9802-0 Mochi: Composing Data Services for High-Performance Computing Environments Robert B. Ross1, George Amvrosiadis2, Philip Carns1, Charles D. Cranor2, Matthieu Dorier1, Kevin Harms1 Greg Ganger2, Garth Gibson3, Samuel K. Gutierrez4, Robert Latham1, Bob Robey4, Dana Robinson5 Bradley Settlemyer4, Galen Shipman4, Shane Snyder1, Jerome Soumagne5, and Qing Zheng2 1Argonne National Laboratory, Lemont, IL 60439, U.S.A. 2Parallel Data Laboratory, Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A. 3Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada 4Los Alamos National Laboratory, Los Alamos NM, U.S.A. 5The HDF Group, Champaign IL, U.S.A. E-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected] E-mail: [email protected]; [email protected]; [email protected]; [email protected] E-mail: [email protected]; [email protected]; [email protected]; {bws, gshipman}@lanl.gov E-mail: [email protected]; [email protected]; [email protected] Received July 1, 2019; revised November 2, 2019. Abstract Technology enhancements and the growing breadth of application workflows running on high-performance computing (HPC) platforms drive the development of new data services that provide high performance on these new platforms, provide capable and productive interfaces and abstractions for a variety of applications, and are readily adapted when new technologies are deployed. The Mochi framework enables composition of specialized distributed data services from a collection of connectable modules and subservices. -
Master Thesis
Master thesis To obtain a Master of Science Degree in Informatics and Communication Systems from the Merseburg University of Applied Sciences Subject: Tunisian truck license plate recognition using an Android Application based on Machine Learning as a detection tool Author: Supervisor: Achraf Boussaada Prof.Dr.-Ing. Rüdiger Klein Matr.-Nr.: 23542 Prof.Dr. Uwe Schröter Table of contents Chapter 1: Introduction ................................................................................................................................. 1 1.1 General Introduction: ................................................................................................................................... 1 1.2 Problem formulation: ................................................................................................................................... 1 1.3 Objective of Study: ........................................................................................................................................ 4 Chapter 2: Analysis ........................................................................................................................................ 4 2.1 Methodological approaches: ........................................................................................................................ 4 2.1.1 Actual approach: ................................................................................................................................... 4 2.1.2 Image Processing with OCR: ................................................................................................................ -
An Accuracy Examination of OCR Tools
International Journal of Innovative Technology and Exploring Engineering (IJITEE) ISSN: 2278-3075, Volume-8, Issue-9S4, July 2019 An Accuracy Examination of OCR Tools Jayesh Majumdar, Richa Gupta texts, pen computing, developing technologies for assisting Abstract—In this research paper, the authors have aimed to do a the visually impaired, making electronic images searchable comparative study of optical character recognition using of hard copies, defeating or evaluating the robustness of different open source OCR tools. Optical character recognition CAPTCHA. (OCR) method has been used in extracting the text from images. OCR has various applications which include extracting text from any document or image or involves just for reading and processing the text available in digital form. The accuracy of OCR can be dependent on text segmentation and pre-processing algorithms. Sometimes it is difficult to retrieve text from the image because of different size, style, orientation, a complex background of image etc. From vehicle number plate the authors tried to extract vehicle number by using various OCR tools like Tesseract, GOCR, Ocrad and Tensor flow. The authors in this research paper have tried to diagnose the best possible method for optical character recognition and have provided with a comparative analysis of their accuracy. Keywords— OCR tools; Orcad; GOCR; Tensorflow; Tesseract; I. INTRODUCTION Optical character recognition is a method with which text in images of handwritten documents, scripts, passport documents, invoices, vehicle number plate, bank statements, Fig.1: Functioning of OCR [2] computerized receipts, business cards, mail, printouts of static-data, any appropriate documentation or any II. OCR PROCDURE AND PROCESSING computerized receipts, business cards, mail, printouts of To improve the probability of successful processing of an static-data, any appropriate documentation or any picture image, the input image is often ‘pre-processed’; it may be with text in it gets processed and the text in the picture is de-skewed or despeckled. -
Enforcing Abstract Immutability
Enforcing Abstract Immutability by Jonathan Eyolfson A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Electrical and Computer Engineering Waterloo, Ontario, Canada, 2018 © Jonathan Eyolfson 2018 Examining Committee Membership The following served on the Examining Committee for this thesis. The decision of the Examining Committee is by majority vote. External Examiner Ana Milanova Associate Professor Rensselaer Polytechnic Institute Supervisor Patrick Lam Associate Professor University of Waterloo Internal Member Lin Tan Associate Professor University of Waterloo Internal Member Werner Dietl Assistant Professor University of Waterloo Internal-external Member Gregor Richards Assistant Professor University of Waterloo ii I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. iii Abstract Researchers have recently proposed a number of systems for expressing, verifying, and inferring immutability declarations. These systems are often rigid, and do not support “abstract immutability”. An abstractly immutable object is an object o which is immutable from the point of view of any external methods. The C++ programming language is not rigid—it allows developers to express intent by adding immutability declarations to methods. Abstract immutability allows for performance improvements such as caching, even in the presence of writes to object fields. This dissertation presents a system to enforce abstract immutability. First, we explore abstract immutability in real-world systems. We found that developers often incorrectly use abstract immutability, perhaps because no programming language helps developers correctly implement abstract immutability. -
CSI: Inferring Mobile ABR Video Adaptation Behavior Under HTTPS and QUIC
CSI: Inferring Mobile ABR Video Adaptation Behavior under HTTPS and QUIC Shichang Xu Subhabrata Sen Z. Morley Mao University of Michigan AT&T Labs – Research University of Michigan Abstract Server Manifest Network Client Mobile video streaming services have widely adopted Adap- Chunks HTTP tive Bitrate (ABR) streaming to dynamically adapt the stream- Track ing quality to variable network conditions. A wide range of 720p 1 Buffer third-party entities such as network providers and testing 480p IP packets services need to understand such adaptation behavior for 360p 1 2 3 Index purposes such as QoE monitoring and network management. CSI The traditional approach involved conducting test runs and analyzing the HTTP-level information from the associated network traffic to understand the adaptation behavior under Figure 1. ABR streaming overview different network conditions. However, end-to-end traffic encryption protocols such as HTTPS and QUIC are being increasingly used by streaming services, hindering such tra- Rate (ABR) streaming (predominantly HLS [75] and DASH [31]) ditional traffic analysis approaches. has been widely adopted in industry for delivering satisfac- To address this, we develop CSI (Chunk Sequence Infer- tory Quality of Experience (QoE) over dynamic cellular net- encer), a general system that enables third-parties to conduct work conditions. The server encodes each video into multiple active measurements and infer mobile ABR video adapta- versions with different picture quality levels and encoding tion behavior based on packet size and timing information bitrates (with higher bitrates for higher-quality encodings) still available in the encrypted traffic. We perform exten- called tracks, and splits each track into shorter chunks, each sive evaluations and demonstrate that CSI achieves high representing a few seconds worth of playback content (Fig- inference accuracy for video encodings of popular streaming ure 1). -
A Dataset for Github Repository Deduplication
A Dataset for GitHub Repository Deduplication Diomidis Spinellis Audris Mockus Zoe Kotti [email protected] {dds,zoekotti}@aueb.gr University of Tennessee Athens University of Economics and Business ABSTRACT select distinct p1, p2 from( select project_commits.project_id as p2, GitHub projects can be easily replicated through the site’s fork first_value(project_commits.project_id) over( process or through a Git clone-push sequence. This is a problem for partition by commit_id empirical software engineering, because it can lead to skewed re- order by mean_metric desc) as p1 sults or mistrained machine learning models. We provide a dataset from project_commits of 10.6 million GitHub projects that are copies of others, and link inner join forkproj.all_project_mean_metric each record with the project’s ultimate parent. The ultimate par- on all_project_mean_metric.project_id = ents were derived from a ranking along six metrics. The related project_commits.project_id) as shared_commits projects were calculated as the connected components of an 18.2 where p1 != p2; million node and 12 million edge denoised graph created by direct- Listing 1: Identification of projects with common commits ing edges to ultimate parents. The graph was created by filtering out more than 30 hand-picked and 2.3 million pattern-matched GitHub contains many millions of copied projects. This is a prob- clumping projects. Projects that introduced unwanted clumping lem for empirical software engineering. First, when data contain- were identified by repeatedly visualizing shortest path distances ing multiple copies of a repository are analyzed, the results can between unrelated important projects. Our dataset identified 30 end up skewed [27]. Second, when such data are used to train thousand duplicate projects in an existing popular reference dataset machine learning models, the corresponding models can behave of 1.8 million projects. -
Expense Tracking Mobile Application with Receipt Scanning Functionality Bachelor’S Thesis
TALLINN UNIVERSITY OF TECHNOLOGY Faculty of Information Technology Department of Computer Science Chair of Network Software Expense tracking mobile application with receipt scanning functionality Bachelor’s thesis Student: Roman Kaskman Student code: 113089 IAPB Advisor: Roger Kerse Tallinn 2015 Author’s declaration I declare that this thesis is the result of my own research except as cited in the references. The thesis has not been accepted for any degree and is not concurrently submitted in candidature of any other degree. 25.05.2015 Roman Kaskman (date) (signature) Abstract The purpose of this thesis is to create a mobile application for expense tracking, with the main focus on functionality allowing to take pictures of receipts issued by Estonian enterprises, extract basic expense information from the captured receipt images and store extracted expenses information in authenticated user’s expense list. The main problems covered in this work are finding the best architectural and design solutions for the application from the perspective of performance, usability, security and further development as well as researching and implementing techniques to handle expense recognition from receipts in an efficient way. As a result of the thesis, a working implementation of expense tracking mobile application for Android appears. After functionality of expenses information extraction from receipt images passes the testing phase, conclusion regarding its reliability is made. Moreover, proposals for further improvements of the application’s functionality are also presented. The thesis is in English and contains 53 pages of text, 6 chapters and 14 figures. Annotatsioon Käesoleva bakalaureusetöö eesmärk on luua mobiilirakendus kasutaja kulude üle arvestuse pidamiseks ja dokumenteerimiseks. -
Character Recognition in Natural Images Utilising Tensorflow
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2017 Character Recognition in Natural Images Utilising TensorFlow ALEXANDER VIKLUND EMMA NIMSTAD KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION Character Recognition in Natural Images Utilising TensorFlow ALEXANDER VIKLUND EMMA NIMSTAD Degree project in Computer Science, DD143X Date: June 12, 2017 Supervisor: Kevin Smith Examiner: Örjan Ekeberg Swedish title: Teckenigenkänning i naturliga bilder med TensorFlow School of Computer Science and Communication Abstract Convolutional Neural Networks (CNNs) are commonly used for character recogni- tion. They achieve the lowest error rates for popular datasets such as SVHN and MNIST. Usage of CNN is lacking in research about character classification in nat- ural images regarding the whole English alphabet. This thesis conducts an experi- ment where TensorFlow is used to construct a CNN that is trained and tested on the Chars74K dataset, with 15 images per class for training and 15 images per class for testing. This is done with the aim of achieving a higher accuracy than the non-CNN approach by de Campos et al. [1], that achieved 55:26%. The thesis explores data augmentation techniques for expanding the small training set and evaluates the result of applying rotation, stretching, translation and noise- adding. The result of this is that all of these methods apart from adding noise gives a positive effect on the accuracy of the network. Furthermore, the experiment shows that with a three layered convolutional neural network it is possible to create a character classifier that is as good as de Campos et al.’s. -
Mlsm: Making Authenticated Storage Faster in Ethereum
mLSM: Making Authenticated Storage Faster in Ethereum Pandian Raju1 Soujanya Ponnapalli1 Evan Kaminsky1 Gilad Oved1 Zachary Keener1 Vijay Chidambaram1;2 Ittai Abraham2 1University of Texas at Austin 2VMware Research Abstract the LevelDB [15] key-value store. We show that reading a single key (e.g., the amount of ether in a given account) Ethereum provides authenticated storage: each read can result in 64 LevelDB reads, while writing a single returns a value and a proof that allows the client to verify key can lead to a similar number of LevelDB writes. In- the value returned is correct. We experimentally show ternally, LevelDB induces extra write amplification [23], that such authentication leads to high read and write am- further increasing overall amplification. Such write and × plification (64 in the worst case). We present a novel read amplification reduces throughput (storage band- data structure, Merkelized LSM (mLSM), that signifi- width is wasted by the amplification), and write ampli- cantly reduces the read and write amplification while still fication in particular significantly reduces the lifetime of allowing client verification of reads. mLSM significantly devices such as Solid State Drives (SSDs) which wear increases the performance of the storage subsystem in out after a limited number of write cycles [1, 16, 20]. Ethereum, thereby increasing the performance of a wide Thus, reducing the read and write amplification can both range of Ethereum applications. increase Ethereum throughput and reduce hardware re- 1 Introduction placement costs. Modern crypto-currencies such as Bitcoin [21] and We trace the read and write amplification in Ethereum Ethereum [26] seek to provide a decentralized, to the fact that it provides authenticated storage. -
Wen-Chieh Wu Technical LEAD · SOFTWARE Architect · FULL STACK Engineer
Wen-Chieh Wu TECHNiCAL LEAD · SOFTWARE ARCHiTECT · FULL STACK ENGiNEER [email protected] | jeromewu.github.io | jeromewu | wenchiehwu | jeromewus Summary Passionate problem solver, technology enthusiast, compassionate leader with 7+ years as software engineer and architect and 5+ years as technical lead in multidisciplinary environments. Excel in leadership, critical thinking, communication and software engineering & architecture . Strong skills in JavaScript (incl. React, Express), Golang, Docker, CI/CD pipelines and software architecture design. Major contributor of tesseract.js (24.8k+ stars) and ffmpeg.wasm (6.1k+ stars). Certified architecting, data engineering, big data and machine learning specializations in Google Cloud Platform. Work Experience ByteDance Singapore STAFF SOFTWARE ENGiNEER Aug. 2021 ‑ PRESENT INTELLLEX Singapore TECHNiCAL LEAD Apr. 2020 ‑ Jul. 2021 • Led a team of backend and frontend engineers to develop knowledge platform for law. • Led re‑architecture project which eliminates 10+ redundant services, boosts observability and flexibility of data analysis pipeline and reduce cost • Reduced 30% of AWS cost in 2 months • Succeed in supporting to pass company ISO 27001 annual audit without any NC. Delta Electronics Taipei City, Taiwan SOFTWARE ARCHiTECT & PRODUCT MANAGER May 2019 ‑ Mar. 2020 • Established a cross‑region project team with 10+ members to research and develop an edge computing solution in manufacturing industry. • Designed a microservice software architecture for edge computing solution to enable flexiblity and scalibility. • Adopted gitlab‑CI and Ansible to deploy microservices in Kubernetes cluster to minize deployment efforts. • Developed a tensorflow serving like interface library in Golang. SENiOR TECHNiCAL LEAD Feb. 2018 ‑ May 2019 (1 year 4 months) • Established a corss‑region team of engineers, designers and QA with 20+ members in charge of front‑end development.