Big Data Benchmarking Workshop Publications

Benchmarking Datacenter and Big Data Systems

Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan
Institute of Computing Technology
http://prof.ict.ac.cn/ICTBench

Acknowledgements

This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.

Publications

BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao, et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.
Characterizing Data Analysis Workloads in Data Centers. Zhen Jia, et al. 2013 IEEE International Symposium on Workload Characterization (IISWC-2013).
Characterizing OS behavior of Scale-out Data Center Workloads. Chen Zheng et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.
Characterization of Real Workloads of Web Search Engines. Huafeng Xi et al. 2011 IEEE International Symposium on Workload Characterization (IISWC-2011).
The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia et al. Second Workshop on Big Data Benchmarking (WBDB 2012 India) & Lecture Notes in Computer Science (LNCS).
CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo et al. Front. Comput. Sci.
(FCS) 2012, 6(4): 347–362

Content

Background and Motivation
Our ICTBench
Case studies

Question One

Gap between industry and academia
The distance keeps growing
• Code
• Data sets

Question Two

Different benchmark requirements
Architecture communities
• Simulation is very slow
• Small data and code sets
System communities
• Large-scale deployment is valuable
Users
• There are three kinds of lies: lies, damn lies, and benchmarks
• Real-world applications

Data Centers in the World

Source: Emerson, December 2011
http://www.emersonnetworkpower.com/en-US/About/NewsRoom/Pages/2011DataCenterState.aspx

State-of-Practice Benchmark Suites

SPEC CPU, SPEC Web, HPCC, PARSEC, TPC-C, Gridmix, YCSB

Current Benchmarks

Field | Benchmark name
CPU | SPEC CPU
Web server | SPEC Web
CMP | PARSEC
OLTP | TPC-C
OLAP | TPC-DS
HPC | HPCC, Linpack
NoSQL | YCSB
Network | httperf
… | …

Why a New Benchmark Suite for Datacenter Computing

No benchmark suite covers the diversity of data center workloads
State of the art: CloudSuite
Includes only 6 applications, selected by popularity

Why a New Benchmark Suite (Cont’)

Memory Level Parallelism (MLP): simultaneously outstanding cache misses
[Figure: MLP of CloudSuite compared with our benchmark suite, DCBench]

Why a New Benchmark Suite (Cont’)

Scale-out performance
[Figure: speedup (y-axis, 1 to 6) versus working nodes (x-axis: 1, 4, 8). Legend entries: DCBench, CloudSuite data analysis benchmark, sort, grep, wordcount, svm, kmeans, fkmeans, all-pairs, Bayes, HMM]

Content

Background and Motivation
Our ICTBench
Case studies

ICTBench Project

Benchmarking
Foundation of research
Bridge
ICTBench: three benchmark suites
DCBench: architecture (application, OS, and VM execution)
BigDataBench: system (large-scale big data applications)
CloudRank: cloud benchmarks (distributed management)
Project homepage: http://prof.ict.ac.cn/ICTBench

DCBench

DCBench: typical data center workloads
Different from scientific computing: FLOPS
Covers applications in important domains
• Search engine, electronic commerce, etc.
Each benchmark = a single application
Purposes
Architecture and small-to-medium system research

BigDataBench

Characterizing big data applications
Not including data-intensive supercomputing
Synthetic data sets ranging from 10 GB to the PB scale
Each benchmark = a single big application
Purposes
Large-scale system and architecture research
An incremental approach
Release a start-up benchmark suite
• Workloads in the search engine system
Other important domains

CloudRank

Cloud computing
Elastic resource management
Consolidating different workloads
Cloud benchmarks
Each benchmark = a group of consolidated data center workloads
Three benchmarks: services / data processing / desktop
Purposes
Capacity planning, system evaluation, and research
Users can customize their benchmarks

Benchmarking Methodology

To decide and rank the main application domains according to a publicly available metric, e.g. page views and daily visitors
To single out the main applications from the main application domains

Top Sites on the Web

[Figure: pie chart "Top Sites on the Web". Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others. Slice labels: 15%, 5%, 40%, 15%, 25%]
More details at http://www.alexa.com/topsites/global;0

Benchmarking Methodology

To decide and rank the main application domains according to a publicly available metric, e.g.
page views and daily visitors
To single out the main applications from the main application domains

Algorithms in Top Sites: Search Engine

Algorithms used in search: PageRank, graph mining, segmentation, feature reduction, grep, statistical counting, vector calculation, sort, recommendation, …
[Figure: pie chart "Top Sites on the Web". Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others. Slice labels: 15%, 5%, 40%, 15%, 25%]

Our Practice

Building a semantic search engine (Chinese)
ProfSearch
• Searches for scientists or professionals
• 267083 researchers across 260 universities and institutes
• http://prof.ict.ac.cn/

ProfSearch

Crawler workloads
• Scrapy
Analysis workloads
• SVM, Naïve Bayes, K-means, HMM, CRFs, LSA, LDA
Storage and management workloads
• HDFS – Storing unstructured web pages
• HIVE – Storing semi-structured intermediate data
• MySQL – Storing structured data extracted from the web
Web service workloads
• Sphinx

Algorithms in Top Sites: Social Network

Algorithms used in social networks: recommendation, clustering, classification, graph mining, grep, feature reduction, statistical counting, vector calculation, sort, …
[Figure: pie chart "Top Sites on the Web". Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others. Slice labels: 15%, 5%, 40%, 15%, 25%]

Algorithms in Top Sites: Electronic Commerce

Algorithms used in electronic commerce: recommendation, association rule mining, warehouse operation, clustering, classification, statistical counting, vector calculation, …
[Figure: pie chart "Top Sites on the Web". Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others. Slice labels: 15%, 5%, 40%, 15%, 25%]

Main Algorithms in Data Centers

Data center algorithms: segmentation, basic operation, warehouse operation, classification, cluster, feature reduction, recommendation, vector calculation, association rule mining, graph mining

Where Exactly Are Those Algorithms Used in Data Centers?

Here, let's investigate the most used applications in data centers
The ubiquitous search engine
Frequently used recommendation sub-systems

Main Algorithms in Common Search Engines (Nutch)

[Figure: search engine (Nutch) diagram annotated with algorithms. Labels include: Sort, Grep, Merge Sort, Segmentation, Classification, BFS, Word Count, Vector calculate, Scoring & Sort, DecisionTree, PageRank]

Algorithms in Search Engine

[Figure labels: graph mining, grep & segmentation, pagerank, word count, sort, vector calculation]

Representative Algorithms in Search Engine

Algorithm | Role in the search engine
Graph mining | crawl web pages
Grep | extract content from HTML
Segmentation | word segmentation
PageRank | compute the page rank value
Word counting | word frequency count
Vector calculation | document matching
Sort | document sorting

Algorithms in Recommendation Sub-systems

Representative Algorithms in Recommendation Sub-systems

Algorithm | Role in the recommendation sub-systems
Classification | classify web pages/user behavior
Frequent pattern growth | user log mining
Hidden Markov model | information extraction
Clustering/similarity analysis | clustering web pages/user behavior
Collaborative filtering | recommendation
Feature reduction | text representation/user behavior representation
Graph mining | web link analysis

Overview of DCBench

Category | Workload | Programming model | Language | Source
Basic operation | Sort | MapReduce | Java | Hadoop
Basic operation | Wordcount | MapReduce | Java | Hadoop
Basic operation | Grep | MapReduce | Java | Hadoop
Classification | Naïve Bayes | MapReduce | Java | Mahout
Classification | Support Vector Machine | MapReduce | Java | Implemented by ourselves
Cluster | K-means | MapReduce | Java | Mahout
Cluster | K-means | MPI | C++ | IBM PML
Cluster | Fuzzy k-means | MapReduce | Java | Mahout
Cluster | Fuzzy k-means | MPI | C++ | IBM PML
Recommendation | Item based Collaborative Filtering | MapReduce | Java | Mahout
Association rule mining | Frequent pattern growth | MapReduce | Java | Mahout
Segmentation | Hidden
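
The basic-operation workloads listed in the DCBench overview (Sort, Wordcount, Grep) follow the MapReduce programming model. As a rough illustration of that model only, here is a minimal single-process Python sketch of the Wordcount pattern. It is not the Hadoop Java implementation that DCBench uses; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, chosen for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; a real run would read large text data sets.
    sample = ["big data benchmark", "big data systems"]
    print(word_count(sample))
```

In a real MapReduce deployment the map and reduce functions run on different nodes and the shuffle step moves data across the network, which is one reason such simple workloads are still useful for characterizing data center systems.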

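
The scale-out comparison earlier in the deck plots speedup against the number of working nodes (1, 4, 8). For readers who want to reproduce that kind of plot, the sketch below shows how speedup and parallel efficiency are commonly computed from measured runtimes. The runtimes used here are hypothetical placeholders, not measurements from the slides, and the function names are assumptions of this sketch.

```python
def speedup(baseline_seconds, scaled_seconds):
    """Speedup of a scaled run relative to the single-node baseline."""
    return baseline_seconds / scaled_seconds


def efficiency(baseline_seconds, scaled_seconds, nodes):
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup(baseline_seconds, scaled_seconds) / nodes


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, keyed by number of working nodes.
    runtimes = {1: 1200.0, 4: 400.0, 8: 250.0}
    baseline = runtimes[1]
    for nodes in sorted(runtimes):
        s = speedup(baseline, runtimes[nodes])
        e = efficiency(baseline, runtimes[nodes], nodes)
        print(f"{nodes} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

A workload whose speedup stays close to the node count scales out well; a curve that flattens early suggests a bottleneck such as shuffle traffic or a serial phase.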