Massive Data Processing (The Google Way …)

Fundamentals of Distributed Systems
CC5212-1 Procesamiento Masivo de Datos, Otoño 2015
Lecture 4: DFS & MapReduce I
Aidan Hogan ([email protected])

(Slides: photos of Google's infrastructure circa 1997/98 and circa 2015, its cooling system, and its recycling initiative.)

MASSIVE DATA PROCESSING (THE GOOGLE WAY …)

Google Architecture (ca. 1998): Information Retrieval
• Crawling
• Inverted indexing
• Word-counts
• Link-counts
• greps/sorts
• PageRank
• Updates …

Google Engineering
• Massive amounts of data
• Each task needs communication protocols
• Each task needs fault tolerance
• Multiple tasks running concurrently
→ Ad hoc solutions would repeat the same code

Google Engineering
• Google File System
  – Store data across multiple machines
  – Transparent Distributed File System
  – Replication / Self-healing
• MapReduce
  – Programming abstraction for distributed tasks
  – Handles fault tolerance
  – Supports many "simple" distributed tasks!
• BigTable, Pregel, Percolator, Dremel …

Google Re-Engineering
Google File System (GFS) → MapReduce → BigTable

GOOGLE FILE SYSTEM (GFS)

What is a File-System?
• Breaks files into chunks (or clusters)
• Remembers the sequence of clusters
• Records directory/file structure
• Tracks file meta-data
  – File size
  – Last access
  – Permissions
  – Locks

What is a Distributed File-System?
• Same thing, but distributed
What would transparency / flexibility / reliability / performance / scalability mean for a distributed file system?
• Transparency: Like a normal file-system
• Flexibility: Can mount new machines
• Reliability: Has replication
• Performance: Fast storage/retrieval
• Scalability: Can store a lot of data / support a lot of machines

Google File System (GFS)
• Files are huge
• Files often read or appended
  – Writes in the middle of a file not (really) supported
• Concurrency important
• Failures are frequent
• Streaming important

GFS: Pipelined Writes
(Figure: the Master holds the file system in memory, e.g. /blue.txt [3 chunks] — 1: {A1, C1, E1}, 2: {A1, B1, D1}, 3: {B1, D1, E1} — and /orange.txt [2 chunks] — 1: {B1, D1, E1}, 2: {A1, C1, E1}; the chunks themselves live on chunk-servers (slaves) A1–E1.)
• 64 MB per chunk
• 64-bit label for each chunk
• Assume a replication factor of 3
• blue.txt (150 MB): 3 chunks; orange.txt (100 MB): 2 chunks

GFS: Pipelined Writes (In Words)
1. Client asks Master to write a file
2. Master returns a primary chunkserver and secondary chunkservers
3. Client writes to the primary chunkserver and tells it the secondary chunkservers
4. Primary chunkserver passes the data on to a secondary chunkserver, which passes it on …
5. When finished, a message comes back through the pipeline confirming that all chunkservers have written
   – Otherwise the client sends again
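As a rough illustration of the write pipeline, the following is a minimal Python sketch (class and method names are invented for illustration; the real GFS protocol also involves leases, chunk versions and retries) of data flowing client → primary → secondaries, with the acknowledgement travelling back through the chain:

```python
class Chunkserver:
    """Toy chunkserver that stores a chunk and forwards it down the pipeline."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk_id -> bytes

    def write(self, chunk_id, data, remaining):
        """Store the chunk, forward it to the next server in the pipeline,
        and only acknowledge once every downstream server has written."""
        self.chunks[chunk_id] = data
        if remaining:  # pass the data down the chain
            next_server, rest = remaining[0], remaining[1:]
            if not next_server.write(chunk_id, data, rest):
                return False  # a downstream write failed
        return True  # the ack travels back up the pipeline


def client_write(primary, secondaries, chunk_id, data):
    if not primary.write(chunk_id, data, secondaries):
        # In GFS the client resends the mutation on failure
        raise IOError("pipelined write failed; client must resend")


servers = [Chunkserver(n) for n in ("A1", "B1", "D1")]
client_write(servers[0], servers[1:], "blue.txt#1", b"...chunk data...")
```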
GFS: Fault Tolerance (In Words)
• Master sends regular "Heartbeat" pings
• If a chunkserver doesn't respond:
  1. Master finds out what chunks it had
  2. Master assigns a new chunkserver for each chunk
  3. Master tells the new chunkserver to copy from a specific existing chunkserver
• Chunks are prioritised by number of remaining replicas, then by demand

GFS: Direct Reads (In Words)
1. Client asks the Master for a file ("I'm looking for /blue.txt")
2. Master returns the location of a chunk
   – Returns a ranked list of replicas
3. Client reads the chunk directly from a chunkserver
4. Client asks the Master for the next chunk
Software makes this transparent for the client!

GFS: Modification Consistency
Remember: Concurrency! The Master assigns a lease to one replica: the "primary replica".
A client that wants to change a file:
1. Client asks the Master for the replicas (incl. the primary)
2. Master returns the replica info to the client
3. Client sends the change data
4. Client asks the primary to execute the changes
5. Primary asks the secondaries to change
6. Secondaries acknowledge to the primary
7. Primary acknowledges to the client
Data & control are decoupled!

GFS: Rack Awareness
(Figure: chunk-servers arranged in racks A, B, C, each behind a rack switch, all connected through a core switch. E.g. /orange.txt chunk 1 on {A1, A4, B3} and chunk 2 on {A5, B1, B5}, where rack A holds {A1, …, A5} and rack B holds {B1, …, B5}.)
• Make sure replicas are not all on the same rack
  – In case the rack switch fails!
• But communication can be slower:
  – Within a rack: pass one switch (the rack switch)
  – Across racks: pass three switches (two rack switches and a core switch)
• (Typically) pick two racks with low traffic
  – Two nodes in the same rack, one node in another rack
  – (Assuming 3× replication)
• Only necessary if there is more than one rack!

GFS: Other Operations
• Rebalancing: spread storage out evenly
• Deletion:
  – Just rename the file with a hidden file name
  – To recover, rename it back to the original version
  – Otherwise, it will be wiped three days later
• Monitoring stale replicas: a dead slave may reappear with old data; the master keeps version info and will recycle old chunks

GFS: Weaknesses?
What do you see as the core weaknesses of the Google File System?
• Master node is a single point of failure
  – Use hardware replication
  – Logs and checkpoints!
• Master node is a bottleneck
  – Use a more powerful machine
  – Minimise master-node traffic
• Master-node metadata is kept in memory
  – Each chunk needs 64 bytes
  – Chunk data can be queried from each slave

GFS: White-Paper
Ghemawat, Gobioff, Leung. "The Google File System" (2003).

HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Google Re-Engineering → HDFS
• HDFS-to-GFS terminology:
  – Data-node = Chunkserver/Slave
  – Name-node = Master
• HDFS does not support file modifications
• Otherwise pretty much the same, except:
  – GFS is proprietary (hidden in Google)
  – HDFS is open source (Apache!)

HDFS Interfaces
(Slides: screenshots of the HDFS web and command-line interfaces.)

GOOGLE'S MAP-REDUCE

MapReduce in Google
• Divide & Conquer:
  1. Word count
  2. Total searches per user
  3. PageRank
  4. Inverted-indexing

MapReduce: Word Count
Input (Distr. File Sys.) → Map() → (Partition/Shuffle) → (Distr. Sort) → Reduce() → Output
The input is split over mappers; the shuffle partitions words over reducers (e.g. a–k, l–q, r–z); each reducer emits sorted counts such as: a 10,023; aa 6,234; lab 8,123; label 983; rag 543; rat 1,233.
Better partitioning method? How could we do a distributed top-k word count?

MapReduce (in more detail)
1. Input: Read from the cluster (e.g., a DFS)
   – Chunks raw data for the mappers
   – Maps raw data to initial (key_in, value_in) pairs
   What might Input do in the word-count case?
2. Map: For each (key_in, value_in) pair, generate zero-to-many (key_map, value_map) pairs
   – key_in / value_in can be of a different type to key_map / value_map
   What might Map do in the word-count case?
3. Partition: Assign sets of key_map values to reducer machines
   How might Partition work in the word-count case?
4. Shuffle: Data are moved from the mappers to the reducers (e.g., using the DFS)
5. Comparison/Sort: Each reducer sorts the data by key using a comparison function
   – Sort is taken care of by the framework
6. Reduce: Takes a bag of (key_map, value_map) pairs with the same key_map value, and produces zero-to-many outputs for each bag
   – Typically zero-or-one outputs
   How might Reduce work in the word-count case?
7. Output: Merge-sorts the results from the reducers / writes them to stable storage

MapReduce: Word Count PseudoCode
(A sketch is given below.)
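A minimal Python sketch of the word-count job, simulating the Map, Shuffle/Sort and Reduce phases on a single machine (function names are illustrative, not Hadoop's API; a real Hadoop job would use the Java API or Hadoop Streaming):

```python
from itertools import groupby
from operator import itemgetter

# Map: for each (key_in=line number, value_in=line) pair,
# emit zero-to-many (key_map=word, value_map=1) pairs.
def map_fn(line_no, line):
    for word in line.split():
        yield (word.lower(), 1)

# Reduce: for a bag of pairs sharing one key_map, emit one output.
def reduce_fn(word, counts):
    yield (word, sum(counts))

def run_word_count(lines):
    # Map phase (on a cluster, mappers run in parallel over DFS chunks)
    mapped = [pair for i, line in enumerate(lines)
              for pair in map_fn(i, line)]
    # Shuffle + sort phase: group pairs by key (the framework does this)
    mapped.sort(key=itemgetter(0))
    # Reduce phase (reducers also run in parallel, one bag per key)
    results = []
    for word, group in groupby(mapped, key=itemgetter(0)):
        results.extend(reduce_fn(word, (count for _, count in group)))
    return results

print(run_word_count(["a rose is a rose", "is a rose"]))
# [('a', 3), ('is', 2), ('rose', 3)]
```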
MapReduce: Scholar Example
Assume that in Google Scholar we have inputs like:
  paperA1 citedBy paperB1
How can we use MapReduce to count the total incoming citations per paper?

MapReduce as a Dist. Sys.
• Transparency: Abstracts physical machines
• Flexibility: Can mount new machines; can run a variety of types of jobs
• Reliability: Tasks are monitored by a master node using a heart-beat; dead jobs restart
• Performance: Depends on the application code, but exploits parallelism!
• Scalability: Depends on the application code, but can serve as the basis for massive data processing!

MapReduce: Benefits for Programmers
• Takes care of the low-level implementation:
  – Easy to handle inputs and output
  – No need to handle network communication
  – No need to write sorts or joins
• Abstracts machines (transparency)
  – Fault tolerance (through heart-beats)
  – Abstracts physical locations
  – Add / remove machines
  – Load balancing
… time for more important things.

HADOOP OVERVIEW

Hadoop Architecture
(Figure: a Client talks to the NameNode and the JobTracker; beneath them, DataNode 1..n store blocks and JobNode 1..n run tasks.)

HDFS: Traditional / SPOF
(Figure: a Client issues commands such as copy dfs/blue.txt, rm dfs/orange.txt, rmdir dfs/, mkdir new/, mv new/ dfs/ against the NameNode, which works with a SecondaryNameNode and DataNodes 1..n.)
1. NameNode appends edits to a log file
2. SecondaryNameNode copies the log file and the fsimage, makes a checkpoint, and copies the image back
3. NameNode loads the image on start-up and applies the remaining edits
The SecondaryNameNode is not a backup NameNode!

What is the secondary name-node?
• The name-node quickly logs all file-system actions in a sequential (but messy) way
• The secondary name-node keeps the main fsimage file up-to-date based on these logs
• When the primary name-node boots back up, it loads the fsimage file and applies the remaining log to it
• Hence the secondary name-node helps make boot-ups faster, helps keep the file-system image up-to-date, and takes load away from the primary

Hadoop: High Availability
(Figure: an Active NameNode and a Standby NameNode each run a JournalManager over their own fsimage; both write to and read from a quorum of JournalNodes 1..n holding the fs edits, so the standby can take over when the active fails.)
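To make the checkpointing idea concrete, here is a toy Python sketch (the data structures and operation names are invented for illustration; real HDFS edit logs are binary and far richer) of how an edit log is folded into an fsimage:

```python
# Toy model of NameNode checkpointing (illustrative only).
# fsimage: snapshot of the namespace; edit log: actions since the snapshot.

fsimage = {"/dfs/blue.txt", "/dfs/orange.txt"}              # namespace snapshot
edit_log = [("rm", "/dfs/orange.txt"), ("mkdir", "/new/")]  # logged actions

def apply_edit(image, edit):
    """Replay a single logged action against the namespace image."""
    op, path = edit
    if op in ("mkdir", "copy"):
        image.add(path)
    elif op == "rm":
        image.discard(path)
    return image

def checkpoint(image, log):
    """What the SecondaryNameNode does: merge the edit log into the
    image so the primary can boot from a fresh snapshot."""
    for edit in log:
        image = apply_edit(image, edit)
    return image, []  # new fsimage, truncated edit log

fsimage, edit_log = checkpoint(fsimage, edit_log)
print(sorted(fsimage))  # ['/dfs/blue.txt', '/new/']
```

On restart, the name-node then only replays whatever edits accumulated after the last checkpoint, which is what keeps boot-ups fast.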