(12) United States Patent (10) Patent No.: US 9,697,034 B2, Chadha et al.
US009697034B2

(12) United States Patent          (10) Patent No.: US 9,697,034 B2
     Chadha et al.                 (45) Date of Patent: *Jul. 4, 2017

(54) OFFLOADING PROBABILISTIC COMPUTATIONS IN DATA ANALYTICS APPLICATIONS

(71) Applicant: Futurewei Technologies, Inc., Plano, TX (US)

(72) Inventors: Vineet Chadha, San Jose, CA (US); Gopinath Palani, Sunnyvale, CA (US); Guangyu Shi, Cupertino, CA (US)

(73) Assignee: Futurewei Technologies, Inc., Plano, TX (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 14/821,320

(22) Filed: Aug. 7, 2015

(65) Prior Publication Data: US 2017/0039080 A1, Feb. 9, 2017

(51) Int. Cl.:
     G06F 9/455  (2006.01)
     G06F 17/30  (2006.01)
     H04L 29/08  (2006.01)

(52) U.S. Cl.:
     CPC: G06F 9/45558 (2013.01); G06F 17/30203 (2013.01); G06F 17/30233 (2013.01); H04L 67/1097 (2013.01); G06F 2009/45583 (2013.01); G06F 2009/45595 (2013.01)

(58) Field of Classification Search: None. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
     7,003,507 B2*    2/2006   Tip ............ G06F 9/443
     7,318,229 B1*    1/2008   Connor ........ G06F 9/443; 714/25
     7,475,199 B1*    1/2009   Bobbitt ....... G06F 17/30079; 707/999.202
     8,775,358 B2*    7/2014   Bonawitz ...... G06N 7/005; 706/52
     8,782,161 B2     7/2014   Sugumar et al.
     2011/0161495 A1  6/2011   Ratering et al.
     2013/0191827 A1* 7/2013   Ashok ......... G06F 9/5016; 718/1
     2014/0181804 A1  6/2014   Sakata et al.
     * cited by examiner

FOREIGN PATENT DOCUMENTS
     CN 102109997      6/2011
     CN 103595720      2/2014
     WO WO-2014110137  7/2014

OTHER PUBLICATIONS
     International Application No. PCT/CN2016/091776, International ...

Primary Examiner — Gregory A Kessler
(74) Attorney, Agent, or Firm — Schwegman Lundberg & Woessner, P.A.

(57) ABSTRACT

An approach to offloading probabilistic computations is described. An application server comprising a memory and a processor and coupled to a network-attached storage device configured to create a dedicated process in response to a procedural call to a virtual machine container based on a data request is disclosed. The processor forwards the data request to the network-attached storage device, programs one or more virtual machines to perform a probabilistic computation based on the procedural call, and directs the probabilistic computation to a first virtual machine of the one or more virtual machines. The request for data is transformed into a modified call using a virtualized lookup call.

20 Claims, 4 Drawing Sheets

[FIG. 1 (Sheet 1 of 4): block diagram of exemplary system architecture 100; NAS Server 109]
[FIG. 2 (Sheet 2 of 4): block diagram of exemplary offloading configuration; application server 208, NAS system 209]
[FIG. 3 (Sheet 3 of 4): block diagram of exemplary probabilistic computation; element 305]

FIG. 4 (Sheet 4 of 4) flowchart 400:
     401  Load the communication module as a layer of indirection to pass the semantics-based procedural call to a process-level virtual machine
     402  Load a modified NFS server into the O/S to establish an IPC communication channel with the user-space container
     403  Invoke a user-level application program to register the PID with the kernel module and fork a VM on demand
     404  Boot-strap the modified NFS server (mountd and nfsd)
     405  Start a client application and transform the request for data into a modified call through a virtualized lookup call
     406  Forward the call to the user-level application via the virtualized NFS procedural call, which will fork the VM for data operation offload into the secured container
     407  Return results to the user application

OFFLOADING PROBABILISTIC COMPUTATIONS IN DATA ANALYTICS APPLICATIONS

FIELD

Embodiments of the present invention generally relate to the field of analytic computations. More specifically, embodiments of the present invention relate to distributed probabilistic computations.
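Step 405 of the FIG. 4 flowchart, transforming the request for data into a modified call through a virtualized lookup call, can be sketched as follows. All identifiers here (`ModifiedCall`, `transform_request`, `OFFLOAD_OPS`) are hypothetical illustrations, not an API defined by the patent:

```python
# Hypothetical sketch of FIG. 4, step 405: turning an ordinary data
# request into a "modified call" that carries the probabilistic
# operation (the semantics) plus a pointer to the data on the storage
# device, rather than shipping the raw data itself.

from dataclasses import dataclass, field

@dataclass
class ModifiedCall:
    """A virtualized lookup call: operation semantics plus a data pointer."""
    operation: str   # probabilistic operation to run at the storage end
    file_handle: str # NFS-style pointer to the data on the NAS
    params: dict = field(default_factory=dict)

# Operations the storage-side VM is assumed to support (illustrative).
OFFLOAD_OPS = {"bloom_filter", "linear_counting", "loglog", "count_min"}

def transform_request(operation: str, path: str, **params) -> ModifiedCall:
    """Transform a client data request into a modified call for offload."""
    if operation not in OFFLOAD_OPS:
        raise ValueError(f"operation {operation!r} cannot be offloaded")
    return ModifiedCall(operation=operation, file_handle=path, params=params)

# Example: ask the NAS-side VM for a count-min sketch over a log file.
call = transform_request("count_min", "/export/logs/clicks.dat",
                         width=2048, depth=5)
print(call.operation)  # count_min
```

The point of the transformation is that only the small `ModifiedCall` crosses the network; the large data set referenced by `file_handle` stays on the storage device.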
BACKGROUND

Many data analytics applications (e.g., recommendation engines) require probabilistic computations through specialized probabilistic data structures (e.g., bloom filtering, linear counting, LogLog algorithms, element cardinality, count-min algorithms) based on very large amounts of data. This typically requires fetching large quantities of data from one or more storage servers and transmitting the data to a computational node for processing. Currently there are no viable solutions to offload and execute probabilistic-based computations on distributed file storage servers (e.g., NFS/CIFS servers). Some approaches cause a large increase in network usage, NAS-side CPU usage, memory usage, and NIC usage for transferring raw data to the node for processing. Other drawbacks include data fetching latency and the consumption of application server resources during computation.

Very large data sets are common in the web domain or the data analytics domain. Many models are used to process large-scale data, such as MapReduce. An example of such large-scale data processing is the Hadoop ecosystem, which is based on querying large data sets for analytics. For such large-scale data, memory resources are often a limiting factor. Various algorithms have been explored which attempt a compromise between the amount of memory used and the precision required. Because analytics require an estimate of the result of a query, a variance in the expected values is tolerable and safe if based on the computation model used to calculate the value.

SUMMARY

An approach to offloading probabilistic computations is realized through lightweight (e.g., low-overhead) VMs at the storage end. This approach is enabled through the extended distributed protocol by embedding probabilistic operations (e.g., semantics) along with pointers to data on the storage device. A storage controller performs multiple functions, including deduplication, provisioning, replication, and tiering. Probabilistic algorithms (e.g., classification, clustering, and collaborative filtering) may be embedded into the traditional protocols (NFS and CIFS) with the defined extensions. A translator or parser is used to convert the expressions (operations) into C objects.

In one embodiment, an application server comprising a memory and a processor and coupled to a network-attached storage device configured to create a dedicated process in response to a procedural call to a virtual machine container based on a data request is disclosed. The processor forwards the data request to the network-attached storage device, programs one or more virtual machines to perform a probabilistic computation based on the procedural call, and directs the probabilistic computation to a first virtual machine of the one or more virtual machines. The request for data is transformed into a modified call using a virtualized lookup call.

In another embodiment, a method for offloading a computation to a storage device is disclosed. The method includes registering a process identifier with a processor using a user-level application to establish a channel of communication with the processor and fork an output of a virtual machine to create a new process; transforming a request for data into a modified call using a virtualized lookup call; forwarding the modified call to the user-level application using a virtualized NFS procedural call and the channel of communication; creating an inter-process communication channel between a kernel address space of the processor and a user address space; and using the communication channel to forward an NFS call to the virtual machine to perform the probabilistic computation.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and form a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention:

FIG. 1 is a block diagram depicting an exemplary system architecture according to embodiments of the present disclosure.

FIG. 2 is a block diagram depicting an exemplary configuration for offloading an operation from an application server according to embodiments of the present disclosure.

FIG. 3 is a block diagram depicting an exemplary probabilistic computation according to embodiments of the present disclosure.

FIG. 4 is a flowchart depicting an exemplary computer-implemented sequence of steps for integrating the VM at the storage end with the application server, according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

Reference will now be made in detail to several embodiments. While the subject matter will be described in conjunction with the alternative embodiments, it will be understood that they are not intended to limit the claimed subject matter to these embodiments. On the contrary, the claimed subject matter is intended to cover alternatives, modifications, and equivalents, which may be included within the spirit and scope of the claimed subject matter as defined by the appended claims.

Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the claimed subject matter.
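As one concrete illustration of the specialized probabilistic data structures named in the background (bloom filtering, linear counting, LogLog, count-min), a minimal bloom filter can be sketched as follows. The class name, bit-array size, and hash construction are illustrative assumptions, not details taken from the patent:

```python
# Minimal bloom-filter sketch: the kind of probabilistic structure the
# approach would evaluate at the storage end instead of shipping raw
# data to the application server. Membership tests may return false
# positives but never false negatives.

import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive k pseudo-independent bit positions from salted digests.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True may be a false positive; False is always definitive.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
for record in ("user:1001", "user:1002", "user:1003"):
    bf.add(record)

print(bf.might_contain("user:1002"))  # True: it was added
print(bf.might_contain("user:9999"))  # almost certainly False at this load
```

Because the filter occupies a fixed, small number of bits regardless of data volume, only the filter (or the query result) needs to cross the network, which is the memory/precision trade-off the background describes.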