Integrating R with Azure for High-Throughput Analysis

Hugh Shanahan
Department of Computer Science, Royal Holloway, University of London
[email protected]
@HughShanahan

Applicability to other domains

  • This project started out doing something very specific for the domain I work in (Computational Biology).
  • I promise that there will be no Biology in this talk!
  • Realised can be extended to running high-throughput jobs in R.
  • Contrast with MapReduce / R formalisms (HadoopStreaming, Rhipe, Revolution Analytics, ...) - parallelisation happens outside of individual R script.
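
The point that parallelisation happens outside the individual R script can be illustrated with a small driver. This is only a sketch in Python, not the project's own code: the function names (`build_command`, `run_one`, `run_all`), the script path and the data set IDs are hypothetical, and it assumes `Rscript` is installed and that the R script reads the data set ID from its command-line arguments.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def build_command(script_path, dataset_id):
    """Command line that runs the unchanged R script on one data set."""
    # Assumption: the R script takes the data set ID as its only argument.
    return ["Rscript", "--vanilla", script_path, dataset_id]


def run_one(script_path, dataset_id, runner=subprocess.run):
    """Run the script for a single ID and return (ID, exit code)."""
    completed = runner(build_command(script_path, dataset_id))
    return dataset_id, completed.returncode


def run_all(script_path, dataset_ids, max_workers=4, runner=subprocess.run):
    """Fan the IDs out over a pool; the R script itself stays serial."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        jobs = [pool.submit(run_one, script_path, i, runner) for i in dataset_ids]
        return dict(job.result() for job in jobs)
```

Calling `run_all("analysis.R", ["set1", "set2"])` would start one `Rscript` process per ID and return a mapping from ID to exit code. The `runner` argument exists only so the driver can be exercised without R installed.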
IaaS clouds

  • We all now know what clouds are!
  • Infrastructure as a Service (IaaS): access a Virtual Machine via the command line.
  • Amazon, Rackspace, OpenStack, ...
  • Platform as a Service (PaaS): access a Virtual Machine programmatically.
  • PaaS explicitly allows for batch control, more complicated workflows, etc.

Microsoft Azure and Generic Worker Libraries

  • Azure offers both IaaS and PaaS.
  • IaaS VMs can run a variety of different flavours of Linux and Windows OSs.
  • PaaS (they refer to this as a Cloud Service) only runs Windows Server.
  • Mass Storage (not storage associated with a VM).
  • Programmatic access is via ASP.NET and C#.
  • Access mass storage via a variety of languages.
  • Generic Worker (GW): a set of libraries which allow control of jobs running on VMs.

Scaling up

  • Needed to scale up a problem based on six data sets to nearly six hundred (100 Mbyte → 1 Tbyte).
  • Calculations based on an R script.
  • Each data set can be analysed one at a time (batch mode).
  • Individual data sets can vary by two orders of magnitude.
  • [Diagram labels: Mass Storage; Raw Data; R Script on VM (several); Log Data]

Implementation

  • Made use of Azure PaaS with GW libraries.
  • Written using a combination of C# and Java.
  • R executables + library uploaded to mass storage.
  • Data to be analysed placed in a separate container of mass storage.
  • R script uploaded at run time.

Operation

  • [Diagram labels: Local Windows PC; Mass Storage; App Container; Data Container (pre-loaded)]
  • 1: R executable + libraries
  • 2: R script, list of IDs, additional data

Launching

  • [Diagram labels: Cloud Worker Roles VM 1 ... VM n; each VM receives 1 + 2 and an ID (Id i ... Id k); Mass Storage with App Container and Data Container]

Running

  • [Diagram labels: Cloud Worker Roles VM 1 ... VM n; each VM receives a data set (Data set i ... Data set k); Mass Storage with App Container and Data Container]

Logging it all

  • [Diagram labels: Cloud Worker Roles VM 1 ... VM n; each VM produces a log file (Log file i ... Log file k); Mass Storage with App Container and Data Container]

In reality ..... this is less than 100 lines of C#

Extending to any R script

  • This can be extended to any case where you have data sets to be analysed by an R script and the data is analysed individually.
  • Set of complex financial instruments
  • Parameter sweeps

Key issues to fix this Summer

  • Getting set up (configuration files and keys).
  • Adding GUI.
  • https://github.com/hughshanahan/GWydiR
  • https://github.com/hughshanahan/RAzureEssentials
  • Will port over to a more suitable github address for group development this Summer.

Conclusions

  • C# and ASP.NET can be a learning curve for Linux users.
  • Nonetheless PaaS explicitly allows control of VMs.
  • Batch mode implementation for a specific problem.
  • Allows analysis on a Tbyte-sized data set.
  • Modified to run any R script in batch mode - much more general.

Shameless Plug

  • M.Sc. in Data Science and Analytics
  • M.Sc. in Machine Learning
  • M.Sc. in Computational Finance
  • All starting this year at Royal Holloway.
  • Please go to http://bit.ly/1418DOS for further details.

Acknowledgments

  • Andrew (Harry) Harrison
  • Anne Owen
  • Funded by Venus-C EU Network
  • Contact: [email protected], @hughshanahan

Thank you for your time!
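
As a footnote to the Launching, Running and Logging slides, the job done by each worker role can be sketched as one routine. This is a rough illustration in Python rather than the C# and Generic Worker code the talk describes, and it makes several assumptions: local directories stand in for the data and log containers in mass storage, the `.dat` and `.log` file names are hypothetical, and `analyse` is a placeholder for running the R script on the downloaded data set.

```python
from pathlib import Path
import shutil
import tempfile


def worker(dataset_id, data_container, log_container, analyse):
    """One worker-role job: fetch a data set, analyse it, store a log."""
    data_container, log_container = Path(data_container), Path(log_container)
    with tempfile.TemporaryDirectory() as scratch:
        # "Download" the data set named by the ID onto the VM's local disk.
        local_copy = Path(scratch) / f"{dataset_id}.dat"
        shutil.copy(data_container / f"{dataset_id}.dat", local_copy)
        try:
            message = f"OK {analyse(local_copy)}"
        except Exception as err:
            # Record the failure in the log instead of losing the job.
            message = f"FAILED {err}"
    # "Upload" one log file per ID back to mass storage.
    log_path = log_container / f"{dataset_id}.log"
    log_path.write_text(message + "\n")
    return log_path
```

Writing one log file per ID mirrors the "Log file i ... Log file k" labels on the Logging slide, so a failed data set can be identified and rerun without repeating the whole batch.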
