Practical Resource Monitoring for Robust High Throughput Computing

Gideon Juve1, Benjamin Tovar2, Rafael Ferreira da Silva1, Casey Robinson2, Douglas Thain2, Ewa Deelman1, William Allcock3, Miron Livny4

1University of Southern California, Information Sciences Institute, Marina Del Rey, CA, USA
{gideon,rafsilva,[email protected]
2University of Notre Dame, Notre Dame, IN, USA
{dthain,crobins9,[email protected]
3Argonne National Laboratory
[email protected]
4University of Wisconsin-Madison, Madison, WI, USA
[email protected]

ABSTRACT
Robust high throughput computing requires effective monitoring and enforcement of a variety of resources, including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or to underload machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs, and the execution system ensures the availability of such resources.

ing statistics. Efficient and robust resource provisioning and scheduling strategies are required to handle this category of applications. Scheduling and provisioning algorithms typically assume that resource usage information, such as wall time, file size, and memory requirements, is available in advance or can be reliably estimated [2, 4, 1, 42, 23], but in practice this information is rarely available. As middleware layers get information from the user, without detailed resource information it is virtually impossible to make even a simple decision such as how many tasks to run simultaneously on a