University of Leicester

SAS: Speculative Locality Aware Scheduling for I/O intensive scientific analysis in clouds

journal contribution
posted on 2025-01-07, 16:17 authored by Ali Zahir, Ashiq Anjum, Satish Narayana Srirama, Rajkumar Buyya
The execution of data-intensive analysis workflows in a multi-cloud environment, such as the Worldwide LHC Computing Grid (WLCG) at CERN, requires a large amount of input data, which is stored across multiple storage elements. The turnaround time of an individual analysis workflow running on an edge machine is determined largely by the data reading time, so minimizing that time can improve the overall efficiency of the data analysis process. To address this problem, we use Speculative Scheduling to optimize multi-cloud analysis workflows by intelligently streaming data before a task arrives for execution at the edge machine. We propose an Event System (ES), an in-memory Serverless process responsible for proactively providing input data to the workflow processes. It prefetches data from the storage elements into the memory of the edge machine that executes the workflow. Using locality-aware scheduling and prefetching algorithms, it performs Speculative Scheduling based on an evaluation of historic execution logs with a Bayesian Inference model. The Serverless ES learns about incoming jobs ahead of time and uses intelligent data streaming to supply data to these jobs, reducing the overall scheduling and data access latencies and leading to significant improvements in the overall turnaround time. We have evaluated the proposed system using a large analysis workflow from High Energy Physics (HEP) by emulating the WLCG infrastructure in a controlled environment. The results show that speculative and locality-aware scheduling techniques can achieve significant improvements (i.e. over 30%) in the execution of data-intensive workflows in the cloud environment.
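
As a rough illustration of the idea in the abstract, the sketch below ranks datasets by how likely the next job is to read them, given a log of past requests, and pairs each candidate with its closest replica so it could be prefetched into memory. This is not the authors' Event System: the Dirichlet-multinomial posterior, the function names (`posterior`, `nearest_replica`, `plan_prefetch`), the dataset and storage element names, and the latency figures are all assumptions made for the example.

```python
from collections import Counter


def posterior(history, datasets, alpha=1.0):
    """Posterior mean of a Dirichlet-multinomial model.

    Estimates P(next job reads d | history) for each dataset d,
    using a symmetric prior with pseudo-count alpha (an assumption).
    """
    counts = Counter(history)
    total = len(history) + alpha * len(datasets)
    return {d: (counts[d] + alpha) / total for d in datasets}


def nearest_replica(dataset, replicas, latency_ms):
    """Locality-aware choice: the storage element with the lowest latency."""
    return min(replicas[dataset], key=lambda se: latency_ms[se])


def plan_prefetch(history, replicas, latency_ms, k=1):
    """Return the k most probable datasets, each with its nearest replica."""
    probs = posterior(history, sorted(replicas))
    # Highest probability first; ties broken alphabetically for determinism.
    ranked = sorted(probs, key=lambda d: (-probs[d], d))
    return [(d, nearest_replica(d, replicas, latency_ms)) for d in ranked[:k]]


# Hypothetical execution log and replica catalogue.
history = ["run_A", "run_B", "run_A", "run_A", "run_C"]
replicas = {
    "run_A": ["se_cern", "se_local"],
    "run_B": ["se_cern"],
    "run_C": ["se_remote"],
}
latency_ms = {"se_cern": 40, "se_local": 5, "se_remote": 120}

plan = plan_prefetch(history, replicas, latency_ms, k=2)
print(plan)
```

In this toy setup a prefetcher would stream `run_A` from `se_local` first, since it is both the most frequently requested dataset in the log and available from a nearby storage element. A real system would also need to bound memory use and handle mispredictions, which the sketch leaves out.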

History

Author affiliation

College of Science & Engineering, Computing & Mathematical Sciences

Version

  • AM (Accepted Manuscript)

Published in

Future Generation Computer Systems

Volume

166

Pagination

107622 - 107622

Publisher

Elsevier BV

issn

0167-739X

Copyright date

2024

Available date

2025-11-29

Language

en

Deposited by

Professor Ashiq Anjum

Deposit date

2024-12-27
