As the Final Year Project of my MEng Electrical and Electronic Engineering
degree from Imperial College London, I developed machine-learning techniques to
monitor and predict resource utilization and availability on a very large
number of computers in cloud computing and distributed environments.
My supervisor for the project was Professor Kin K. Leung, and my second marker
was Doctor Wei Dai. The project was part of the US-UK ITA project.
The project began with a survey of the existing techniques and underlying
models reported in the literature. Using real datasets, I implemented
algorithms that track the resource usage of many servers in order to predict
their resource occupancy and workload in the near future. I then validated the
machine-learning techniques against the actual resource-usage measurements to
show their effectiveness.
In my final report, I first looked at the current state of the distributed and
cloud computing
market. I identified the problems the industry is facing today. I went on to
justify better resource occupancy predictions via machine learning as a good
solution to these problems. I used a large dataset provided by Google for my
technical investigations. I conducted exploratory analysis on the dataset to
determine the dynamics of the system. I then identified well-suited prediction
models, implemented them, and compared their performance to some baseline
models.
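
To illustrate what comparing a prediction model against a baseline can look
like, here is a minimal Python sketch. It is not the code from the project: the
usage values are made up rather than taken from the Google trace, and the two
forecasters (a last-value "persistence" baseline and a simple moving average)
as well as the error metric (mean absolute error) are assumptions chosen for
brevity, not necessarily the models or metrics used in the report.

```python
from statistics import mean


def persistence_forecast(history):
    """Baseline: predict that the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    return mean(history[-window:])


def mean_absolute_error(series, forecaster, warmup=3):
    """Walk forward through the series, forecasting each point from its past only."""
    errors = [
        abs(series[t] - forecaster(series[:t]))
        for t in range(warmup, len(series))
    ]
    return mean(errors)


# Hypothetical CPU-usage fractions for one machine (illustrative values only,
# not taken from the Google usage trace).
cpu_usage = [0.30, 0.50, 0.30, 0.50, 0.30, 0.50, 0.30, 0.50]

baseline_mae = mean_absolute_error(cpu_usage, persistence_forecast)
moving_avg_mae = mean_absolute_error(cpu_usage, moving_average_forecast)

print(f"persistence MAE:    {baseline_mae:.3f}")
print(f"moving-average MAE: {moving_avg_mae:.3f}")
```

On this oscillating toy series the moving average has the lower error, but that
is a property of the made-up data. The same walk-forward evaluation could be
reused with any forecaster that takes the past observations and returns a
single predicted value.
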
All the data used for this project has been voluntarily published by Google in
an attempt to "make visible many of the scheduling complexities that affect
Google's workload, including the variety of job types, complex scheduling
constraints on some jobs, mixed hardware types, and user mis-estimation of
resource consumption". The usage trace is located in
a public Google Cloud Platform bucket.
All code related to the project is publicly available as well, in
a GitHub repository.