My own cloud for large scale data science
Cloud infrastructures provide various types of services, among them
resources for parallel and distributed computing and storage. In this
assignment, your job is to build an infrastructure that provides the
following services:
- User registration, secure login and account management
- Provision of computational resources that are elastic (they shrink
and grow as needed)
- Disk storage
- Container service
- Database service
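For the secure login requirement, passwords should never be stored in plain text. The sketch below is one possible approach, not a prescribed one: it assumes salted PBKDF2-HMAC-SHA256 from Python's standard library, and the function names and the iteration count are illustrative choices.

```python
import hashlib
import hmac
import secrets

# Illustrative parameters (assumptions); tune the iteration count to your hardware.
ITERATIONS = 600_000
SALT_BYTES = 16

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hypothetical helper: return (salt, derived_key) to store in the user database."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Hypothetical helper: recompute the key with the stored salt, compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)
```

At registration you would store the salt and key; at login you would call `verify_password` with the submitted password.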
Your implementation should enforce a reasonable Service Level
Agreement (SLA), so that jobs submitted by users run within a reasonable
time on a near-optimal number of devices. Ideally, the implementation
should also minimize energy consumption.
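One way to reason about "a near-optimal number of devices" is to choose the smallest node count whose estimated runtime still meets the job's SLA deadline. The sketch below assumes, purely for illustration, that job runtime follows Amdahl's law with a known single-node time and parallel fraction; a real scheduler would need measured or profiled estimates.

```python
from typing import Optional

def estimated_runtime(serial_time: float, parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law (an assumed model): T(n) = T1 * ((1 - p) + p / n)."""
    return serial_time * ((1.0 - parallel_fraction) + parallel_fraction / nodes)

def nodes_for_deadline(serial_time: float, parallel_fraction: float,
                       deadline: float, max_nodes: int) -> Optional[int]:
    """Smallest node count whose estimated runtime meets the deadline, or None if none does."""
    for n in range(1, max_nodes + 1):
        if estimated_runtime(serial_time, parallel_fraction, n) <= deadline:
            return n
    return None  # the SLA cannot be met with the available nodes

# Example: a 100 s single-node job, 90% parallelizable, with a 30 s deadline.
print(nodes_for_deadline(100.0, 0.9, 30.0, max_nodes=64))  # 5
```

Under this model the total node-time is n * T(n) = T1 * ((1 - p) * n + p), which grows with n, so picking the smallest node count that meets the deadline also minimizes node-hours, a rough proxy for energy use.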
You can deploy your cloud services using Eucalyptus or OpenNebula, or
implement your own services. If you use Eucalyptus, it is recommended
that you first download the FastStart version to inspect and test it
before performing a full installation (see the Eucalyptus documentation
for details). If you use OpenNebula, its official documentation is a
good place to start.
Suggested structure for the report:
- Brief background on the chosen software and libraries
- Materials and methods
  - Machines used and their characteristics
  - Software used and versions
- Detailed guide to how the cloud was developed
  - Installation steps
  - Selected tests
- Evaluation/assessment metrics
- Discussion and conclusions
Deadline: June 1st, 2024