Large Scale Data Science

Code: CC3047     Acronym: CC3047     Level: 300

Keywords
Classification  Keyword
Official        Computer Science

Instance: 2023/2024 - 2S

Active? Yes
Web Page: https://moodle2324.up.pt/course/view.php?id=2125
Responsible unit: Department of Computer Science
Course/CS Responsible: Bachelor in Artificial Intelligence and Data Science

Cycles of Study/Courses

Acronym   No. of Students  Study Plan               Curricular Years  Credits UCN  Credits ECTS  Contact hours  Total Time
L:BIOINF  0                Official Study Plan      3                 -            6             48             162
L:CC      17               study plan from 2021/22  3                 -            6             48             162
L:IACD    59               study plan from 2021/22  3                 -            6             48             162

Teaching language

Portuguese and English

Objectives

Introduction to the use of cloud computing infrastructures for processing massive amounts of data ("big data") in real-world problems.

Learning outcomes and competences

- Use of cloud computing services for big data applications.
- Programming big data applications using cloud programming models.
- Understanding of core fundamentals and algorithms for mining big data.
- Hands-on practice with state-of-the-art tools for cloud computing and big data.

Working method

In person

Program

- Introduction to big data processing: challenges, example problems from science and business.

- The cloud computing paradigm: service models (PaaS, SaaS, IaaS); service virtualization, deployment and orchestration; integration of computing, networking and storage resources; scalability, fault-tolerance, and “elasticity”.

- Cloud storage solutions for big data: cloud file systems, NoSQL and graph-based databases, “object stores”.

- High-performance big data applications using cloud programming models: MapReduce, stream-based programming.

- Programming assignments on big data applications on specific topics such as data streams, social-network graphs, recommendation systems, or bioinformatics.

Mandatory literature

Ian Foster and Dennis B. Gannon; Cloud Computing for Science and Engineering, MIT Press, 2017. ISBN: 978-0262037242

Complementary Bibliography

Tom White; Hadoop: The Definitive Guide, 4th edition, O'Reilly Media, 2015. ISBN: 978-1491901632
N. Marz and J. Warren; Big Data: Principles and best practices of scalable realtime data systems, Manning Publications, 2015. ISBN: 978-1617290343
Dan C. Marinescu; Cloud Computing - Theory and Practice, 2nd edition, Morgan Kaufmann, 2018. ISBN: 978-0-12-812810-7
Jure Leskovec, Anand Rajaraman, Jeff Ullman; Mining of Massive Datasets, Cambridge University Press, 2014. ISBN: 978-1107077232 (available free in PDF format from the authors at http://mmds.org)
M. Zaharia and B. Chambers; Spark: The Definitive Guide - Big Data Processing Made Simple, O'Reilly, 2018. ISBN: 978-1491912218

Teaching methods and learning activities

- Introduction of cloud computing technologies in tandem with big data application requirements.

- Hands-on practice in programming projects using tools from major cloud service providers (Amazon Web Services, Microsoft Azure, Google Cloud, etc.) and DCC computer clusters for MapReduce.

Evaluation Type

Distributed evaluation with final exam

Assessment Components

Designation                Weight (%)
Practical or project work  40.00
Test                       60.00
Total:                     100.00

Amount of time allocated to each course unit

Designation          Time (hours)
Project development  52.00
Class attendance     52.00
Independent study    58.00
Total:               162.00

Eligibility for exams

--

Calculation formula of final grade

The grade is determined by two evaluation components:

  • 1 exam (60%) (12 "valores")

  • 1 project assignment (TP) with a weight of 40% (8 "valores")

  • mini-assignments throughout the semester

To pass: final grade greater than or equal to 10 AND an exam grade of at least 30% (>= 3.6 of the 12 exam "valores")
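As a worked illustration of the rule above, the following Python sketch computes the weighted grade and checks both approval conditions. It rests on assumptions not stated in the course sheet: that the exam and the project are each marked on a 0-20 scale before weighting, that no rounding is applied, and that the mini-assignments carry no separate weight. The function and constant names are hypothetical.

```python
# Illustrative sketch only. Assumptions (not stated in the course sheet):
# each component is marked on a 0-20 scale, no rounding is applied, and
# the mini-assignments have no separate weight.

EXAM_WEIGHT_PCT = 60      # exam is worth 60% (12 of 20 "valores")
PROJECT_WEIGHT_PCT = 40   # project is worth 40% (8 of 20 "valores")
MIN_EXAM_MARK = 6.0       # 30% of 20; equals 3.6 of the 12 exam "valores"
PASS_MARK = 10.0          # minimum final grade


def final_grade(exam: float, project: float) -> float:
    """Weighted final grade on the 0-20 scale."""
    return (EXAM_WEIGHT_PCT * exam + PROJECT_WEIGHT_PCT * project) / 100


def is_approved(exam: float, project: float) -> bool:
    """True only if both conditions hold: final grade and exam minimum."""
    return final_grade(exam, project) >= PASS_MARK and exam >= MIN_EXAM_MARK


if __name__ == "__main__":
    # Exam 5, project 20: the weighted grade is 11.0, but the exam mark is
    # below the 30% minimum, so the student is not approved.
    print(final_grade(5, 20), is_approved(5, 20))
    # Exam 10, project 10: the weighted grade is exactly 10.0 and the exam
    # minimum is met, so the student is approved.
    print(final_grade(10, 10), is_approved(10, 10))
```

The first case shows why the exam minimum matters: a strong project cannot compensate for an exam mark below 30%.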

Classification improvement

The grade can be improved in the final exam.

The grade of project assignments cannot be improved.