UC info
Large Scale Data Science
Code: | CC3047 | Acronym: | CC3047 | Level: | 300 |
Keywords | |
---|---|
Classification | Keyword |
Official | Computer Science |
Instance: 2023/2024 - 2S
Active? | Yes |
Web Page: | https://moodle2324.up.pt/course/view.php?id=2125 |
Responsible unit: | Department of Computer Science |
Course/CS Responsible: | Bachelor in Artificial Intelligence and Data Science |
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
---|---|---|---|---|---|---|---|
L:BIOINF | 0 | Official Study Plan | 3 | - | 6 | 48 | 162 |
L:CC | 17 | study plan from 2021/22 | 3 | - | 6 | 48 | 162 |
L:IACD | 59 | study plan from 2021/22 | 3 | - | 6 | 48 | 162 |
Teaching language
Portuguese and English
Objectives
Introduction to the use of cloud computing infrastructures for processing massive amounts of data ("big data") in real-world problems.
Learning outcomes and competences
- Use of cloud computing services for big data applications.
- Programming big data applications using cloud programming models.
- Understanding of core fundamentals and algorithms for mining big data.
- Hands-on practice with state-of-the-art tools for cloud computing and big data.
Working method
In person
Program
- Introduction to big data processing: challenges, example problems from science and business.
- The cloud computing paradigm: service models (PaaS, SaaS, IaaS); service virtualization, deployment and orchestration; integration of computing, networking and storage resources; scalability, fault-tolerance, and “elasticity”.
- Cloud storage solutions for big data: cloud file systems, NoSQL and graph-based databases, “object stores”.
- High-performance big data applications using cloud programming models: MapReduce, stream-based programming.
- Programming assignments on big data applications on specific topics such as data streams, social-network graphs, recommendation systems, or bioinformatics.
Mandatory literature
Ian Foster and Dennis B. Gannon; Cloud Computing for Science and Engineering, MIT Press, 2017. ISBN: 978-0262037242
Complementary Bibliography
Tom White; Hadoop, The Definitive Guide, 4th edition, O'Reilly Media, 2015. ISBN: 978-1491901632
N. Marz and J. Warren; Big Data: Principles and best practices of scalable realtime data systems, Manning Publications, 2015. ISBN: 978-1617290343
Dan C. Marinescu; Cloud Computing - Theory and Practice, 2nd edition, Morgan Kaufmann, 2018. ISBN: 978-0-12-812810-7
Jure Leskovec, Anand Rajaraman, Jeff Ullman; Mining of Massive Datasets, Cambridge University Press, 2014. ISBN: 978-1107077232 (available free in PDF format from the authors at http://mmds.org)
M. Zaharia and B. Chambers; Spark: The Definitive Guide - Big Data Processing Made Simple, O'Reilly, 2018. ISBN: 978-1491912218
Teaching methods and learning activities
- Introduction of cloud computing technologies in tandem with big data application requirements.
- Hands-on practice in programming projects using tools from major cloud service providers (Amazon Web Services, Microsoft Azure, Google Cloud, etc.) and DCC computer clusters for MapReduce.
Evaluation Type
Distributed evaluation with final exam
Assessment Components
Designation | Weight (%) |
---|---|
Practical or project work | 40.00 |
Test | 60.00 |
Total: | 100.00 |
Amount of time allocated to each course unit
Designation | Time (hours) |
---|---|
Project development | 52.00 |
Class attendance | 52.00 |
Independent study | 58.00 |
Total: | 162.00 |
Eligibility for exams
--
Calculation formula of final grade
The grade is determined by two evaluation components:
- 1 exam with a weight of 60% (12 "valores")
- 1 project assignment (TP) with a weight of 40% (8 "valores")
- mini-assignments throughout the semester
To pass: a final grade greater than or equal to 10 AND an exam grade of at least 30% (>= 3.6 "valores")
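As an illustration only, the grading rule above can be sketched in Python. The function name `final_grade` and the choice to express scores as percentages are assumptions made for this example. The sketch also assumes that the mini-assignments carry no separate weight, since none is stated, and it applies no rounding, since no rounding rule is given.

```python
def final_grade(exam_pct: float, project_pct: float) -> tuple[float, bool]:
    """Return (final grade in "valores", passed?) for scores given in percent.

    Hypothetical helper: the exam is worth up to 12 "valores" (60%) and the
    project up to 8 "valores" (40%), giving a final grade on a 0-20 scale.
    """
    exam_points = 12.0 * exam_pct / 100.0        # exam contribution (max 12)
    project_points = 8.0 * project_pct / 100.0   # project contribution (max 8)
    total = exam_points + project_points

    # Pass rule: final grade >= 10 AND at least 30% in the exam (3.6 of 12).
    passed = total >= 10.0 and exam_pct >= 30.0
    return total, passed


if __name__ == "__main__":
    for exam, project in [(50, 100), (25, 100), (80, 0)]:
        grade, ok = final_grade(exam, project)
        print(f"exam={exam}% project={project}% -> {grade:.1f} valores, passed={ok}")
```

Under these assumptions, an exam score of 25% with a full-mark project gives 11 "valores" in total but still fails, because the exam minimum of 30% is not met.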
Classification improvement
The grade can be improved in the final exam. The grade of project assignments cannot be improved.