INFO70283
Big Data Analytics Tools
Sheridan
 
 

Land Acknowledgement

Sheridan College resides on land that has been, and still is, the traditional territory of several Indigenous nations, including the Anishinaabe, the Haudenosaunee Confederacy, the Wendat, and the Mississaugas of the Credit First Nation. We recognize this territory is covered by the Dish with One Spoon treaty and the Two Row Wampum treaty, which emphasize the importance of joint stewardship, peace, and respectful relationships.

As an institution of higher learning, Sheridan embraces the critical role that education must play in facilitating real transformational change. We continue our collective efforts to recognize Canada's colonial history and to take steps toward meaningful Truth and Reconciliation.


Section I: Administrative Information
Total hours: 42.0
Credit Value: 3.0
Credit Value Notes: N/A
Effective: Winter 2023
Prerequisites: INFO70282
Corequisites: N/A
Equivalents: INFO70040
Pre/Co/Equiv Notes: Prerequisites: INFO70282 - Machine Learning; Equivalent: INFO70040 - Big Data Tools

Program(s): Data Engineer, Data Science and Artificial In
Program Coordinator(s): N/A
Course Leader or Contact: N/A
Version: 20230109_00
Status: Approved (APPR)

Section I Notes: Access to course materials and assignments will be available on Sheridan's Learning and Teaching Environment (SLATE). Students will need reliable access to a computer and the internet.

 
 
Section II: Course Details

Detailed Description
Students are introduced to popular Big Data tools such as the Hadoop framework and NoSQL databases. Students learn the basic concepts of MapReduce and Python scripting. Through various exercises, students explore widely used Big Data software such as Hive, Pig, and Spark.

Important Note: For this course, students require a desktop or laptop computer (PC or Mac) that meets the following minimum requirements:
  • Intel or AMD processor with virtualization support (any recent machine with reasonably powerful hardware)
  • 16 GB of RAM
  • 25 GB of free hard drive space

Program Context

 
Data Engineer Program Coordinator(s): N/A
This course is part of the Data Engineering Micro-Credential and the Data Scientist & Artificial Intelligence Micro-Credential.

Data Science and Artificial In Program Coordinator(s): N/A
This course is part of the Data Scientist & Artificial Intelligence Micro-Credential.


Course Critical Performance and Learning Outcomes

Critical Performance:
By the end of this course, students will have demonstrated the ability to manipulate data in the Hadoop file system and NoSQL data stores using Big Data tools and scripts.
 
Learning Outcomes:

To achieve the critical performance, students will have demonstrated the ability to:

  1. Explore how Big Data tools are used in the data science process.
  2. Explain how Hadoop stores and processes data.
  3. Query a Hadoop Distributed File System (HDFS) using Python scripts.
  4. Extract data patterns from HDFS using Big Data tools.
  5. Discover how data patterns are extracted from NoSQL data stores using Big Data tools.
  6. Manipulate data using a NoSQL Database.

Evaluation Plan
Students demonstrate their learning in the following ways:

 Evaluation Plan: ONLINE
 Quizzes (3 x 5%)     15.0%
 Assignment 1         20.0%
 Assignment 2         15.0%
 Assignment 3         15.0%
 Assignment 4         15.0%
 Assignment 5         20.0%
 Total               100.0%

Evaluation Notes and Academic Missed Work Procedure:
TEST AND ASSIGNMENT PROTOCOL
The following protocol applies to every course offered by Continuing and Professional Studies.
  1. Students are responsible for staying abreast of test dates and times, as well as due dates and any special instructions for submitting assignments and projects, as supplied to the class by the instructor.
  2. Students must write all tests at the specified date and time. Missed tests, in-class/online activities, assignments, and presentations are awarded a mark of zero. The penalty for late submission of written assignments is a loss of 10% per day for up to five business days (excluding Sundays and statutory holidays), after which a grade of zero is assigned. Business days include any day that the college is open for business, whether or not the student has scheduled classes that day. An extension or make-up opportunity may be approved at the instructor's discretion.

Provincial Context
The course meets the following Ministry of Colleges and Universities requirements:


 

Essential Employability Skills
Essential Employability Skills emphasized in the course:

  • Communication Skills - Respond to written, spoken, or visual messages in a manner that ensures effective communication.
  • Critical Thinking & Problem Solving Skills - Use a variety of thinking skills to anticipate and solve problems.
  • Information Management - Analyze, evaluate, and apply relevant information from a variety of sources.
  • Information Management - Locate, select, organize, and document information using appropriate technology and information systems.
  • Personal Skills - Manage the use of time and other resources to complete projects.
  • Personal Skills - Take responsibility for one's own actions, decisions, and consequences.

Prior Learning Assessment and Recognition
PLAR Contact (if course is PLAR-eligible) - Office of the Registrar
Students may apply to receive credit by demonstrating achievement of the course learning outcomes through previous relevant work/life experience, service, self-study and training on the job. This course is eligible for challenge through the following method(s):

  • Portfolio
    Notes: Portfolio is required.

 
 
Section III: Topical Outline
Some details of this outline may change as a result of circumstances such as weather cancellations, College and student activities, and class timetabling.
Instruction Mode: Online
Professor: N/A
Resource(s):
 Type               Description
 Optional Textbook  Sams Teach Yourself Hadoop in 24 Hours (e-book), Jeffrey Aven. Publisher: Sams, 1st edition, 2017. ISBN 9780672338526

Applicable student group(s): Students in Continuing and Professional Studies.
Course Details:

Module 1: Big Data Tools and Data Science

  • What is Big Data?
  • Types of Big Data Tools
  • History of Big Data and Data Science
  • The data science process

Quiz 1 (5%)

 

Module 2: Storing and Processing Data in Hadoop

  • Hadoop Core Components
  • Installing Hadoop in a Sandbox
  • Storing data in and interacting with HDFS
  • Processing data with MapReduce
  • The Ambari interface

Quiz 2 (5%)

 

Module 3: Python Scripting for Hadoop

  • Installing Python and Anaconda
  • Python Tutorial and Libraries for Data Science
  • Hadoop MapReduce with an Example
  • Python Code Walkthrough and Testing
  • Running a MapReduce Job in Hadoop
  • Components of MapReduce

Assignment 1 (20%)

 

Module 4: Databases, Tables and Queries in Hive

  • Introduction to Hive
  • Tables in Hive
  • Writing queries in Hive
  • Views in Hive
  • HiveQL Indexes and INSERT Statements
  • Using Hive in Hadoop Environment

Assignment 2 (15%)

 

Module 5: Scripting and Loading Data in Pig

  • Introduction to Pig
  • Comparing Pig, MapReduce and Hive
  • Loading and Storing Data in Pig
  • Writing Pig Latin scripts

Assignment 3 (15%)

 

Module 6: Data Manipulation in Spark

  • Introduction to Spark
  • Data Storage in Spark
  • Data manipulation in Spark
  • Python Lambda Functions

Assignment 4 (15%)

 

Module 7: Creating and Manipulating Databases in MongoDB

  • NoSQL databases
  • NoSQL and relational databases
  • Introduction to Elasticsearch
  • Introduction to and installing MongoDB
  • Common MongoDB commands
  • Storing Data in MongoDB
  • Querying and Indexing Data in MongoDB

Quiz 3 (5%)

Assignment 5 (20%)



Sheridan Policies

It is recommended that students read the following policies in relation to course outlines:

  • Academic Integrity
  • Copyright
  • Intellectual Property
  • Respectful Behaviour
  • Accessible Learning
All Sheridan policies can be viewed on the Sheridan policy website.

Appropriate use of generative Artificial Intelligence tools: In alignment with Sheridan's Academic Integrity Policy, students should consult with their professors and/or refer to evaluation instructions regarding the appropriate use, or prohibition, of generative Artificial Intelligence (AI) tools for coursework. Turnitin AI detection software may be used by faculty members to screen assignment submissions or exams for unauthorized use of artificial intelligence.

Course Outline Changes: The information contained in this Course Outline including but not limited to faculty and program information and course description is subject to change without notice. Nothing in this Course Outline should be viewed as a representation, offer and/or warranty. Students are responsible for reading the Important Notice and Disclaimer which applies to Programs and Courses.



Copyright © Sheridan College. All rights reserved.