INFO70283
Big Data Analytics Tools
Sheridan
 
 
Section I: Administrative Information
  Total hours: 42.0
Credit Value: 3.0
Credit Value Notes: N/A
Effective: Winter 2022
Prerequisites: INFO70282
Corequisites: N/A
Equivalents: INFO70040
Pre/Co/Equiv Notes: Prerequisite: INFO70282 Machine Learning. Equivalent: INFO70040 Big Data Tools.

Program(s): Data Engineer, Data Science and Artificial In
Program Coordinator(s): N/A
Course Leader or Contact: N/A
Version: 20220110_00
Status: Approved (APPR)

Section I Notes: Access to course materials and assignments will be available on Sheridan's Learning and Teaching Environment (SLATE). Students will need reliable access to a computer and the internet. OntarioLearn students will access this course through ontariolearn.com and will receive access instructions from OntarioLearn.

 
 
Section II: Course Details

Detailed Description
Students are introduced to popular Big Data tools such as the Hadoop framework and NoSQL databases. Students learn the basic concepts of MapReduce and Python scripting. Through various exercises, students explore widely used Big Data software such as Hive, Pig, and Spark.

Important Note: For this course, students require a desktop or laptop computer (PC or Mac) that meets the following minimum requirements:

  • Intel or AMD processor with virtualization capabilities (any recent machine with reasonably powerful hardware)
  • 16 GB of RAM
  • 25 GB of free hard disk space

Program Context

 
Data Engineer Program Coordinator(s): N/A
This course is part of the Data Engineering micro-credential and the Data Scientist & Artificial Intelligence micro-credential.

Data Science and Artificial In Program Coordinator(s): N/A
This course is part of the Data Scientist & Artificial Intelligence micro-credential.


Course Critical Performance and Learning Outcomes

  Critical Performance:
By the end of this course, students will have demonstrated the ability to manipulate data in the Hadoop Distributed File System (HDFS) and NoSQL data stores using Big Data tools and scripts.
 
Learning Outcomes:

To achieve the critical performance, students will have demonstrated the ability to:

  1. Explore how Big Data Tools are used in the data science process.
  2. Explain how Hadoop stores and processes data.
  3. Query a Hadoop Distributed File System using Python scripts.
  4. Extract data patterns from a Hadoop Distributed File System (HDFS) using big data tools.
  5. Discover how data patterns are extracted from NoSQL data stores using big data tools.
  6. Manipulate data using a NoSQL Database.

Evaluation Plan
Students demonstrate their learning in the following ways:

 Evaluation Plan: ONLINE
 Quizzes (3 x 5%): 15.0%
 Assignment 1: 20.0%
 Assignment 2: 15.0%
 Assignment 3: 15.0%
 Assignment 4: 15.0%
 Assignment 5: 20.0%
 Total: 100.0%

Evaluation Notes and Academic Missed Work Procedure:
TEST AND ASSIGNMENT PROTOCOL
The following protocol applies to every course offered by Continuing and Professional Studies.

  1. Students are responsible for staying abreast of test dates and times, as well as due dates and any special instructions for submitting assignments and projects as supplied to the class by the instructor.
  2. Students must write all tests at the specified date and time. Missed tests, in-class/online activities, assignments and presentations are awarded a mark of zero. The penalty for late submission of written assignments is a loss of 10% per day for up to five business days (excluding Sundays and statutory holidays), after which a grade of zero is assigned. Business days include any day that the college is open for business, whether the student has scheduled classes that day or not. An extension or make-up opportunity may be approved by the instructor at his or her discretion.

Provincial Context
The course meets the following Ministry of Colleges and Universities requirements:


 

Essential Employability Skills
Essential Employability Skills emphasized in the course:

  • Communication Skills - Respond to written, spoken, or visual messages in a manner that ensures effective communication.
  • Critical Thinking & Problem Solving Skills - Use a variety of thinking skills to anticipate and solve problems.
  • Information Management Skills - Analyze, evaluate, and apply relevant information from a variety of sources.
  • Information Management - Locate, select, organize and document information using appropriate technology and information systems.
  • Personal Skills - Manage the use of time and other resources to complete projects.
  • Personal Skills - Take responsibility for one's own actions, decisions, and consequences.

Prior Learning Assessment and Recognition
PLAR Contact (if course is PLAR-eligible) - Office of the Registrar
Students may apply to receive credit by demonstrating achievement of the course learning outcomes through previous relevant work/life experience, service, self-study and training on the job. This course is eligible for challenge through the following method(s):

  • Portfolio
    Notes:  Portfolio is required

 
 
Section III: Topical Outline
Some details of this outline may change as a result of circumstances such as weather cancellations, College and student activities, and class timetabling.
Instruction Mode: Online
Professor: N/A
Resource(s):
 Optional textbook: Sams Teach Yourself Hadoop in 24 Hours (e-book), Jeffrey Aven, Sams, 1st edition, 2017, ISBN 9780672338526

Applicable student group(s): Students in Continuing and Professional Studies.
Course Details:

Module 1: Big Data Tools and Data Science

  • What is Big Data?
  • Types of Big Data Tools
  • History of Big Data and Data Science
  • The data science process

Quiz 1 (5%)

 

Module 2: Storing and Processing data in Hadoop

  • Hadoop Core Components
  • Installing Hadoop in Sandbox
  • Storing data in and interacting with HDFS
  • Processing data with MapReduce
  • The Ambari interface

Quiz 2 (5%)

 

Module 3: Python Scripting for Hadoop

  • Installing Python and Anaconda
  • Python Tutorial and Libraries for Data science
  • Hadoop MapReduce with Example
  • Python Code Walkthrough and Testing
  • Running MapReduce Job in Hadoop
  • Components of MapReduce
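
As a rough illustration of the map, shuffle and reduce components listed above, the following minimal Python sketch counts words entirely in memory. It does not use Hadoop itself, and the function names (mapper, shuffle, reducer, word_count) and the sample input are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data tools", "big data analytics"]
    print(word_count(sample))
```

On a real cluster the framework performs the shuffle between the map and reduce phases; here it is written out explicitly so that each component is visible.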

Assignment 1 (20%)

 

Module 4: Databases, Tables and Queries in Hive

  • Introduction to Hive
  • Tables in Hive
  • Writing queries in Hive
  • Views in Hive
  • HiveQL Indexes and Insert
  • Using Hive in Hadoop Environment

Assignment 2 (15%)

 

Module 5: Scripting and Loading Data in Pig

  • Introduction to Pig
  • Comparing Pig, MapReduce and Hive
  • Loading and Storing Data in Pig
  • Writing Pig Latin scripts

Assignment 3 (15%)

 

Module 6: Data Manipulation in Spark

  • Introduction to Spark
  • Data Storage in Spark
  • Data manipulation in Spark
  • Python Lambda Function
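
As a small illustration of the Python lambda functions listed above, the sketch below passes lambdas to the built-in map and filter functions and to functools.reduce. It deliberately uses only the Python standard library rather than Spark, and the sample list and variable names are hypothetical.

```python
from functools import reduce

# Hypothetical sample data for the example.
readings = [3, 8, 15, 4, 23]

# map: apply a lambda to every element.
squared = list(map(lambda x: x * x, readings))

# filter: keep only the elements for which the lambda returns True.
large = list(filter(lambda x: x > 5, readings))

# reduce: combine all elements into a single value, two at a time.
total = reduce(lambda a, b: a + b, readings)

if __name__ == "__main__":
    print(squared)
    print(large)
    print(total)
```

Spark's data manipulation operations accept small functions of this kind in a similar style, which is why lambdas appear alongside the Spark topics.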

Assignment 4 (15%)

 

Module 7: Creating and Manipulating Databases in MongoDB

  • NoSQL databases
  • NoSQL and relational databases
  • Introduction to Elasticsearch
  • Introduction to and installing MongoDB
  • Common MongoDB commands
  • Storing Data in MongoDB
  • Querying Data and indexing in MongoDB

Quiz 3 (5%)

Assignment 5 (20%)



Sheridan Policies

All Sheridan policies can be viewed on the Sheridan policy website.

Academic Integrity: The principle of academic integrity requires that all work submitted for evaluation and course credit be the original, unassisted work of the student. Cheating or plagiarism including borrowing, copying, purchasing or collaborating on work, except for group projects arranged and approved by the professor, or otherwise submitting work that is not the student's own, violates this principle and will not be tolerated. Students who have any questions regarding whether or not specific circumstances involve a breach of academic integrity are advised to review the Academic Integrity Policy and procedure and/or discuss them with the professor.

Copyright: A majority of the course lectures and materials provided in class and posted in SLATE are protected by copyright. Use of these materials must comply with the Acceptable Use Policy, Use of Copyright Protected Work Policy and Student Code of Conduct. Students may use, copy and share these materials for learning and/or research purposes provided that the use complies with fair dealing or an exception in the Copyright Act. Permission from the rights holder would be necessary otherwise. Please note that it is prohibited to reproduce and/or post a work that is not your own on third-party commercial websites including but not limited to Course Hero or OneNote. It is also prohibited to reproduce and/or post a work that is not your own or your own work with the intent to assist others in cheating on third-party commercial websites including but not limited to Course Hero or OneNote.

Intellectual Property: Sheridan's Intellectual Property Policy generally applies such that students own their own work. Please be advised that students working with external research and/or industry collaborators may be asked to sign agreements that waive or modify their IP rights. Please refer to Sheridan's IP Policy and Procedure.

Respectful Behaviour: Sheridan is committed to provide a learning environment that supports academic achievement by respecting the dignity, self-esteem and fair treatment of every person engaged in the learning process. Behaviour which is inconsistent with this principle will not be tolerated. Details of Sheridan's policy on Harassment and Discrimination, Academic Integrity and other academic policies are available on the Sheridan policy website.

Accessible Learning: Accessible Learning coordinates academic accommodations for students with disabilities. For more information or to register, please see the Accessible Learning website. (Statement added September 2016.)

Course Outline Changes: The information contained in this Course Outline including but not limited to faculty and program information and course description is subject to change without notice. Any changes to course curriculum and/or assessment shall adhere to approved Sheridan protocol. Nothing in this Course Outline should be viewed as a representation, offer and/or warranty. Students are responsible for reading the Important Notice and Disclaimer which applies to Programs and Courses.



Copyright © Sheridan College. All rights reserved.