Introduction to Machine Learning for Data Science Seminar

Instructor: Dr. Subhankar Dhar

Why should I take this seminar?

Most data science problems are solved using machine learning (ML). This seminar, organized as a series of modules, aims to provide much-needed machine learning skills by introducing ML concepts to students early in their academic careers through extracurricular coursework. Each seminar includes optional pre-tutorial materials that introduce the topic, a hands-on tutorial that requires only minimal knowledge of Python, and a post-tutorial assignment that students can complete to attest to their new skills. Each module is designed as a stand-alone unit with exercises and hands-on labs that provide real-world experience.

Prior knowledge of computer programming is useful but not required. Data science professionals are in high demand in industry today, and this seminar will help you prepare for a professional career.

Seminar objectives

This seminar covers introductory machine learning concepts, including supervised and unsupervised learning. By attending this seminar, you will:

  • Learn and apply fundamental machine learning concepts to solve real-world problems in Data Science
  • Apply ML tools and various Python libraries to get hands-on programming experience
  • Understand the skills and the step-by-step process needed for machine learning
  • Understand various machine learning techniques and their applications, and analyze regression and classification problems
  • Learn supervised learning through linear regression analysis using the Python libraries NumPy, pandas, and scikit-learn (sklearn)
  • Learn unsupervised learning by solving clustering problems, such as customer segmentation, using K-Means clustering in Python
  • Analyze the accuracy of machine learning models using commonly used loss functions
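As a rough illustration of the last objective, the sketch below computes three commonly used regression loss functions in plain Python. The function names and the sample values are assumptions made for this example only; they are not taken from the seminar materials.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared errors; large errors are penalized heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average of absolute errors; less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actual values and model predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

print("MSE :", mean_squared_error(y_true, y_pred))
print("RMSE:", root_mean_squared_error(y_true, y_pred))
print("MAE :", mean_absolute_error(y_true, y_pred))
```

In practice, scikit-learn offers equivalent functions such as mean_squared_error and mean_absolute_error in its sklearn.metrics module; the hand-written versions here only spell out the arithmetic.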

Seminar structure

This seminar has three parts: pre-seminar, live seminar, and post-seminar. You can earn a digital badge attesting to your skills by successfully completing the post-seminar assignment.

Seminar description

The seminar introduces the fundamental concepts of machine learning and delves into the techniques needed to solve data science problems. It covers important topics such as supervised and unsupervised learning, along with relevant computer programming fundamentals, in conjunction with hands-on analysis of real-world datasets. It includes several examples developed using Jupyter Notebook and commonly used Python libraries, as well as several lab exercises for hands-on experience.

Seminar materials

Pre-seminar

The following website gives you a very good introduction to machine learning:
Introduction to Machine Learning for Beginners

We will be using Google Colaboratory (Colab). Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. All you need is Internet access and a browser. With Colaboratory you can write and execute code, save and share your analysis, and access powerful computing resources, all for free from your browser.

Seminar:

Slides will be provided on Canvas prior to the seminar.

Datasets: 

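1. We will use the Boston Housing data, which has been available in the sklearn Python module. Note that the load_boston loader was removed in scikit-learn 1.2, so with recent versions the data must be loaded from another source.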

  • The dataset has 506 rows and 14 columns.
  • It records median house prices for various areas of Boston.
  • The dataset also provides attributes such as the per-capita crime rate (CRIM), the proportion of non-retail business acres per town (INDUS), the proportion of owner-occupied units built before 1940 (AGE), the average number of rooms per dwelling (RM), and many others.
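To give a sense of the kind of analysis this dataset supports, here is a minimal sketch of simple linear regression (one feature, least squares) written in plain Python. The room counts and prices below are invented for illustration and are not taken from the Boston Housing data; the function name is also an assumption of this example.

```python
def fit_simple_linear_regression(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented example values: average rooms per dwelling and price in $1000s.
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [12.0, 17.0, 21.0, 27.0, 33.0]

slope, intercept = fit_simple_linear_regression(rooms, price)
predicted = slope * 6.5 + intercept  # estimated price for 6.5 rooms

print("slope:", slope)
print("intercept:", intercept)
print("predicted price for 6.5 rooms:", predicted)
```

With the real dataset, one would typically fit several features at once using LinearRegression from sklearn.linear_model rather than this single-feature formula.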

2. Download the Mall Customer Dataset

  • The Mall dataset is an unlabeled dataset.
  • There is no output variable, so we cluster the data to find patterns.
  • Here, the problem is to segment the mall's customers according to their annual income and spending habits, so that new marketing strategies can be tailored to a particular customer segment.
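The segmentation idea can be sketched with a small K-Means implementation in plain Python. This is a simplified illustration, not the seminar's own code: the customer values are made up, the helper names are hypothetical, and the centroids are initialized from the first k points to keep the run deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, iterations=100):
    """Cluster points into k groups by alternating assignment and update."""
    # Simplifying assumption: start from the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels


# Made-up customers: (annual income in $1000s, spending score).
customers = [(15, 80), (70, 20), (16, 78), (72, 15), (18, 85), (75, 22)]

centroids, labels = kmeans(customers, k=2)
print("labels:", labels)
print("centroids:", centroids)
```

On the real dataset, the KMeans class from sklearn.cluster is the usual choice; it uses a more careful initialization (k-means++) by default.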