23rd June 2021 10:00am to 11:30am
Topic: Tree Based ML Models
Speaker: Dr. Rekha Ramesh (CSE Department, Shah & Anchor Kutchhi Engineering College)


Dr. Rekha Ramesh explained the following:
Tree-based methods are among the most widely used supervised learning methods. They map non-linear relationships well and give predictive models high accuracy and stability. Important terminology for tree-based algorithms includes: root node, splitting, decision node, leaf (terminal) node, pruning, branch, and parent and child nodes. A decision tree splits the sample into two or more homogeneous sets based on the most significant splitter. Decision tree induction follows a top-down, recursive, divide-and-conquer approach: at each step an attribute is chosen and the training set is split on it into smaller training sets. Algorithms for decision tree induction include ID3 and CART. ID3 defines a measure called Information Gain to judge the goodness of a split; the attribute with the largest information gain is chosen as the splitting attribute. CART (Classification and Regression Tree) generates binary splits and uses the Gini Index to select the best attribute to split on; the attribute that maximizes the reduction in impurity is selected. When an attribute has more than two values, CART handles two cases: discrete-valued attributes and continuous-valued attributes. Data fragmentation problem: as the tree grows, the number of records at each node can become too small to support a reliable decision; to deal with this, further splitting can be stopped when the number of records falls below a certain threshold.
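As an illustration of the two split measures mentioned above, the following is a minimal sketch, not the code used in the session. It computes the entropy-based information gain (as in ID3) and the reduction in Gini impurity (the measure used by CART) for a candidate attribute. The toy data and the function names are assumptions made for this example, and for simplicity it splits on every attribute value rather than producing CART's binary splits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_i ** 2)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_by(rows, labels, attr):
    """Group the labels by the value each row takes for attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return list(groups.values())


def impurity_reduction(labels, groups, impurity):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * impurity(g) for g in groups)
    return impurity(labels) - weighted


# Hypothetical toy data set (not from the session).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

for attr in ("outlook", "windy"):
    groups = split_by(rows, labels, attr)
    info_gain = impurity_reduction(labels, groups, entropy)
    gini_drop = impurity_reduction(labels, groups, gini)
    print(attr, round(info_gain, 3), round(gini_drop, 3))
```

In this toy data "outlook" separates the classes perfectly while "windy" does not separate them at all, so both measures would pick "outlook" as the splitting attribute.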


23rd June 2021 11:45am to 1:15pm
Topic: Tree based Ensemble Methods
Speaker: Dr. Rekha Ramesh (CSE Department, Shah & Anchor Kutchhi Engineering College)


Dr. Rekha Ramesh explained the following:
Simple ensemble learning techniques: max voting, averaging, and weighted averaging. Advanced ensemble learning techniques: stacking, bootstrapping, bagging, and random forest. Random forest hyperparameters: at tree level and for each tree. Reasons for selecting boosting: the models are dependent, they are built sequentially, and each one corrects the errors of the earlier models. Reasons for not choosing bagging: the trees are independent, incorrect predictions are not considered, and new trees do not learn from the earlier models. Gradient boosting: models are built sequentially, each one fitted to the residuals of the models before it. XGBoost is an eXtreme Gradient Boosting algorithm in which later trees focus on reducing the error; it is designed for speed and performance. XGBoost features include parallel processing, regularization, handling of missing values, out-of-core computing, and built-in cross-validation. AdaBoost (Adaptive Boosting): higher weights are assigned to data points that were incorrectly predicted. CatBoost: it deals with categorical variables automatically and does not require the extensive data preprocessing that other machine learning algorithms need. Random forest: it creates multiple bootstrap samples, builds a decision tree on every sample, uses feature sampling for each split in a decision tree, and aggregates all the decision trees.
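The three simple ensemble techniques listed above can be sketched in a few lines. This is only an illustrative example: the function names, the model predictions, and the weights are assumptions made here and are not taken from the session.

```python
from collections import Counter


def max_voting(predictions):
    """Classification: return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


def averaging(predictions):
    """Regression: return the plain mean of the models' predictions."""
    return sum(predictions) / len(predictions)


def weighted_averaging(predictions, weights):
    """Regression: return the mean with each model scaled by its weight."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


# Hypothetical outputs of three models for a single input.
class_votes = ["cat", "dog", "cat"]
scores = [4.0, 5.0, 6.0]
weights = [0.5, 0.25, 0.25]  # assumed: the first model is trusted most

print(max_voting(class_votes))
print(averaging(scores))
print(weighted_averaging(scores, weights))
```

Weighted averaging reduces to plain averaging when all weights are equal; in practice the weights would usually come from each model's validation performance.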


23rd June 2021 2:30pm to 4:30pm
Topic: Practical session on Tree based models
Speaker: Atul Haribhau Kachare (Research Scholar at Sir Padampat Singhania University)


Atul Haribhau Kachare explained the following:
Tree-based models: supervised learning, suitable for categorical and continuous data, adaptable. A decision tree is a series of sequential steps designed to answer a question and supply the probabilities, costs, or other consequences of making a decision. Random forest: a collection of decision trees with a single aggregated result. Random forest reduces the variance seen in decision trees by using different samples for training, specifying random feature subsets, and building and combining small (shallow) trees. Random forest combines the results at the end of the process. Extreme Gradient Boosting: it builds one tree at a time, working in a forward stage-wise manner and introducing a weak learner to improve on the shortcomings of the existing weak learners. Gradient boosting combines results along the way.
Practical implementation: decision tree on IRIS, random forest on balance scale, Extreme Gradient Boosting on balance sheet, and Ice Cream Revenue (combining all regression models for comparison).
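The forward stage-wise idea described in this session, where each new weak learner is fitted to what the existing learners still get wrong, can be sketched as follows. This is a simplified illustration and not the code used in the practical: it assumes squared-error loss, a single numeric feature, one-split trees (stumps) as weak learners, and made-up data. The names `fit_stump`, `gradient_boost`, `n_trees` and `learning_rate` are chosen for this example. It is plain gradient boosting, without the regularization and other features of XGBoost.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals.

    Tries every threshold between distinct x values and keeps the one
    with the smallest squared error. Needs at least two distinct x values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left)
        err += sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Build stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Results are combined along the way: every stump updates the prediction.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical 1-D regression data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]

model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [model(x) for x in xs])
print(baseline_mse, boosted_mse)
```

A random forest would instead train its trees independently on bootstrap samples and average them only at the end, which is the contrast drawn in the session.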