Final Project/Report

  1. Shelter Animal Outcomes Presentation/Code (fin): 祥

  2. House Price (fin): 田兆元, 簡子軒, 曾品華, 高淑婷, 江嘉容
  3. Bosch Production Line Performance: 林子軒
  4. ML of IMDb Movies evaluation (ppt):  林楷文, 梁智傑, 馮獻慶, 林修禾

XGBoost, GBM and boosted trees

XGBoost

 

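A recent (June 2017) FB post by Yuan-Chun Ivan Chang

In today's discussion I suddenly realized that there are plenty of people who "only know how to use ready-made tools, with no idea why they work." Some even use nothing but the default parameters set in a package, without understanding what those parameters mean. How can I confirm that these people really know what they are doing? With my own students I can ask directly, but how do I ask someone else's students? Asking them feels like questioning their advisor too, doesn't it?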

“Off-the-shelf” machines

Refer to HTF2009, Section 10.7: “Off-the-shelf” procedures for Data Mining

Also Section 10.8: Spam Data. Pay attention to performance comparisons of these machines.

  • Two-sample t-test:  Why?
  • McNemar test
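
A minimal sketch of the McNemar test for comparing two machines on the same test cases. Because both machines are scored on identical cases, their errors are paired, which is one reason to question an unpaired two-sample t-test here (this reading of the "Why?" is an assumption, not a statement from HTF2009). The function name `mcnemar_test` and the toy labels are hypothetical; the p-value uses the continuity-corrected chi-square approximation with 1 degree of freedom.

```python
import math

def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar test (with continuity correction) for two classifiers
    evaluated on the same test cases.

    Returns (b, c, statistic, p_value), where
      b = cases machine A gets right and machine B gets wrong,
      c = cases machine A gets wrong and machine B gets right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # The two machines never disagree: no evidence of a difference.
        return b, c, 0.0, 1.0
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return b, c, statistic, p_value

# Toy data (made up for illustration): 20 test cases, all with true label 1.
y_true = [1] * 20
pred_a = [1] * 16 + [0] * 4                      # machine A: 16 correct
pred_b = [1] * 8 + [0] * 8 + [1] * 2 + [0] * 2   # machine B: 10 correct

b, c, statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(b, c, statistic, p_value)
```

Only the discordant cases (b and c) enter the statistic; cases where both machines are right or both are wrong carry no information about which machine is better. For small b + c, an exact binomial version is usually preferred over this approximation.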

Homework: Check 5/22

  • Exercise 10.6
  • Construct/Redo Table 10.1 using your data and your favorite machines. Prepare a short presentation (10 min or so) based on your new Table 10.1 and performance evaluation/comparison similar to Section 10.8

A Learning Path of (S)ML via R/Python

VS

  • Learning R or Python
  • Learning Statistics, Machine Learning, Linear Algebra, etc
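  • Simultaneous learning vs. spiral learning

Suitable for: those who know some statistics (e.g., regression) and some language (Japanese, or e.g. C++)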

The Path

Read (foundation: books, papers)

View (slides, docs)

Play & Hack

* Diagnosis and remedial measures are needed for sound GLM (regression, ANOVA, ANCOVA) statistical analysis, particularly for modern high dimensional data (small n, large p)

Using Python as if it were R

User Background

  • Tasks
    • Statistical data analysis: general linear models (regression, ANOVA, ANCOVA), generalized linear models (logistic regression), PCA, Multidimensional Scaling, simulation
    • numerical analysis (numerical integration, optimization)
    • Statistical machine learning: FDA, Boosting variants, SVM, Random forests
  • R core: R (download from the CRAN 0-Cloud mirror)
  • R functions and packages: e.g. glm, glmnet, ada, GAMboost, e1071, randomForest, rpart. Search MRO/package
  • IDE: R, RStudio (IDE for R)

Why Python?

  • Computational musicology: MIDI Toolbox (MATLAB), few R packages -> music21
  • Kaggle: Kernels
  • Open source +1: Linux, LibreOffice, LaTeX, etc.

Python <~ R user

  • Core: Python 2.7.x or Python 3.5.2
  • IDE: Jupyter, Spyder
  • packages: PyPI
  • Using Python as if it were R
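
As one small illustration of using Python the way an R user would use `lm(y ~ x)`, the sketch below fits a simple linear regression by least squares with the standard library only. The function name `simple_lm` and the toy data are hypothetical and not part of the course material; in practice an R user moving to Python would more likely reach for a package from PyPI.

```python
from statistics import mean

def simple_lm(x, y):
    """Least-squares fit of y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - ss_res / ss_tot,
    }

# Toy data (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_lm(x, y)
print(fit)
```

The returned dictionary plays the role of the coefficient table an R user would read off `summary(lm(y ~ x))`, minus standard errors and tests.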

Example: DeepBach

Lesson: learn while doing, and do while learning