SourMiLK

Final Project/Report

June 7, 2017 (updated June 15, 2017) by Kno

Shelter Animal Outcomes Presentation/Code (fin): 林子祥, 簡子軒, 潘星丞, 林唯德
House Price (fin): 田兆元, 簡子軒, 曾品華, 高淑婷, 江嘉容
Bosch Production Line Performance: 林子軒
ML of IMDb Movies evaluation (ppt): 林楷文, 梁智傑, 馮獻慶, 林修禾

XGBoost, GBM and boosted trees

June 5, 2017 by Kno

XGBoost

A gentle introduction to XGBoost by Brownlee contains many informative links.

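A recent (June 2017) Facebook post by Yuan-Chun Ivan Chang:

"In today's discussion, I suddenly realized that there are plenty of people who only know how to use off-the-shelf tools, without understanding at all why they work. Some even use only the default parameters set in the package, with no idea what those parameters mean. How can I confirm that these people really know what they are doing? With my own students I can ask directly, but how do I ask someone else's students? Asking them feels like questioning their advisor, doesn't it?"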

A glance at DL

June 1, 2017 by Kno

ROC, AUC, pAUC

May 23, 2017 by Kno

Criteria

Training/testing error: convenient but rough
Confusion matrix/contingency table: better, but restricted to discrete classifiers (or probabilistic/score classifiers at a given threshold)
ROC
- Interpreting diagnostic tests (ROC curve)
- Fawcett: ROC101, Intro to ROC curve
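The three criteria above can be compared on a toy example. The following is a minimal sketch, assuming binary 0/1 labels and real-valued scores; the function names (`confusion`, `roc_points`, `auc`) and the data are hypothetical, chosen for illustration only. It computes a confusion matrix at one threshold, the ROC points obtained by sweeping the threshold, and the trapezoidal AUC, with `max_fpr` giving a partial AUC (pAUC).

```python
def confusion(labels, scores, threshold):
    """Confusion-matrix counts (tp, fp, fn, tn) when predicting 1 if score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_points(labels, scores):
    """ROC points (FPR, TPR) from sweeping the threshold down through the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Tied scores cross the threshold together, so they yield one point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (made up for illustration).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(confusion(labels, scores, 0.5))      # one operating point
pts = roc_points(labels, scores)
print(auc(pts))                            # full AUC
print(auc(pts, max_fpr=0.5))               # unnormalised pAUC on FPR <= 0.5
```

The confusion matrix describes a single operating point, whereas the ROC curve and AUC summarise all thresholds at once; the pAUC here is left unnormalised.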

“Off-the-shelf” machines

May 15, 2017 by Kno

Refer to HTF2009, Section 10.7: “Off-the-shelf” procedures for Data Mining

Also Section 10.8: Spam Data. Pay attention to performance comparisons of these machines.

~~Two-sample t-test:~~ Why?
McNemar test
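One reason a two-sample t-test fits poorly here is that two machines evaluated on the same test set produce paired, not independent, errors. McNemar's test uses only the cases where the two machines disagree. Below is a minimal sketch of the exact (binomial) version; the function name `mcnemar_exact` and the toy predictions are hypothetical, not taken from HTF.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on the same test cases.

    Returns (b, c, p_value), where
      b = cases A classifies correctly and B misclassifies,
      c = cases A misclassifies and B classifies correctly.
    Under H0 (equal error rates), b ~ Binomial(b + c, 1/2).
    """
    b = sum(1 for t, a, z in zip(y_true, pred_a, pred_b) if a == t and z != t)
    c = sum(1 for t, a, z in zip(y_true, pred_a, pred_b) if a != t and z == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example (made-up predictions on 10 test cases).
y_true = [1] * 10
pred_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
pred_b = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p)
```

Cases that both machines get right, or both get wrong, do not enter the test statistic at all.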

Homework: Check 5/22

Exercise 10.6
Construct/Redo Table 10.1 using your data and your favorite machines. Prepare a short presentation (10 min or so) based on your new Table 10.1 and performance evaluation/comparison similar to Section 10.8

BD Prediction: Drive 1

April 27, 2017 by Kno

R Notebook: Testdrive 0

Homework/Discussion 1

April 13, 2017 (updated May 11, 2017) by Kno

HTF 2009: Exercise 2.1, 2.8; 3.3(a), 3.30; 4.2, 4.3, 4.9
Checkpoint: 4/27 (Thu)

Github Link

A Learning Path of (S)ML via R/Python

March 13, 2017 (updated March 15, 2017) by Kno

VS

Learning R or Python
Learning Statistics, Machine Learning, Linear Algebra, etc
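Simultaneous learning vs. spiral learning

Intended audience: those who know some statistics (e.g., regression) and some language (Japanese, or, say, C++)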

The Path

Read (foundation: books, papers)

Textbook: Hastie, Tibshirani and Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd Edition. (aka. ESLII) Springer-Verlag.
An Introduction to Statistical Learning with Applications in R (aka. ISLR), Markham’s summary @ R-bloggers; Intro to linear regression (python).*, Warmenhoven’s repo for ISLR python codes

View (slides, docs)

Ensemble Learning (Attention to p7-p10, regression as a learning machine)

Play & Hack

Playgrounds: Kaggle; CrowdAI, DrivenData, CrowdAnalytix (3 Kaggle alternatives)
More hacks: 25+ websites to find datasets for data science projects
Path suggested by Analytics Vidhya

* Diagnosis and remedial measures are needed for sound GLM (regression, ANOVA, ANCOVA) statistical analysis, particularly for modern high dimensional data (small n, large p)

Using Python as if it were R

February 20, 2017 (updated March 13, 2017) by Kno

User Background

Tasks
- Statistical data analysis: general linear models (regression, ANOVA, ANCOVA), generalized linear models (logistic regression), PCA, Multidimensional Scaling, simulation
- numerical analysis (numerical integration, optimization)
- Statistical machine learning: FDA, Boosting variants, SVM, Random forests
R core: R (download 0-cloud)
R procedures and packages: e.g. glm, glmnet, ada, GAMBoost, e1071, randomForest, rpart. Search MRO/package
IDE: R, RStudio (IDE for R)

Why Python?

Computational musicology: ~~MIDI Toolbox (MATLAB),~~ few R packages -> music21
Kaggle: Kernels
Open source +1: Linux, LibreOffice, LaTeX, etc.

Python <~ Ruser

Core: Python 2.7.x or Python 3.5.2
IDE: Jupyter, Spyder
Packages: PyPI
Use Python as if it were R
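As a rough illustration of what "using Python as if it were R" can look like, here is a minimal sketch that mimics R's `mean()`, `sd()` and a one-predictor `lm(y ~ x)` with the Python standard library only. The helper `lm` below is hypothetical and written just for this example; the data are made up. In practice an R user would more likely reach for packages from PyPI for model fitting.

```python
from statistics import mean, stdev


def lm(x, y):
    """Least-squares fit of y ~ x with one predictor; returns (intercept, slope).

    Hypothetical helper named after R's lm(), for illustration only.
    """
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up data, roughly y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(mean(y))    # R: mean(y)
print(stdev(y))   # R: sd(y), the sample standard deviation (n - 1 denominator)
intercept, slope = lm(x, y)
print(intercept, slope)   # R: coef(lm(y ~ x))
```

Like R's `sd()`, `statistics.stdev` uses the n - 1 denominator, so results should agree between the two languages.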

Example: DeepBach

Project/Problem oriented: DeepBach: a Steerable Model for Bach chorales generation
Github: Ghadjeres/DeepBach
Result: music score

Lesson: learn while doing, and do while learning

The best of times, the worst of times

February 12, 2017 (updated February 16, 2017) by Kno

A glance at DL

[DSC 2016] Event series: Hung-yi Lee (李宏毅) / Understanding Deep Learning in One Day

Deep Learning Made Accessible (Vivian Chen)

Deep Learning Summer School, Montreal 2015

Deep Learning Book
