DME - Data Mining and Exploration (INFR 11007) Review
This is my review note of the DME course (Data Mining and Exploration (INFR11007), 2019) at the University of Edinburgh. The note include every steps to develop machine learning models and related knowledge, e.g., Exploratory Data Analysis (EDA), Data Preprocessing, Modeling and Model Evaluations. Remeber to read the ‘Lab’ section of each chapter
1. Exploratory Data Analysis
1.1 Numberical Data Description
1.1.1 Location
Non-robust Measure
- Sample Mean (arithmetic mean or average): $\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$
- for random variable: $\mathbb{E}[x] = \int xp(x) dx$
- Sample Mean (arithmetic mean or average): $\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$
Robust Measure
Median:
$$ median(x) = \begin{cases} x_{[(n+1)\mathbin{/}2]}& \text{; if $n$ is odd}\\ \frac{1}{2}[x_{(n\mathbin{/}2)}+x_{(n\mathbin{/}2)+1}]& \text{; if $n$ is even} \end{cases} $$Mode: Value that occurs most frequent
- $\alpha_{th}$ Sample Quantile (rough data point, i.e. $q_{\alpha} \approx x_{([n\alpha])}$)
- $Q_{1} = q_{0.25}$, $Q_{2} = q_{0.5}$, $Q_{3} = q_{0.75}$