DME - Data Mining and Exploration (INFR 11007) Review
This is my review note of the DME course (Data Mining and Exploration (INFR11007), 2019) at the University of Edinburgh. The note include every steps to develop machine learning models and related knowledge, e.g., Exploratory Data Analysis (EDA), Data Preprocessing, Modeling and Model Evaluations. Remeber to read the ‘Lab’ section of each chapter
![Data Analysis Process Data Analysis Process](/img/DME/DME_process.png)
1. Exploratory Data Analysis
1.1 Numberical Data Description
1.1.1 Location
Non-robust Measure
- Sample Mean (arithmetic mean or average): $\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$
- for random variable: $\mathbb{E}[x] = \int xp(x) dx$
- Sample Mean (arithmetic mean or average): $\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$
Robust Measure
Median:
$$ median(x) = \begin{cases} x_{[(n+1)\mathbin{/}2]}& \text{; if $n$ is odd}\\ \frac{1}{2}[x_{(n\mathbin{/}2)}+x_{(n\mathbin{/}2)+1}]& \text{; if $n$ is even} \end{cases} $$Mode: Value that occurs most frequent
- $\alpha_{th}$ Sample Quantile (rough data point, i.e. $q_{\alpha} \approx x_{([n\alpha])}$)
- $Q_{1} = q_{0.25}$, $Q_{2} = q_{0.5}$, $Q_{3} = q_{0.75}$