DME - Data Mining and Exploration (INFR 11007) Review

These are my review notes for the DME course (Data Mining and Exploration (INFR11007), 2019) at the University of Edinburgh. The notes cover every step of developing machine learning models and the related knowledge, e.g., Exploratory Data Analysis (EDA), Data Preprocessing, Modeling and Model Evaluation. Remember to read the ‘Lab’ section of each chapter.


Data Analysis Process

1. Exploratory Data Analysis

1.1 Numerical Data Description

1.1.1 Location

  • Non-robust Measure

    • Sample Mean (arithmetic mean or average): $\hat{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}$
      • for random variable: $\mathbb{E}[x] = \int xp(x) dx$
  • Robust Measure

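    • Median (with $x_{(i)}$ denoting the $i$-th smallest value of the sample):

      $$ \operatorname{median}(x) = \begin{cases} x_{((n+1)\mathbin{/}2)} & \text{if } n \text{ is odd}\\ \frac{1}{2}\left[x_{(n\mathbin{/}2)}+x_{(n\mathbin{/}2+1)}\right] & \text{if } n \text{ is even} \end{cases} $$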
    • Mode: the value that occurs most frequently

    • $\alpha$-th Sample Quantile (roughly the data point with a proportion $\alpha$ of the ordered data to its left, i.e. $q_{\alpha} \approx x_{([n\alpha])}$)
      • $Q_{1} = q_{0.25}$, $Q_{2} = q_{0.5}$, $Q_{3} = q_{0.75}$
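
The location measures above can be compared on a small sample. The following is a minimal sketch, not taken from the course material: the data values and the helper `sample_quantile` are made up for illustration, and the bracket $[n\alpha]$ is assumed here to mean rounding up to the next integer. It shows the mean being pulled by a single outlier while the median, mode and quartiles stay put.

```python
import math
import statistics


def sample_quantile(data, alpha):
    """Rough alpha-th sample quantile: the order statistic x_([n * alpha]).

    Assumption: [.] is read as the ceiling, so the result is always one of
    the observed data points (hypothetical helper, for illustration only).
    """
    ordered = sorted(data)
    n = len(ordered)
    k = max(1, math.ceil(n * alpha))  # 1-based rank of the order statistic
    return ordered[k - 1]


# Illustrative sample (n = 8) and a copy where one value becomes an outlier.
clean = [2, 3, 3, 4, 5, 6, 7, 8]
with_outlier = [2, 3, 3, 4, 5, 6, 7, 800]

for name, x in [("clean", clean), ("with outlier", with_outlier)]:
    print(name)
    print("  mean  :", statistics.mean(x))    # non-robust
    print("  median:", statistics.median(x))  # robust
    print("  mode  :", statistics.mode(x))    # robust
    print("  Q1, Q2, Q3:", [sample_quantile(x, a) for a in (0.25, 0.5, 0.75)])
```

On this sample the mean moves from 4.75 to 103.75 when the outlier is introduced, whereas the median stays at 4.5. Note that the rough quantile $q_{0.5}$ picks an actual data point (4 here), so for even $n$ it need not coincide with the median, which averages the two middle values.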
