class: center, middle

# Dimensionality Reduction

CS534 - Machine Learning

Yubin Park, PhD

---

class: center, middle

"Less is more"

We want to reduce the dimension of data:

$$ \mathbf{X} \rightarrow \mathbf{X}^\prime $$

where `\(\text{dim}(\mathbf{X}^\prime) \ll \text{dim}(\mathbf{X})\)`.

---

class: center, middle

with **one condition**:

$$ \mathcal{L}(\mathbf{y}, f(\mathbf{X})) \approx \mathcal{L}(\mathbf{y}, f(\mathbf{X}^\prime)) $$

i.e. without losing much information from the original data.

---

## Ways to Reduce Dimensions

Many approaches exist:

- Feature Selection
  - through manual analysis
  - through regularization techniques such as Lasso
- Clustering
  - such as [k-means](https://en.wikipedia.org/wiki/K-means_clustering)
- Linear Transformation
  - such as [Principal Component Analysis](https://en.wikipedia.org/wiki/Principal_component_analysis) and [Singular Value Decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition)
- Non-linear Transformation
  - such as [Neural-Net Word Embeddings](https://en.wikipedia.org/wiki/Word_embedding)

We will cover the Linear Transformation approach in this lecture.

---

class: middle, center

.figure-350[]

The data seem to be distributed in 2D, but...

---

class: middle, center

.figure-350[]

if we rotate the coordinates, they are distributed in 1D (a code sketch of this rotation appears in the appendix).

---

## Principal Component Analysis

The rotation in the example is an instance of a "[linear transformation](https://en.wikipedia.org/wiki/Linear_map)".

We want to find a linear transformation:

$$ f(\mathbf{x}) = \mathbf{\mu} + \mathbf{V}_q \mathbf{V}_q^T (\mathbf{x} - \mathbf{\mu}) $$

where `\(\mathbf{V}_q\)` is a `\(p \times q\)` matrix with orthonormal columns and `\(q \ll p\)`, that minimizes the Squared Loss function:

$$ \sum_i (\mathbf{x}_i - f(\mathbf{x}_i) )^T(\mathbf{x}_i - f(\mathbf{x}_i) ) $$

(see the appendix for a short code sketch of this formulation)

---

class: center, middle

## Questions?
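---

## Appendix: The Rotation Example in Code

This is a minimal sketch of the 2D-to-1D rotation example, assuming NumPy; the synthetic data and every variable name are illustrative assumptions, not material from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = rng.normal(size=n)                                 # 1D latent coordinate
X = np.c_[t, 2 * t] + 0.05 * rng.normal(size=(n, 2))   # ~1D data embedded in 2D

Xc = X - X.mean(axis=0)                                # center the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)      # rows of Vt: principal axes
Z = Xc @ Vt.T                                          # rotate onto those axes

print(Z.var(axis=0))   # nearly all of the variance sits on the first axis
```

After the rotation, the second coordinate of `Z` carries almost no variance, which is exactly the 1D structure shown in the figure.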
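---

## Appendix: PCA as Reconstruction in Code

A sketch of the formulation on the PCA slide, assuming NumPy; `pca_fit` and `pca_reconstruct` are hypothetical helper names chosen here for illustration.

```python
import numpy as np

def pca_fit(X, q):
    """Fit mu and V_q for the squared reconstruction loss."""
    mu = X.mean(axis=0)
    # The columns of Vq are the top-q right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:q].T                                  # Vq: p x q, orthonormal columns

def pca_reconstruct(X, mu, Vq):
    """f(x) = mu + Vq Vq^T (x - mu): the rank-q reconstruction of each row."""
    return mu + (X - mu) @ Vq @ Vq.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy 200 x 5 dataset
mu, Vq = pca_fit(X, q=2)
loss = ((X - pca_reconstruct(X, mu, Vq)) ** 2).sum()     # the Squared Loss on the slide
print(loss)
```

Taking the top-`\(q\)` right singular vectors makes this loss as small as any rank-`\(q\)` linear map can achieve (the Eckart-Young theorem).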