ML basics - Decision Theory & Linear Regression

6 minute read

A deep learning model’s output is a probability over the variable \(\mathbf x\).

Decision Theory

Given a new value \(\mathbf x\), make the optimal decision (e.g., a classification) based on the probability model \(p(\mathbf x,\mathbf t)\).
Inference step: determine the joint distribution \(p(\mathbf x, C_{k})\) (or the posterior \(p(C_{k}\mid\mathbf x)\)). With this in hand, everything else can be done.
Decision step: given these probabilities, how do we make the optimal decision? Once the inference step is done, the decision step is very easy.

Example: Cancer Diagnosis from an X-Ray Image

  • \(\mathbf x\): the X-ray image
  • \(C_{1}\): the patient has cancer
  • \(C_{2}\): the patient does not have cancer
  • We want to know \(p(C_{k}\mid\mathbf x)\)
\[\begin{align} p(C_{k}\mid\mathbf x) &= \frac{ p(\mathbf x, C_{k}) }{p(\mathbf x)} \\\\ &= \frac{ p(\mathbf x, C_{k}) }{ \sum_{k=1}^{2}\ p(\mathbf x, C_{k}) } \\\\ &= \frac{ p(\mathbf x\mid C_{k})\ p(C_{k}) }{p(\mathbf x)} \\\\ &\propto \text{likelihood} \times \text{prior} \\\\ \end{align}\]

μ§κ΄€μ μœΌλ‘œ λ³Ό λ•Œ \(p(C_{k}\mid\mathbf x)\)λ₯Ό μ΅œλŒ€ν™” μ‹œν‚€λŠ” kλ₯Ό κ΅¬ν•˜λŠ” 것이 쒋은 κ²°μ •

Binary Classification

Decision Region

\(\mathcal{R}_{i} = \{x:pred(x) = C_{i}\}\)

When \(x\) is assigned to (classified as) class \(C_{i}\), \(x\) belongs to \(\mathcal{R}_{i}\).
Each \(\mathcal{R}_{i}\) can be viewed as a set of \(x\):
the set of all \(x\) assigned to class \(i\).

Probability of Misclassification

\(\begin{align} p(mis) &= p(x \in \mathcal{R}_{1}, C_{2}) + p(x \in \mathcal{R}_{2}, C_{1}) \\\\ &= \int_{\mathcal{R}_{1}}\ p(x, C_{2})dx + \int_{\mathcal{R}_{2}}\ p(x, C_{1})dx \\\\ \end{align}\)

\(p(x \in \mathcal{R}_{1}, C_{2})\): the probability that \(x\) was classified as class 1 but actually belongs to class 2
\(p(x \in \mathcal{R}_{2}, C_{1})\): the probability that \(x\) was classified as class 2 but actually belongs to class 1

Written as integrals, this is \(\int_{\mathcal{R}_{1}}\ p(x, C_{2})dx + \int_{\mathcal{R}_{2}}\ p(x, C_{1})dx\).

The area of \(\int_{\mathcal{R}_{1}}\ p(x, C_{2})dx\) corresponds to the regions shaded red and green in the graph.
The area of \(\int_{\mathcal{R}_{2}}\ p(x, C_{1})dx\) corresponds to the region shaded purple.

So the misclassification probability is the total shaded area.
[Figure: joint densities \(p(x, C_{1})\) and \(p(x, C_{2})\) with decision boundary \(\hat x\); the error is the red, green, and purple shaded area]

Minimize Misclassification

If \(\hat x\) moves to the left, some of the error-contributing regions change and others do not:
the red region shrinks as \(\hat x\) moves left, while the remaining regions stay the same.
Minimizing the red region therefore minimizes the total error area.

When \(\hat x\) reaches the value \(x_{0}\), the red region vanishes entirely and the error is minimized.

To minimize the error, \(x\) should be assigned to \(\mathcal{R}_{1}\) whenever \(p(x, C_{1}) > p(x, C_{2})\).

Conversely, when \(p(x, C_{1}) < p(x, C_{2})\), assigning \(x\) to \(C_{2}\) rather than \(C_{1}\) minimizes the error.

\[\begin{align} p(x, C_{1}) > p(x, C_{2}) &\Leftrightarrow p(C_{1}\mid x)p(x) > p(C_{2}\mid x)p(x) \\\\ &\Leftrightarrow p(C_{1}\mid x) > p(C_{2}\mid x) \\\\ \end{align}\]
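To make the threshold argument concrete, the sketch below assumes two made-up Gaussian class-conditional densities with equal priors, computes \(p(mis)\) as a function of the threshold \(\hat x\), and confirms the minimum sits where \(p(x, C_{1}) = p(x, C_{2})\).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D setup: p(x, C_k) = p(x | C_k) p(C_k), equal priors.
p1, p2 = 0.5, 0.5

# Decision rule: x < x_hat -> R_1, x >= x_hat -> R_2.
# p(mis) = int_{R_1} p(x, C_2) dx + int_{R_2} p(x, C_1) dx
def p_mis(x_hat):
    return (norm.cdf(x_hat, loc=2, scale=1) * p2            # class-2 mass falling in R_1
            + (1 - norm.cdf(x_hat, loc=-1, scale=1)) * p1)  # class-1 mass falling in R_2

grid = np.linspace(-3.0, 4.0, 2001)
best = grid[np.argmin([p_mis(x) for x in grid])]
print(best)  # ~0.5: the midpoint, where the two joint densities cross
```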


Multiclass

For the multiclass case, it is more convenient to focus on the probability of being correct rather than on the error.

\[\begin{align} p(correct) &= \sum_{k=1}^{K}p(\mathbf x \in \mathcal{R}_{k}, C_{k}) \\\\ &= \sum_{k=1}^{K}\int_{\mathcal{R}_{k}}p(\mathbf x, C_{k})d\mathbf x \\\\ \end{align}\] \[pred(\mathbf x) = \arg\max_{k}p(C_{k}\mid \mathbf x)\]
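In code, the multiclass decision step is just an argmax over posteriors (the values below are hypothetical):

```python
import numpy as np

posterior = np.array([0.1, 0.5, 0.3, 0.1])  # hypothetical p(C_k | x), K = 4
pred = np.argmax(posterior)                 # pred(x) = arg max_k p(C_k | x)
print(pred)                                 # 1 (0-indexed class label)
```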

Objective of Decision Theory (Classification)

Given the joint distribution \(p(\mathbf x, C_{k})\), find the optimal decision regions \(\mathcal{R}_{1},...,\mathcal{R}_{K}\).
Let \(\hat C(\mathbf x)\) be the function that, given \(\mathbf x\), returns a predicted class (one of \(1,...,K\)).

\[\mathbf x \in \mathcal{R}_{j} \Leftrightarrow \hat C(\mathbf x) = j\]

Equivalently: given the joint distribution \(p(\mathbf x, C_{k})\), find the optimal function \(\hat C(\mathbf x)\).
But by what criterion is a function β€˜optimal’?

Minimizing the Expected Loss

μ•žμ—μ„œ 였λ₯˜λ₯Ό μ΅œμ†Œν™” ν•œλ‹€κ³  ν–ˆμ§€λ§Œ 쑰금 더 ν™•μž₯ν•œλ‹€λ©΄ κΈ°λŒ“κ°’μœΌλ‘œ 갈 수 μžˆμ„ 것이닀.

Not every decision carries the same risk:

  • diagnosing cancer when there is none
  • failing to diagnose cancer when there is (riskier)

Loss Matrix

  • \(L_{kj}\): the loss incurred when an \(\mathbf x\) belonging to \(C_{k}\) is classified as \(C_{j}\)

Rows are the actual class; columns are the predicted class.

Keep in mind that all information about the data is expressed through the probability distribution: the samples we observe are regarded as generated from that distribution.

λ”°λΌμ„œ 손싀행렬 \(L\)이 μ£Όμ–΄μ‘Œμ„ λ•Œ, λ‹€μŒκ³Ό 같은 κΈ°λŒ€μ†μ‹€μ„ μ΅œμ†Œν™” ν•˜λŠ” 것을 λͺ©ν‘œλ‘œ ν•  수 μžˆλ‹€.

\[\mathbb E[L] = \sum_{k}\sum_{j}\int_{\mathcal{R}_{j}}\ L_{kj}p(\mathbf x,C_{k})d\mathbf x\]

κΈ°λŒ€μ†μ‹€ μ΅œμ†Œν™”

\(\mathbb E[L] = \sum_{k}\sum_{j}\int_{\mathcal{R}_{j}}\ L_{kj}p(\mathbf x,C_{k})d\mathbf x\)

\(\hat C(\mathbf x)\) is the function that, given \(\mathbf x\), returns a predicted class (one of \(1,...,K\)).

\[\mathbf x \in \mathcal{R}_{j} \Leftrightarrow \hat C(\mathbf x) = j\]

λ”°λΌμ„œ μœ„μ˜ \(\mathbb E[L]\) 식을 μ•„λž˜μ™€ 같이 ν‘œν˜„ν•  수 μžˆλ‹€.

  • μœ„μ˜ κΈ°λŒ€μ†μ‹€μ‹μ—μ„œλŠ” \(L_{kj}\) λŒ€μ‹ μ— \(\hat C(\mathbf x)\)둜 λ°”κΎΈκ³ 
  • κ³±μ…ˆλ²•μΉ™μ„ μ΄μš©ν•΄, κ²°ν•©ν™•λ₯ (\(p(\mathbf x, C_{k})\))을 쑰건뢀확λ₯ (\(p(C_{k}\mid \mathbf x)\))κ³Ό marginal prob(\(p(\mathbf x)\))둜 λ°”κΏ”μ£Όμ—ˆλ‹€.
\[\begin{align} \mathbb E[L] &= \int \sum_{k=1}^{K}L_{k\hat C(\mathbf x)}p(\mathbf x, C_{k})d\mathbf x \\\\ &= \int \left( \sum_{k=1}^{K}L_{k\hat C(\mathbf x)}p(C_{k}\mid \mathbf x) \right)p(\mathbf x)d\mathbf x \\\\ \end{align}\]

μ΄λ ‡κ²Œ ν‘œν˜„λœ \(\mathbb E[L]\)λŠ” \(\hat C(\mathbf x)\)의 λ²”ν•¨μˆ˜μ΄κ³  이 λ²”ν•¨μˆ˜λ₯Ό μ΅œμ†Œν™”μ‹œν‚€λŠ” ν•¨μˆ˜ \(\hat C(\mathbf x)\)λ₯Ό 찾으면 λœλ‹€.

μˆ˜λ§Žμ€ λ§μ…ˆμ„ μ΅œμ†Œν™” μ‹œν‚¨λ‹€κ³  생각해보면,
\(p(\mathbf x)\) > 0 이기 λ•Œλ¬Έμ—
각각의 x에 λŒ€ν•΄μ„œ \(\sum_{k=1}^{K}L_{k\hat C(\mathbf x)}p(C_{k}\mid \mathbf x)\) 이 λΆ€λΆ„λ§Œ μ΅œμ†Œν™” μ‹œν‚€κ²Œ 되면
μ „μ²΄μ˜ 합이 μ΅œμ†Œν™” 될 것이닀.

Functional: a function is often pictured as a box that takes a number as input and outputs a number. In the same way, a functional can be pictured as a box that takes a function as input and outputs a number.

  • number -> function \(f\) -> number
  • function -> functional \(F\) -> number
  • In other words, a function of a function.
  • Here \(\mathbb E[L]\) is a functional because its value changes with the choice of \(\hat C(\mathbf x)\).
\[\hat C(\mathbf x) = \arg\min_{j}\sum_{k=1}^{K}\ L_{kj}p(C_{k}\mid\mathbf x)\]

: the \(j\) for which \(\sum_{k=1}^{K}\ L_{kj}p(C_{k}\mid\mathbf x)\) is smallest, among all possible values of \(j\)

If the loss matrix is the 0-1 loss (diagonal entries 0, all others 1):

In this case, using the fact that \(L_{kj}=1\) for \(k \neq j\) and \(L_{jj}=0\), the quantity being minimized simplifies:

\[\begin{align} \sum_{k=1}^{K}\ L_{kj}p(C_{k}\mid\mathbf x) &= \sum_{k\neq j}\ p(C_{k}\mid\mathbf x) \\\\ &= \sum_{k=1}^{K}\ p(C_{k}\mid\mathbf x) - p(C_{j}\mid \mathbf x) \\\\ &= 1 - p(C_{j}\mid \mathbf x) \end{align}\]

λ”°λΌμ„œ \(1 - p(C_{j}\mid \mathbf x)\) 이 값을 μ΅œμ†Œν™”ν•˜λŠ” 것은 κ²°κ΅­ \(p(C_{j}\mid \mathbf x)\) 이 값을 μ΅œλŒ€ν™” ν•˜λŠ” 것이닀.

Conclusion:

\[\begin{align} \hat C(\mathbf x) &= \arg\min_{j} 1 - p(C_{j}\mid\mathbf x) \\\\ &= \arg\max_{j} p(C_{j}\mid\mathbf x) \end{align}\]
  • μœ„ μ‹μ—μ„œ \(= \sum_{k=1}^{K}\ p(C_{k}\mid\mathbf x) - p(C_{j}\mid \mathbf x)\) 이 뢀뢄이 잘 이해가 μ•ˆλœλ‹€. λŒ€κ°μ΄ 0인건 μ•Œκ² μœΌλ‚˜ μ–΄λ–»κ²Œ μ΄λ ‡κ²Œ λ³€ν™˜λ˜λŠ”μ§€ λͺ¨λ₯΄κ² λ‹€.

Example: Medical Diagnosis

\(C_{k} \in \{1,2\} \Leftrightarrow \{\text{sick}, \text{healthy}\}\)

\[L = \begin{bmatrix}0 & 100\\1 & 0\end{bmatrix}\]

L[1,1] = 0: sick & diagnosed as sick
L[1,2] = 100: sick & diagnosed as healthy

The expected loss in this case:

\[\begin{align} \mathbb E[L] &= \int_{\mathcal{R_{2}}}L_{1,2}p(\mathbf x, C_{1})d\mathbf x + \int_{\mathcal{R_{1}}}L_{2,1}p(\mathbf x, C_{2})d\mathbf x \\\\ &= \int_{\mathcal{R_{2}}}100\times p(\mathbf x, C_{1})d\mathbf x + \int_{\mathcal{R_{1}}}p(\mathbf x, C_{2})d\mathbf x \\\\ \end{align}\]

Row = ground truth
Column = prediction (diagnosis)

\(\int_{\mathcal{R_{2}}}L_{1,2}p(\mathbf x, C_{1})d\mathbf x\): predicted as healthy (\(\mathcal{R}_{2}\)) but the ground truth is sick (\(C_{1}\)).
Since \(L_{1,2} = 100\), \(\int_{\mathcal{R_{2}}}L_{1,2}p(\mathbf x, C_{1})d\mathbf x = \int_{\mathcal{R_{2}}}100\times\ p(\mathbf x, C_{1})d\mathbf x\).

Now apply the decision rule \(\hat C(\mathbf x) = \arg\min_{j}\sum_{k=1}^{K}\ L_{kj}p(C_{k}\mid\mathbf x)\). For \(j = 1\):

\[L_{11} = 0, L_{21} = 1\] \[\begin{align} \sum_{k=1}^{K}\ L_{k,1}p(C_{k}\mid\mathbf x) & = L_{11}p(C_{1}\mid \mathbf x) + L_{21}p(C_{2}\mid \mathbf x)\\ &= p(C_{2}\mid \mathbf x) \\\\ \end{align}\]

For \(j = 2\):

\[\begin{align} \sum_{k=1}^{K}\ L_{k,2}p(C_{k}\mid\mathbf x) & = L_{12}p(C_{1}\mid \mathbf x) + L_{22}p(C_{2}\mid \mathbf x)\\ &= 100\times\ p(C_{1}\mid \mathbf x) \\\\ \end{align}\]

So the two candidate expected losses are \(p(C_{2}\mid \mathbf x)\) (diagnose as sick) and \(100\times\ p(C_{1}\mid \mathbf x)\) (diagnose as healthy).

The condition for diagnosing as healthy (\(C_{2}\)) is therefore:
\(p(C_{2}\mid \mathbf x)> 100\times\ p(C_{1}\mid \mathbf x)\)
That is, the posterior for healthy must exceed 100 times the posterior for sick.

To diagnose safely (to reduce the risk of a dangerous misdiagnosis), it is better to build the loss matrix into the model and let it shape the decision.
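A minimal sketch of this expected-loss rule with the loss matrix above; the posterior values passed in are made up.

```python
import numpy as np

L = np.array([[0, 100],    # row 1: truth sick;    L[0][1] = 100 (missed illness)
              [1,   0]])   # row 2: truth healthy; L[1][0] = 1   (false alarm)

def decide(posterior):
    """posterior = [p(C_1|x), p(C_2|x)]; returns 0 (diagnose sick) or 1 (healthy)."""
    expected_loss = posterior @ L   # entry j is sum_k L_{kj} p(C_k | x)
    return np.argmin(expected_loss)

# Even a 5% chance of being sick triggers a "sick" diagnosis: 0.95 < 100 * 0.05.
print(decide(np.array([0.05, 0.95])))    # 0 -> sick
# "Healthy" only once p(C_2|x) > 100 * p(C_1|x):
print(decide(np.array([0.009, 0.991])))  # 1 -> healthy
```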

Regression

Target variable \(t \in \mathbb{R}\)

μ†μ‹€ν•¨μˆ˜: \(L(t,y(\mathbf x)) = \{y(\mathbf x)-t\}^{2}\)

μ†μ‹€κ°’μ˜ κΈ°λŒ“κ°’μΈ \(E[L]\)λ₯Ό μ΅œμ†Œν™”μ‹œν‚€λŠ” ν•¨μˆ˜ \(y(\mathbf x)\)λ₯Ό κ΅¬ν•˜λŠ” 것이 λͺ©ν‘œ.

\[\begin{align} F[y] = \mathbb E[L] &= \int_{\mathbb{R}}\int_{\mathcal{X}}\{ y(\mathbf x) - t \}^{2}p(\mathbf x, t)d\mathbf x dt \\ &= \int_{\mathcal{X}}\left( \int_{\mathbb{R}}\{ y(\mathbf x) - t \}^{2}p(\mathbf x, t)dt \right)d\mathbf x \\ &= \int_{\mathcal{X}}\left( \int_{\mathbb{R}}\{ y(\mathbf x) - t \}^{2}p(t\mid \mathbf x)dt \right)p(\mathbf x)d\mathbf x \\ \end{align}\]

Conclusion:
We will show that the optimal prediction for \(\mathbf x\) is \(y(\mathbf x) = \mathbb E_{t}[t\mid \mathbf x]\).

\(\mathbb E_{t}[t\mid \mathbf x]\): the expected value of \(t\) given \(\mathbf x\).

[Figure: the conditional density \(p(t\mid x_{0})\) at \(x_{0}\); the optimal prediction \(y(x_{0})\) is its mean]

μœ„ κ·Έλ¦Όμ—μ„œ μš°λ¦¬κ°€ μ•Œκ³  μžˆλŠ” 것은 \(x_{0}\)κ°€ μ£Όμ–΄μ‘Œμ„ λ•Œ t의 쑰건뢀확λ₯  \(p(t\mid x_{0})\) 이고, μ΄κ²ƒμ˜ κΈ°λŒ“κ°’μ€ \(y(x_{0})\)이닀.

Methods for Decision Problems

Classification

ν™•λ₯ λͺ¨λΈμ— μ˜μ‘΄ν•˜λŠ” 경우

  • 생성λͺ¨λΈ(generative model): λ¨Όμ € 각 클래슀 \(C_{k}\)에 λŒ€ν•΄ 뢄포 \(p(\mathbf x\mid C_{k})\)와 사전확λ₯  \(p(C_{k})\)λ₯Ό κ΅¬ν•œ λ‹€μŒ 베이즈 정리λ₯Ό μ‚¬μš©ν•΄μ„œ 사후확λ₯  \(p(C_{k}\mid \mathbf x)\)λ₯Ό κ΅¬ν•œλ‹€.
\[p(C_{k}\mid \mathbf x) = \frac{p(\mathbf x\mid C_{k})p(C_{k})}{p(\mathbf x)}\]

\(p(\mathbf x)\) can be obtained as follows.

\[p(\mathbf x) = \sum_{k}p(\mathbf x\mid C_{k})p(C_{k})\]

사후확λ₯ μ΄ μ£Όμ–΄μ‘ŒκΈ° λ•Œλ¬Έμ— λΆ„λ₯˜λ₯Ό μœ„ν•œ 결정은 μ‰½κ²Œ μ΄λ£¨μ–΄μ§ˆ 수 μžˆλ‹€. κ²°ν•©λΆ„ν¬μ—μ„œ 데이터λ₯Ό μƒ˜ν”Œλ§ν•΄μ„œ β€˜μƒμ„±β€™ν•  수 μžˆμœΌλ―€λ‘œ 이런 방식을 생성λͺ¨λΈμ΄λΌκ³  λΆ€λ₯Έλ‹€.

  • 식별λͺ¨λΈ(discriminative model): λͺ¨λ“  뢄포λ₯Ό λ‹€ κ³„μ‚°ν•˜μ§€ μ•Šκ³  였직 사후확λ₯  \(p(C_{k}\mid \mathbf x)\)λ₯Ό κ΅¬ν•œλ‹€. μœ„μ™€ λ™μΌν•˜κ²Œ 결정이둠을 μ μš©ν•  수 μžˆλ‹€.

νŒλ³„ν•¨μˆ˜μ— μ˜μ‘΄ν•˜λŠ” 경우

ν™•λ₯ λͺ¨λΈμ— μ˜μ‘΄ν•˜μ§€ μ•ŠλŠ” λͺ¨λΈ

  • νŒλ³„ν•¨μˆ˜(discriminant function): μž…λ ₯ \(\mathbf x\)을 클래슀둜 ν• λ‹Ήν•˜λŠ” νŒλ³„ν•¨μˆ˜(discriminant function)을 μ°ΎλŠ”λ‹€. ν™•λ₯ κ°’은 κ³„μ‚°ν•˜μ§€ μ•ŠλŠ”λ‹€.

Regression

  • First solve the inference problem of determining the joint distribution \(p(\mathbf x, t)\), then obtain the conditional distribution \(p(t\mid \mathbf x)\), and finally integrate over \(t\) (marginalize) to get \(\mathbb E_{t}[t\mid \mathbf x]\).
  • Solve the inference problem of determining the conditional distribution \(p(t\mid \mathbf x)\) directly, then integrate over \(t\) (marginalize) to get \(\mathbb E_{t}[t\mid \mathbf x]\).
  • Find \(y(\mathbf x)\) directly.

Optional

Euler-Lagrange Equation

μ†μ‹€ν•¨μˆ˜μ˜ λΆ„ν•΄

Appendix

MathJax

\(\mathbb E\):

$$\mathbb E$$

\(\mathcal{R}\):

$$\mathcal{R}$$

\(\arg\min_{j}\):

$$\arg\min_{j}$$

matrix with bracket: \(L = \begin{bmatrix}a & b\\c & d\end{bmatrix}\)

$$L = \begin{bmatrix}a & b\\c & d\end{bmatrix}$$  

matrix with curly braces:
\(\begin{Bmatrix}aaa & b\cr c & ddd \end{Bmatrix}\)

$$\begin{Bmatrix}aaa & b\cr c   & ddd \end{Bmatrix}$$

Variable-size delimiters with escaped curly braces
\(\left\{-\frac{1}{2\sigma^{2}} \sum_{n=1}^{N}(x_{n}-\mu)^{2} \right\}\):

$$\left\{-\frac{1}{2\sigma^{2}} \sum_{n=1}^{N}(x_{n}-\mu)^{2} \right\}$$ 

References

Drawing Graph with PPT: https://www.youtube.com/watch?v=MQEBu9NnCuI
Decision Theory: http://norman3.github.io/prml/docs/chapter01/5.html
Pattern Recognition and Machine Learning: https://tensorflowkorea.files.wordpress.com/2018/11/bishop-pattern-recognition-and-machine-learning-2006.pdf
