Cross Entropy


Entropy

Entropy is the average (expected value) of self-information.

What is self-information?

Within the context of information theory, self-information is defined as the amount of information that knowledge about (the outcome of) a certain event adds to someone's overall knowledge. The amount of self-information is expressed in the unit of information: a bit.
https://psychology.wikia.org/wiki/Self-information

์ •๋ณด์ด๋ก ์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” ์ž˜ ์ผ์–ด๋‚˜์ง€ ์•Š๋Š” ์‚ฌ๊ฑด(unlikely event)์€ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ์‚ฌ๊ฑด๋ณด๋‹ค ์ •๋ณด๋Ÿ‰์ด ๋งŽ๋‹ค(informative)๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์˜ˆ์ปจ๋Œ€ โ€˜์•„์นจ์— ํ•ด๊ฐ€ ๋œฌ๋‹คโ€™๋Š” ๋ฉ”์„ธ์ง€๋กœ ๋ณด๋‚ผ ํ•„์š”๊ฐ€ ์—†์„ ์ •๋„๋กœ ์ •๋ณด ๊ฐ€์น˜๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ โ€˜์˜ค๋Š˜ ์•„์นจ์— ์ผ์‹์ด ์žˆ์—ˆ๋‹คโ€™๋Š” ๋ฉ”์„ธ์ง€๋Š” ์ •๋ณด๋Ÿ‰ ์ธก๋ฉด์—์„œ ๋งค์šฐ ์ค‘์š”ํ•œ ์‚ฌ๊ฑด์ž…๋‹ˆ๋‹ค. ์ด ์•„์ด๋””์–ด๋ฅผ ๊ณต์‹ํ™”ํ•ด์„œ ํ‘œํ˜„ํ•˜๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

  • ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ์‚ฌ๊ฑด์€ ๋‚ฎ์€ ์ •๋ณด๋Ÿ‰์„ ๊ฐ€์ง„๋‹ค. ๋ฐœ์ƒ์ด ๋ณด์žฅ๋œ ์‚ฌ๊ฑด์€ ๊ทธ ๋‚ด์šฉ์— ์ƒ๊ด€์—†์ด ์ „ํ˜€ ์ •๋ณด๊ฐ€ ์—†๋‹ค๋Š” ๊ฑธ ๋œปํ•œ๋‹ค.
  • ๋œ ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ์‚ฌ๊ฑด์€ ๋” ๋†’์€ ์ •๋ณด๋Ÿ‰์„ ๊ฐ€์ง„๋‹ค.
  • ๋…๋ฆฝ์‚ฌ๊ฑด(independent event)์€ ์ถ”๊ฐ€์ ์ธ ์ •๋ณด๋Ÿ‰(additive information)์„ ๊ฐ€์ง„๋‹ค. ์˜ˆ์ปจ๋Œ€ ๋™์ „์„ ๋˜์ ธ ์•ž๋ฉด์ด ๋‘๋ฒˆ ๋‚˜์˜ค๋Š” ์‚ฌ๊ฑด์— ๋Œ€ํ•œ ์ •๋ณด๋Ÿ‰์€ ๋™์ „์„ ๋˜์ ธ ์•ž๋ฉด์ด ํ•œ๋ฒˆ ๋‚˜์˜ค๋Š” ์ •๋ณด๋Ÿ‰์˜ ๋‘ ๋ฐฐ์ด๋‹ค.

https://ratsgo.github.io/statistics/2017/09/22/information/
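
As a quick illustration of the additivity point, here is a minimal Python sketch (not from the original post; base-2 logarithms, so the unit is bits):

import math

def self_information(p, base=2):
    # self-information i(A) = -log_b P(A)
    return -math.log(p, base)

p_heads = 0.5             # a single fair coin toss landing heads
p_two_heads = 0.5 * 0.5   # two independent tosses both landing heads

print(self_information(p_heads))      # 1.0 bit
print(self_information(p_two_heads))  # 2.0 bits, exactly twice the information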

๊ฐ ์‚ฌ๊ฑด์— ๋Œ€ํ•ด์„œ ์ •๋ณด์˜ ์–‘(ํ™•๋ฅ ๋ณ€์ˆ˜)์— ๋Œ€ํ•œ ๊ธฐ๋Œ“๊ฐ’์ด๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

\[i(A) = \log_b\left({1 \over P(A)}\right) = -\log_b{P(A)}\]

b: ์ •๋ณด์˜ ๋‹จ์œ„ ๋ณดํ†ต e๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

Entropy:

\[H(X) = \sum_j{P(A_j)\,i(A_j)} = -\sum_j{P(A_j)\log_2{P(A_j)}}\]
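
As a small illustration (my own sketch, not part of the original post), the entropy of a fair coin is exactly 1 bit, while a biased coin, being more predictable, has lower entropy:

import math

def entropy(probs, base=2):
    # H(X) = -sum_j P(A_j) * log_b P(A_j)
    return -sum(p * math.log(p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0    -> fair coin, maximum uncertainty
print(entropy([0.9, 0.1]))  # ~0.47  -> biased coin, more predictable
print(entropy([1.0]))       # 0.0    -> a guaranteed event carries no information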

Cross Entropy

Cross entropy is mainly used as a loss function in deep learning.
First, let's look at cross entropy in terms of probability distributions.

P์™€ Q๋ผ๋Š” ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ์กด์žฌํ•  ๋•Œ์˜ ํฌ๋กœ์Šค ์—”ํŠธ๋กœํ”ผ๋Š” ์–ด๋–ค ์˜๋ฏธ์ธ๊ฐ€.

Cross entropy H(P, Q):
the expected value of the self-information of each event under Q, where the expectation is taken with respect to the actual distribution P.

Cross entropy is used to compare the probability distributions P and Q:
P can be thought of as the ground truth and Q as the prediction.
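
To make this concrete, a small sketch (my own example, not from the original post) compares two candidate predictions Q against a one-hot ground truth P: the closer Q is to P, the smaller the cross entropy.

import math

def cross_entropy(p, q):
    # H(P, Q) = -sum_j P(A_j) * log Q(A_j), in nats
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.0, 1.0, 0.0]        # ground truth: class 1
q_good = [0.1, 0.8, 0.1]   # prediction close to the ground truth
q_bad  = [0.6, 0.2, 0.2]   # prediction far from the ground truth

print(cross_entropy(p, q_good))  # ~0.22
print(cross_entropy(p, q_bad))   # ~1.61 -> the worse prediction is penalized more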

๋ณดํ†ต DL์„ ํ• ๋•Œ label์€ ohe๋ฅผ ์•ˆํ•ด๋„ output๊ณผ label์„ ์ž…๋ ฅํ•˜๋ฉด,
pytorch CrossEntropy ํด๋ž˜์Šค์—์„œ ์ž๋™์œผ๋กœ ํ•ด๊ฒฐ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
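
As a sanity check (my own sketch, not from the PyTorch docs), the same loss can also be computed by hand from one-hot labels and log-softmax, which is effectively what CrossEntropyLoss does with the integer targets:

import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)        # raw network outputs: 3 samples x 5 classes
labels = torch.tensor([2, 4, 1])  # integer class labels, no one-hot encoding

# built-in: integer labels go straight in
loss_builtin = F.cross_entropy(logits, labels)

# by hand: one-hot labels * log-softmax, summed per sample, averaged over the batch
one_hot = F.one_hot(labels, num_classes=5).float()
loss_manual = -(one_hot * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

print(loss_builtin, loss_manual)  # the two values match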


ํŒŒ์ดํ† ์น˜ ๊ธฐ์ค€์œผ๋กœ๋Š” BinaryCrossEntropy, CrossEntropy ํด๋ž˜์Šค๊ฐ€ ์กด์žฌํ•œ๋‹ค.

PyTorch CrossEntropyLoss


ํŒŒ์ดํ† ์น˜ ๊ณต์‹๋ฌธ์„œ์— ๋‚˜์˜จ ์˜ˆ์ œ


import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()

# 3 samples, 5 classes: raw logits straight from the network (no softmax applied)
data = torch.randn(3, 5, requires_grad=True)
# integer class labels in [0, 5), one per sample (no one-hot encoding)
target = torch.empty(3, dtype=torch.long).random_(5)

print(data)
print(target)

output = loss(data, target)
print(output)

output.backward()

tensor([[ 0.1449, -0.5543, -0.1170, -1.3969, -0.1700],
        [ 0.1715,  1.0151,  0.6917,  1.4723, -0.3305],
        [ 0.7153,  1.7428, -0.7265, -0.5458,  0.1957]], requires_grad=True)

tensor([2, 4, 1])

tensor(1.5741, grad_fn=<NllLossBackward>)

After a forward pass through the network, the output has one value per class.
In this example, only 3 labels are given as the target, one per sample.
Each row of the data tensor corresponds to one element of the target.
Because of zero indexing, the first row's target is class 2, so training will push its third value, -0.1170, to become the largest in that row.
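
To verify this (my own addition, not in the original post), the printed loss can be reproduced by feeding the tensors shown above back into the loss; with reduction='none' you also get the per-sample losses, which average to the printed 1.5741:

import torch
import torch.nn.functional as F

# the tensors printed above, copied back in by hand
data = torch.tensor([[ 0.1449, -0.5543, -0.1170, -1.3969, -0.1700],
                     [ 0.1715,  1.0151,  0.6917,  1.4723, -0.3305],
                     [ 0.7153,  1.7428, -0.7265, -0.5458,  0.1957]])
target = torch.tensor([2, 4, 1])

per_sample = F.cross_entropy(data, target, reduction='none')
print(per_sample)         # roughly tensor([1.43, 2.73, 0.56])
print(per_sample.mean())  # ~1.5741, matching the loss printed above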

https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
