06-24 Live Session

June 24, 2021 2 minute read

CS & Calculus background

The idea of Domain Generalization is to learn from one or multiple training domains, to extract a domain-agnostic model which can be applied to an unseen domain.

Self-Supervised Learning is proposed for utilizing unlabeled data with the success of supervised learning. Producing a dataset with good labels is expensive, while unlabeled data is being generated all the time. The motivation of Self-Supervised Learning is to make use of the large amount of unlabeled data. The main idea of Self-Supervised Learning is to generate the labels from unlabeled data, according to the structure or characteristics of the data itself, and then train on this unsupervised data in a supervised manner. Self-Supervised Learning is wildly used in representation learning to make a model learn the latent features of the data. This technique is often employed in computer vision, video processing and robot control.
https://hoya012.github.io/blog/Self-Supervised-Learning-Overview/
https://paperswithcode.com/task/self-supervised-learning

행렬의 곱은 linear transform.
bias가 들어가면 회전.
비선형(activation function)으로 warp

vector-transform

sgd는 lr에 민감하다.

또한 모델의 용량이 클때 데이터의 양을 더 많이 수집하면 일반화 능력이 향상된다가 잘 이해가 되지 않습니다. 왜 데이터의 개수가 많으면 12차의 모델이 6차처럼 보이는지 잘 이해가 안됩니다ㅜㅜ

파라미터가 많다.

공간의 접히는 부분을 보면 대칭이 되는 선을 기준으로만 접는거 같은데, 접힐때는 항상 그런식으로만 접히게 되는건가요?

계단함수는 활성함수 아닐까요…? 손실함수는 (o-t) L2 or L1 or …로 scalar값으로 알고있습니다.

피쳐맵이 여러개이면, L층이 병렬로 여러개 있다고 보면 되나요?? 층안 노드들이 피쳐맵의 각각의 요소이고 한 층 자체가 피쳐맵 1개라고 이해하고 있는데 이것이 맞는지 궁금합니다.

선형대수와 확률을 기초부터 공부하고 싶은데, 추천하시는 책이나 강의가 있으신가요? 한국어로 되어있으면 좋겠습니다…

단층퍼셉트론에서의 b 는 임계값의 역할을 한다고 강의에서 들었습니다. 그렇다면 다층 퍼셉트론, deep learning 에서의 각각의 b1, b2 등.. 의 값들도 각 퍼셉트론에서의 임계값 역할을 수행하는 것인가요??

ReLU 함수를 이용해서 비선형 곡선 함수를 근사할 수 있는 근거가 뭔지 알고싶습니다

차수는 다항식의 모델을 쓰는 경우에 차수가 많아지면 모델의 파라미터가 많아지는 것.

nips, cvpr 논문을 진행해보는 것도..?