[Paper Reading] Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision(2021)

Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision https://arxiv.org/abs/2106.01226 In this paper, we study the semi-supervised semantic segmentation problem via exploring both labeled data and extra unlabeled data. We propose a novel consistency regularization approach, called cross pseudo supervision (CPS). Ou..
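To make "cross pseudo supervision" concrete, here is a minimal NumPy sketch of the unlabeled-data term, assuming the standard CPS setup: two segmentation networks with the same architecture but different initializations see the same unlabeled image, and each network's hard (argmax) prediction is used as the target for the other. The function names and the flattened `(num_pixels, num_classes)` shapes are illustrative assumptions, not the authors' code. In the full objective this term would be added to the ordinary supervised loss on labeled images with a trade-off weight (λ in the paper).

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, labels):
    # Mean pixel-wise cross-entropy; logits: (N, C), labels: (N,) integer class ids.
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def cps_loss(logits_a, logits_b):
    """Cross pseudo supervision on unlabeled pixels (illustrative sketch).

    logits_a, logits_b: (num_pixels, num_classes) outputs of two differently
    initialized networks on the same unlabeled image.
    """
    pseudo_a = logits_a.argmax(axis=1)  # hard pseudo labels from network A
    pseudo_b = logits_b.argmax(axis=1)  # hard pseudo labels from network B
    # Network A is supervised by B's labels, and network B by A's labels.
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)

# Usage with random stand-in logits for 16 pixels and 4 classes.
rng = np.random.default_rng(0)
a = rng.normal(size=(16, 4))
b = rng.normal(size=(16, 4))
print(cps_loss(a, b))
```

The argmax makes the pseudo labels fixed targets: no gradient flows through them, so each network is pulled toward the other's confident decisions rather than toward a soft average.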

[Paper Reading] ViLD(2021), Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation

Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation, ViLD https://arxiv.org/abs/2104.13921 We aim at advancing open-vocabulary object detection, which detects objects described by arbitrary text inputs. The fundamental challenge is the availability of training data. Existing object detecti..

[Paper Reading] DenseCLIP, Extract Free Dense Labels from CLIP

https://arxiv.org/abs/2112.01071 DenseCLIP: Extract Free Dense Labels from CLIP Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition. Many recent studies leverage the pre-trained CLIP models for image-level classification and manipulation. In this paper, we fu.. A paper that applies CLIP to segmentation, taking the information CLIP has learned to segmentat..

[Paper Reading] Generalized Category Discovery(2022)

Generalized Category Discovery https://arxiv.org/abs/2201.02609 In this paper, we consider a highly general image recognition setting wherein, given a labelled and unlabelled set of images, the task is to categorize all images in the unlabelled set. Here, the unlabelled images may come from labelled classes or from nov.. It proposes a new task. Included in the training set..

[Paper Reading] f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning(2019)

f-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning https://arxiv.org/abs/1903.10132 When labeled training data is scarce, a promising data augmentation approach is to generate visual features of unknown classes using their attributes. To learn the class conditional distribution of CNN features, these models rely on ..

[Paper Reading] Decoupling Zero-Shot Semantic Segmentation(2021)

Decoupling Zero-Shot Semantic Segmentation https://arxiv.org/abs/2112.07910 Zero-shot semantic segmentation (ZS3) aims to segment the novel categories that have not been seen in the training. Existing works formulate ZS3 as a pixel-level zero-shot classification problem, and transfer semantic knowledge from seen classes to unseen.. MaskFormer and C..

[Paper Reading] Matching Networks for One Shot Learning(2017)

Matching Networks for One Shot Learning https://arxiv.org/abs/1606.04080 Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new conce.. Episodic training..

[Paper Reading] MaskFormer, Per-Pixel Classification is Not All You Need for Semantic Segmentation(2021)

Per-Pixel Classification is Not All You Need for Semantic Segmentation https://arxiv.org/abs/2107.06278 Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is s..
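The contrast between per-pixel classification and mask classification is easier to see side by side. Below is a small NumPy sketch, assuming MaskFormer-style semantic inference: each of N queries predicts a class distribution (with an extra "no object" class) and a binary mask, and the score of class c at a pixel is the sum over queries of p_q(c) times the query's mask probability at that pixel. Function names and array shapes are assumptions for illustration.

```python
import numpy as np

def per_pixel_classification(pixel_logits):
    # Classic formulation: an independent K-way prediction at every pixel.
    # pixel_logits: (K, H, W) -> (H, W) label map
    return pixel_logits.argmax(axis=0)

def mask_classification_semantic(class_probs, mask_logits):
    """Semantic map from N (class, mask) pairs (illustrative sketch).

    class_probs: (N, K + 1) per-query class probabilities; last column = "no object".
    mask_logits: (N, H, W) per-query binary mask logits.
    Returns an (H, W) map of class ids in [0, K).
    """
    masks = 1.0 / (1.0 + np.exp(-mask_logits))       # sigmoid -> (N, H, W)
    probs = class_probs[:, :-1]                      # drop "no object" -> (N, K)
    # Score of class c at (h, w): sum over queries of p_q(c) * m_q[h, w].
    scores = np.einsum("nk,nhw->khw", probs, masks)  # (K, H, W)
    return scores.argmax(axis=0)

# Two queries, three classes, a 2x2 image:
# query 0 says "class 1" and covers the left column, query 1 says "class 2" and covers the right.
class_probs = np.array([[0.05, 0.90, 0.03, 0.02],
                        [0.02, 0.03, 0.90, 0.05]])
mask_logits = np.array([[[10.0, -10.0], [10.0, -10.0]],
                        [[-10.0, 10.0], [-10.0, 10.0]]])
print(mask_classification_semantic(class_probs, mask_logits))  # left column -> 1, right column -> 2
```

The per-pixel version needs a K-way decision at every location, while the mask version only needs N class decisions plus N class-agnostic masks, which is what lets the same model handle semantic and instance-level segmentation.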

[Paper Reading] A Simple Baseline for Zero-Shot Semantic Segmentation with Pre-trained Vision-language Model

A Simple Baseline for Zero-Shot Semantic Segmentation with Pre-trained Vision-language Model https://arxiv.org/pdf/2112.14757.pdf A paper that applies CLIP to zero-shot semantic segmentation. It generates binary masks with MaskFormer and then makes predictions by classifying each generated mask (mask classification). CLIP's pre-trained text representations are used as the classifier weights, which is what makes zero-shot prediction on unseen classes possible.
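Why text embeddings as classifier weights enable zero-shot prediction can be shown in a few lines. This is a sketch under stated assumptions: random vectors stand in for CLIP text embeddings and for the per-mask embeddings (no CLIP model is called), classification is scaled cosine similarity followed by softmax, and the temperature value and the function name `classify_masks` are hypothetical choices for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def classify_masks(mask_embeddings, text_embeddings, temperature=0.01):
    """Mask classification with text embeddings as classifier weights (sketch).

    mask_embeddings: (N, D), one embedding per proposed binary mask (stand-in).
    text_embeddings: (C, D), one embedding per class name (stand-in for CLIP text features).
    temperature: assumed value; smaller means sharper probabilities.
    Returns (N, C) class probabilities.
    """
    m = l2_normalize(mask_embeddings)
    w = l2_normalize(text_embeddings)        # rows act as the classifier weights
    logits = m @ w.T / temperature           # scaled cosine similarity
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical vocabulary: "cat" and "dog" seen in training, "cow" unseen.
# Supporting the unseen class only requires appending its text embedding as a new row.
rng = np.random.default_rng(0)
class_names = ["cat", "dog", "cow"]
text = rng.normal(size=(3, 8))                         # random stand-ins for text embeddings
masks = text[[2, 0]] + 0.01 * rng.normal(size=(2, 8))  # two masks resembling "cow" and "cat"
probs = classify_masks(masks, text)
print([class_names[i] for i in probs.argmax(axis=1)])
```

Because no class-specific weights are learned, the label set is fixed only by which class names are embedded at inference time.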
