
Masked Autoencoders: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:

@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
  • The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.

  • This repo is a modification on the DeiT repo. Installation and preparation follow that repo.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.
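A minimal sketch of the kind of workaround typically applied (the file path and version check here are assumptions, not the repo's official patch) edits timm/models/layers/helpers.py so that it no longer imports container_abcs from torch._six on newer PyTorch:

```python
# timm/models/layers/helpers.py (timm==0.3.2) -- assumed patch location.
# torch._six.container_abcs is gone in newer PyTorch releases, so fall back
# to the standard library there.
import torch

TORCH_MAJOR, TORCH_MINOR = (int(v) for v in torch.__version__.split(".")[:2])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```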

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code
  • Pre-training code

Visualization demo

Run our interactive visualization demo using a Colab notebook (no GPU needed).
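As a rough sketch of what the demo does (the models_mae module, mae_vit_large_patch16 constructor, checkpoint filename, and 'model' key below are assumptions based on the released code; the notebook is the authoritative reference), a pre-trained MAE can reconstruct a heavily masked image like this:

```python
# Sketch only: assumes the repo's models_mae module and a downloaded
# visualization checkpoint (names are assumptions; see the demo notebook).
import torch
import models_mae

model = models_mae.mae_vit_large_patch16()            # MAE with a ViT-Large encoder
ckpt = torch.load("mae_visualize_vit_large.pth", map_location="cpu")
model.load_state_dict(ckpt["model"], strict=False)
model.eval()

img = torch.randn(1, 3, 224, 224)                     # stand-in for a normalized image
with torch.no_grad():
    loss, pred, mask = model(img, mask_ratio=0.75)    # randomly mask 75% of patches
    recon = model.unpatchify(pred)                    # patch predictions -> image tensor
print(recon.shape)                                    # torch.Size([1, 3, 224, 224])
```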

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

                          ViT-Base    ViT-Large   ViT-Huge
pre-trained checkpoint    download    download    download
md5                       8cad7c      b8b06e      9bdbb0
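As a quick sanity check after downloading, the truncated md5 values above can be compared against the local file; a minimal sketch (the filename below is an assumption; use whatever name the download saves to):

```python
# Verify a downloaded checkpoint against the (truncated) md5 from the table,
# then load it as a regular PyTorch checkpoint.
import hashlib
import torch

ckpt_path = "mae_pretrain_vit_base.pth"   # assumed filename for the ViT-Base download
expected_prefix = "8cad7c"                # ViT-Base md5 entry from the table above

with open(ckpt_path, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()
assert digest.startswith(expected_prefix), f"unexpected md5: {digest}"

checkpoint = torch.load(ckpt_path, map_location="cpu")
print(list(checkpoint.keys()))            # inspect what the checkpoint contains
```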

The fine-tuning instructions are in FINETUNE.md.

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

                                   ViT-B   ViT-L   ViT-H   ViT-H448   prev best
ImageNet-1K (no external data)     83.6    85.9    86.9    87.8       87.1

The following are evaluations of the same model weights (fine-tuned on the original ImageNet-1K):

                                   ViT-B   ViT-L   ViT-H   ViT-H448   prev best
ImageNet-Corruption (error rate)   51.7    41.8    33.8    36.8       42.5
ImageNet-Adversarial               35.9    57.1    68.2    76.7       35.8
ImageNet-Rendition                 48.3    59.9    64.4    66.5       48.7
ImageNet-Sketch                    34.5    45.3    49.6    50.9       36.0

The following are transfer learning results, obtained by fine-tuning the pre-trained MAE on the target dataset:

                                   ViT-B   ViT-L   ViT-H   ViT-H448   prev best
iNaturalist 2017                   70.5    75.7    79.3    83.4       75.4
iNaturalist 2018                   75.4    80.1    83.0    86.8       81.2
iNaturalist 2019                   80.5    83.4    85.7    88.3       84.1
Places205                          63.9    65.8    65.9    66.8       66.0
Places365                          57.9    59.4    59.8    60.3       58.0

Pre-training

The pre-training instructions are in PRETRAIN.md.
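For orientation, the recipe from the paper is to mask a large random subset of image patches (75% by default) and train the autoencoder to reconstruct the missing ones; a toy sketch of that random-masking step (an illustration of the idea, not the repo's implementation):

```python
# Toy illustration of MAE-style random patch masking (not the repo's code):
# keep a random 25% of patch tokens; the loss is computed only on masked patches.
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (batch, num_patches, dim) -> kept patches, binary mask, restore ids."""
    n, l, d = patches.shape
    len_keep = int(l * (1 - mask_ratio))

    noise = torch.rand(n, l)                       # one random score per patch
    ids_shuffle = torch.argsort(noise, dim=1)      # low-noise patches are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :len_keep]
    kept = torch.gather(patches, 1, ids_keep.unsqueeze(-1).repeat(1, 1, d))

    mask = torch.ones(n, l)                        # 1 = masked, 0 = kept
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)      # unshuffle to the original patch order
    return kept, mask, ids_restore

x = torch.randn(2, 196, 768)                       # e.g. 14x14 patches of a 224x224 image
kept, mask, _ = random_masking(x)
print(kept.shape, mask.sum(dim=1))                 # (2, 49, 768); 147 patches masked per image
```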

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

