RepLKNet-pytorch (CVPR 2022)

This is the official PyTorch implementation of RepLKNet, from the following CVPR-2022 paper:

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs.

The paper is released on arXiv: https://arxiv.org/abs/2203.06717.

Update: training code released (testing in progress).

Other implementations

More re-implementations are welcome.

Use our efficient large-kernel convolution with PyTorch

We have released an example for PyTorch. Please check setup.py and depthwise_conv2d_implicit_gemm.py (a replacement for torch.nn.Conv2d) in https://github.com/MegEngine/cutlass/tree/master/examples/19_large_depthwise_conv2d_torch_extension.

Clone cutlass (https://github.com/MegEngine/cutlass) and enter the directory.
cd examples/19_large_depthwise_conv2d_torch_extension
Run ./setup.py install --user. If you get errors, check your CUDA_HOME.
Run a quick check: python depthwise_conv2d_implicit_gemm.py
Add WHERE_YOU_CLONED_CUTLASS/examples/19_large_depthwise_conv2d_torch_extension to your PYTHONPATH so that from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM works anywhere. You may then use DepthWiseConv2dImplicitGEMM as a replacement for nn.Conv2d.
Run export LARGE_KERNEL_CONV_IMPL=WHERE_YOU_CLONED_CUTLASS/examples/19_large_depthwise_conv2d_torch_extension so that RepLKNet uses the efficient implementation. Alternatively, modify the related code (get_conv2d) in replknet.py.
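
The environment-variable step can be sketched as a small loader. This is a minimal sketch under stated assumptions, not the code in replknet.py: the helper name load_large_kernel_conv and its env parameter are hypothetical. Only LARGE_KERNEL_CONV_IMPL, depthwise_conv2d_implicit_gemm, and DepthWiseConv2dImplicitGEMM come from the steps above. The helper returns None when the extension cannot be located or imported, so a caller can fall back to nn.Conv2d.

```python
import importlib
import os
import sys

ENV_VAR = "LARGE_KERNEL_CONV_IMPL"              # name taken from the steps above
MODULE_NAME = "depthwise_conv2d_implicit_gemm"  # module built by the cutlass example
CLASS_NAME = "DepthWiseConv2dImplicitGEMM"


def load_large_kernel_conv(env=None):
    """Hypothetical helper: return the efficient conv class, or None if unavailable."""
    env = os.environ if env is None else env
    impl_dir = env.get(ENV_VAR)
    # No variable set, or it points at a directory that does not exist.
    if not impl_dir or not os.path.isdir(impl_dir):
        return None
    # Make the example directory importable, as the PYTHONPATH step does.
    if impl_dir not in sys.path:
        sys.path.append(impl_dir)
    try:
        module = importlib.import_module(MODULE_NAME)
    except Exception:
        # The extension may be missing or fail to load (e.g., a CUDA mismatch).
        return None
    return getattr(module, CLASS_NAME, None)


if __name__ == "__main__":
    conv_cls = load_large_kernel_conv()
    print("efficient implementation found" if conv_cls else "falling back to nn.Conv2d")
```

Since the class name indicates a depthwise convolution, a caller would presumably substitute it only for layers whose group count equals the channel count, and keep nn.Conv2d everywhere else.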

The implementation mentioned in the paper has been integrated into MegEngine, which uses it automatically. If you would like to use it in other frameworks such as TensorFlow, you may need to compile our released CUDA sources (the *.cu files in the above example should work with other frameworks) and load them with suitable tools, just as cutlass and torch.utils.cpp_extension are used in the PyTorch example. We would appreciate it if you could share your experience with us.

You may refer to the MegEngine source code: https://github.com/MegEngine/MegEngine/tree/8a2e92bd6c5ac02807b27d174dce090ee391000b/dnn/src/cuda/conv_bias/chanwise.

Pull requests (e.g., better implementations, alternative implementations, or implementations for other frameworks) are welcome.

Catalog

Model code
PyTorch pretrained models
PyTorch large-kernel conv impl
PyTorch training code
PyTorch downstream models
PyTorch downstream code

Results and Pre-trained Models

ImageNet-1K Models

| name | resolution | ImageNet-1K acc | #params | FLOPs | ImageNet-1K pretrained model |
|---|---|---|---|---|---|
| RepLKNet-31B | 224x224 | 83.5 | 79M | 15.3G | Google Drive, Baidu |
| RepLKNet-31B | 384x384 | 84.8 | 79M | 45.1G | Google Drive, Baidu |

ImageNet-22K Models

| name | resolution | ImageNet-1K acc | #params | FLOPs | 22K pretrained model | 1K finetuned model |
|---|---|---|---|---|---|---|
| RepLKNet-31B | 224x224 | 85.2 | 79M | 15.3G | Google Drive, Baidu | Google Drive, Baidu |
| RepLKNet-31B | 384x384 | 86.0 | 79M | 45.1G | - | Google Drive, Baidu |
| RepLKNet-31L | 384x384 | 86.6 | 172M | 96.0G | Google Drive, Baidu | Google Drive, Baidu |

MegData-73M Models

(uploading)

| name | resolution | ImageNet-1K acc | #params | FLOPs | MegData-73M pretrained model | 1K finetuned model |
|---|---|---|---|---|---|---|
| RepLKNet-XL | 320x320 | 87.8 | 335M | 128.7G | | |

Evaluation

Training

You may use multi-node training on a SLURM cluster with submitit. Please install:

pip install submitit

If you have limited GPU memory (e.g., 2080Ti), use --use_checkpoint True to save GPU memory.

Pretrain RepLKNet-31B on ImageNet-1K

Single machine:

python -m torch.distributed.launch --nproc_per_node=8 main.py --model RepLKNet-31B --drop_path 0.5 --batch_size 64 --lr 4e-3 --update_freq 4 --model_ema true --model_ema_eval true --data_path /path/to/imagenet-1k --warmup_epochs 10 --epochs 300 --use_checkpoint True --output_dir your_training_dir

Four machines:

python run_with_submitit.py --nodes 4 --ngpus 8 --model RepLKNet-31B --drop_path 0.5 --batch_size 64 --lr 4e-3 --update_freq 4 --model_ema true --model_ema_eval true --data_path /path/to/imagenet-1k --warmup_epochs 10 --epochs 300 --use_checkpoint True --job_dir your_training_dir
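
To compare the two commands above, it can help to work out the global batch size each one implies. The sketch below assumes that --batch_size is a per-GPU value and that --update_freq is the number of gradient-accumulation steps, as in the ConvNeXt training script this code is based on. The function name effective_batch_size is hypothetical.

```python
def effective_batch_size(nodes, gpus_per_node, batch_size, update_freq):
    """Samples contributing to one optimizer step across all GPUs.

    Assumes batch_size is per GPU and update_freq is the number of
    gradient-accumulation steps (an assumption, not confirmed by this README).
    """
    return nodes * gpus_per_node * batch_size * update_freq


# Single machine: --nproc_per_node=8 --batch_size 64 --update_freq 4
single_machine = effective_batch_size(nodes=1, gpus_per_node=8, batch_size=64, update_freq=4)

# Four machines: --nodes 4 --ngpus 8 --batch_size 64 --update_freq 4
four_machines = effective_batch_size(nodes=4, gpus_per_node=8, batch_size=64, update_freq=4)

print(single_machine)  # 2048
print(four_machines)   # 8192
```

Under these assumptions the four-machine command, as written, uses a global batch four times larger than the single-machine one at the same --lr. If the goal is to match the single-machine global batch of 2048, --update_freq would need to be 1 on four machines.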

Finetune the ImageNet-1K-pretrained (224x224) RepLKNet-31B with 384x384

Single machine:

Pretrain RepLKNet-31B on ImageNet-22K

Finetune 22K-pretrained RepLKNet-31B on ImageNet-1K (224x224)

Finetune 22K-pretrained RepLKNet-31B on ImageNet-1K (384x384)

Pretrain RepLKNet-31L on ImageNet-22K

Finetune 22K-pretrained RepLKNet-31L on ImageNet-1K (224x224)

Finetune 22K-pretrained RepLKNet-31L on ImageNet-1K (384x384)

Acknowledgement

The released PyTorch training script is based on the code of ConvNeXt, which was built using the timm library and the DeiT and BEiT repositories.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

GitHub - DingXiaoH/RepLKNet-pytorch

