Source: WeChat official account「张小珺访谈|36篇经典论文演义:探索 AI 发展的所有论文!」. Compiled: 2026-06-10
01. AlexNet — ImageNet Classification with Deep Convolutional Neural Networks (2012)
- Paper link: https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html
- Published: NeurIPS (NIPS) 2012
- PDF download: NeurIPS 2012.pdf
02. VGGNet — Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)
- Paper link: https://arxiv.org/abs/1409.1556
- Published: ICLR 2015
- arXiv ID: arXiv:1409.1556
- PDF download: ICLR 2015.pdf, arXiv:1409.1556
03. GoogLeNet/Inception — Going Deeper with Convolutions (2014)
- Paper link: https://arxiv.org/abs/1409.4842
- Published: CVPR 2015
- arXiv ID: arXiv:1409.4842
- PDF download: CVPR 2015.pdf, arXiv:1409.4842
04. ResNet — Deep Residual Learning for Image Recognition (2015)
- Paper link: https://arxiv.org/abs/1512.03385
- Published: CVPR 2016
- arXiv ID: arXiv:1512.03385
- PDF download: CVPR 2016.pdf, arXiv:1512.03385
05. U-Net — Convolutional Networks for Biomedical Image Segmentation (2015)
- Paper link: https://arxiv.org/abs/1505.04597
- Published: MICCAI 2015
- arXiv ID: arXiv:1505.04597
- PDF download: MICCAI 2015 (Springer), arXiv:1505.04597
06. Batch Normalization — Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015)
- Paper link: https://arxiv.org/abs/1502.03167
- Published: ICML 2015
- arXiv ID: arXiv:1502.03167
- PDF download: ICML 2015.pdf, arXiv:1502.03167
07. Faster R-CNN — Towards Real-Time Object Detection with Region Proposal Networks (2015)
- Paper link: https://arxiv.org/abs/1506.01497
- Published: NeurIPS 2015
- arXiv ID: arXiv:1506.01497
- PDF download: NeurIPS 2015.pdf, arXiv:1506.01497
08. YOLO — You Only Look Once: Unified, Real-Time Object Detection (2016)
- Paper link: https://arxiv.org/abs/1506.02640
- Published: CVPR 2016
- arXiv ID: arXiv:1506.02640
- PDF download: CVPR 2016.pdf, arXiv:1506.02640
09. MobileNet — Efficient Convolutional Neural Networks for Mobile Vision Applications (2017)
- Paper link: https://arxiv.org/abs/1704.04861
- Published: arXiv preprint (2017); the follow-up MobileNetV2 was published at CVPR 2018
- arXiv ID: arXiv:1704.04861
- PDF download: arXiv:1704.04861
10. EfficientNet — Rethinking Model Scaling for Convolutional Neural Networks (2019)
- Paper link: https://arxiv.org/abs/1905.11946
- Published: ICML 2019
- arXiv ID: arXiv:1905.11946
- PDF download: ICML 2019.pdf, arXiv:1905.11946
11. Transformer — Attention Is All You Need (2017)
- Paper link: https://arxiv.org/abs/1706.03762
- Published: NeurIPS 2017
- arXiv ID: arXiv:1706.03762
- PDF download: NeurIPS 2017.pdf, arXiv:1706.03762
12. BERT — Pre-training of Deep Bidirectional Transformers for Language Understanding (2018)
- Paper link: https://arxiv.org/abs/1810.04805
- Published: NAACL-HLT 2019
- arXiv ID: arXiv:1810.04805
- PDF download: NAACL 2019.pdf, arXiv:1810.04805
13. GPT-1 — Improving Language Understanding by Generative Pre-Training (2018)
- Paper link: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
- Published: OpenAI technical report (not formally published)
- PDF download: OpenAI Technical Report
14. GPT-2 — Language Models are Unsupervised Multitask Learners (2019)
- Paper link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- Published: OpenAI technical report (not formally published)
- PDF download: OpenAI Technical Report
15. RoBERTa — A Robustly Optimized BERT Pretraining Approach (2019)
- Paper link: https://arxiv.org/abs/1907.11692
- Published: arXiv preprint (2019); widely cited, but never formally published at a top conference
- arXiv ID: arXiv:1907.11692
- PDF download: arXiv:1907.11692
16. T5 — Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019)
- Paper link: https://arxiv.org/abs/1910.10683
- Published: JMLR 2020
- arXiv ID: arXiv:1910.10683
- PDF download: JMLR 2020.pdf, arXiv:1910.10683
17. GPT-3 — Language Models are Few-Shot Learners (2020)
- Paper link: https://arxiv.org/abs/2005.14165
- Published: NeurIPS 2020
- arXiv ID: arXiv:2005.14165
- PDF download: NeurIPS 2020.pdf, arXiv:2005.14165
18. CLIP — Learning Transferable Visual Models From Natural Language Supervision (2021)
- Paper link: https://arxiv.org/abs/2103.00020
- Published: ICML 2021
- arXiv ID: arXiv:2103.00020
- PDF download: ICML 2021.pdf, arXiv:2103.00020
19. DALL-E — Zero-Shot Text-to-Image Generation (2021)
- Paper link: https://arxiv.org/abs/2102.12092
- Published: ICML 2021
- arXiv ID: arXiv:2102.12092
- PDF download: ICML 2021.pdf, arXiv:2102.12092
20. BLIP — Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (2022)
- Paper link: https://arxiv.org/abs/2201.12086
- Published: ICML 2022
- arXiv ID: arXiv:2201.12086
- PDF download: ICML 2022.pdf, arXiv:2201.12086
21. Flamingo — A Visual Language Model for Few-Shot Learning (2022)
- Paper link: https://arxiv.org/abs/2204.14198
- Published: NeurIPS 2022
- arXiv ID: arXiv:2204.14198
- PDF download: NeurIPS 2022.pdf, arXiv:2204.14198
22. LLaVA — Visual Instruction Tuning (2023)
- Paper link: https://arxiv.org/abs/2304.08485
- Published: NeurIPS 2023; the follow-up LLaVA-1.5 work was accepted at CVPR 2024
- arXiv ID: arXiv:2304.08485
- PDF download: arXiv:2304.08485
23. GPT-4V — The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) (2023)
- Paper link: https://arxiv.org/abs/2309.17421
- Published: arXiv preprint (2023); Microsoft Research technical report
- arXiv ID: arXiv:2309.17421
- PDF download: arXiv:2309.17421
24. GAN — Generative Adversarial Networks (2014)
- Paper link: https://arxiv.org/abs/1406.2661
- Published: NeurIPS 2014
- arXiv ID: arXiv:1406.2661
- PDF download: NeurIPS 2014.pdf, arXiv:1406.2661
25. VAE — Auto-Encoding Variational Bayes (2013)
- Paper link: https://arxiv.org/abs/1312.6114
- Published: ICLR 2014
- arXiv ID: arXiv:1312.6114
- PDF download: ICLR 2014.pdf, arXiv:1312.6114
26. StyleGAN — A Style-Based Generator Architecture for Generative Adversarial Networks (2019)
- Paper link: https://arxiv.org/abs/1812.04948
- Published: CVPR 2019
- arXiv ID: arXiv:1812.04948
- PDF download: CVPR 2019.pdf, arXiv:1812.04948
27. DDPM — Denoising Diffusion Probabilistic Models (2020)
- Paper link: https://arxiv.org/abs/2006.11239
- Published: NeurIPS 2020
- arXiv ID: arXiv:2006.11239
- PDF download: NeurIPS 2020.pdf, arXiv:2006.11239
28. Stable Diffusion — High-Resolution Image Synthesis with Latent Diffusion Models (2022)
- Paper link: https://arxiv.org/abs/2112.10752
- Published: CVPR 2022
- arXiv ID: arXiv:2112.10752
- PDF download: CVPR 2022.pdf, arXiv:2112.10752
29. DiT — Scalable Diffusion Models with Transformers (2023)
- Paper link: https://arxiv.org/abs/2212.09748
- Published: ICCV 2023
- arXiv ID: arXiv:2212.09748
- PDF download: ICCV 2023.pdf, arXiv:2212.09748
30. ViT — An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020)
- Paper link: https://arxiv.org/abs/2010.11929
- Published: ICLR 2021
- arXiv ID: arXiv:2010.11929
- PDF download: ICLR 2021.pdf, arXiv:2010.11929
31. MAE — Masked Autoencoders Are Scalable Vision Learners (2021)
- Paper link: https://arxiv.org/abs/2111.06377
- Published: CVPR 2022
- arXiv ID: arXiv:2111.06377
- PDF download: CVPR 2022.pdf, arXiv:2111.06377
32. SimCLR — A Simple Framework for Contrastive Learning of Visual Representations (2020)
- Paper link: https://arxiv.org/abs/2002.05709
- Published: ICML 2020
- arXiv ID: arXiv:2002.05709
- PDF download: ICML 2020.pdf, arXiv:2002.05709
33. MoCo — Momentum Contrast for Unsupervised Visual Representation Learning (2019)
- Paper link: https://arxiv.org/abs/1911.05722
- Published: CVPR 2020
- arXiv ID: arXiv:1911.05722
- PDF download: CVPR 2020.pdf, arXiv:1911.05722
34. DQN — Playing Atari with Deep Reinforcement Learning (2013)
- Paper link: https://arxiv.org/abs/1312.5602
- Published: NeurIPS 2013 Deep Learning Workshop; an extended version appeared in Nature (2015) as "Human-level control through deep reinforcement learning"
- arXiv ID: arXiv:1312.5602
- PDF download: Nature 2015.pdf, arXiv:1312.5602
35. AlphaGo — Mastering the Game of Go with Deep Neural Networks and Tree Search (2016)
- Paper link: https://www.nature.com/articles/nature16961
- Published: Nature 529, 484–489 (2016)
- PDF download: Nature 2016.pdf
36. PPO — Proximal Policy Optimization Algorithms (2017)
- Paper link: https://arxiv.org/abs/1707.06347
- Published: arXiv preprint (2017); an OpenAI technical report, widely cited but never formally published at a top conference
- arXiv ID: arXiv:1707.06347
- PDF download: arXiv:1707.06347