- Machine Learning · Deep Learning · Mathematics · English · Python · Essays · Paper Reading · Travel Notes
- 「Machine Learning」 Categorical Boosting (CatBoost) · Light Gradient Boosting Machine (LightGBM) · eXtreme Gradient Boosting (XGBoost) · Self-Organizing Map (SOM) · Locality Preserving Projection (LPP) · Uniform Manifold Approximation and Projection (UMAP) · t-distributed Stochastic Neighbor Embedding (t-SNE) · Locally Linear Embedding (LLE) · Isometric Mapping (ISOMAP) · Multiclass and Multilabel Classification · Multiple Dimensional Scaling (MDS) · Kernelized Principal Component Analysis (KPCA) · Kernel Methods · Error-Ambiguity Decomposition in Ensemble Learning · Diversity Measures in Ensemble Learning · Maximum Entropy Model · Multi-Armed Bandit (MAB) · Gaussian Mixture Model (GMM) · Transfer Learning · Lifelong Learning · Meta Learning · Anomaly Detection · Mean-Shift · Spectral Clustering · Hierarchical Clustering · K-Means Clustering · Radial Basis Function (RBF) Network · Feedforward Neural Network · Deep Belief Network · Restricted Boltzmann Machine · Boltzmann Machine · Hopfield Neural Network · Energy-based Model · Principal Component Analysis (PCA) · Autoencoder · Sparse Coding · Manifold Learning · Partial Least Squares (PLS) Regression · Forward Stagewise Regression · Locally Weighted Linear Regression · Ridge and LASSO Regression · Tube Regression · Naive Bayes · Recommender Systems · Expectation Maximization (EM) · Variational Inference · Linear Discriminant Analysis (LDA) · k-Nearest Neighbors (kNN) · Boosting Trees · Gradient Boosted Decision Trees (GBDT) · Random Forest · Decision Trees · Boosting in Ensemble Learning · Bagging (Bootstrap Aggregation) in Ensemble Learning · Blending Strategies in Ensemble Learning · Support Vector Regression (SVR) · Support Vector Machine (SVM) · Logistic Regression · Linear Regression · Perceptron · Model Evaluation Methods · Model Complexity Theory · Some Theorems in Machine Learning · An Overview of Machine Learning
- 「Deep Learning」 Phenotypic Image Analysis · Panchromatic Sharpening · RF-based Human Perception · Layout-to-Image Generation · Open-Set Object Detection · Object Counting · Point Cloud Classification · Parameter-Efficient Fine-Tuning of Large Models · Vision Transformer · Metric Learning · Self-Supervised Learning · Semi-Supervised Learning · Active Learning · Position Encoding in Transformers · Diffusion Models · Flow-based Models · Variational Autoencoder · Generative Adversarial Network · Lightweight Convolutional Neural Networks · Multi-Task Learning · Spatio-Temporal Action Detection · Reducing the Computational Complexity of Transformers · Pooling Layers in Convolutional Neural Networks · The Long-Tail Distribution Problem in Images · Visualizing Convolutional Neural Networks · Self-Attention Mechanisms in Convolutional Neural Networks · Attention Mechanisms in Convolutional Neural Networks · Music Generation · Image Super-Resolution · Adversarial Training: Attacks and Defenses · Connectionist Temporal Classification · Human Pose Estimation · Image-to-Image Translation · Reading Comprehension · Text Detection and Recognition · Image Captioning · Text Summarization · Pedestrian Detection and Attribute Recognition · Face Detection, Recognition, and Verification · Object Detection · Image Segmentation · Image Recognition · Network Compression · Word Embeddings · Interpretability of Deep Learning · Pretrained Language Models · Transformer · Self-Attention Mechanism · Memory Augmented Neural Network · Attention Mechanisms in Sequence-to-Sequence Models · Sequence-to-Sequence Models · Capsule Networks · Graph Neural Network · Recursive Neural Network · Recurrent Neural Network · Convolutional Neural Network · Initialization Methods in Deep Learning · Normalization Methods in Deep Learning · Regularization Methods in Deep Learning · Optimization Algorithms in Deep Learning · Activation Functions in Deep Learning · An Overview of Deep Learning
- 「Mathematics」 Euler Paths and de Bruijn Graphs · Spline Curves · Locality Sensitive Hashing · Submodularity of Set Functions and the Lovász Extension · Distance Transform of Binary Images · Integral Probability Metrics · The Lipschitz Continuity Condition · Constrained Optimization and the Dual Problem · Duality Theory of Linear Programming · Determinantal Point Processes · Uncertainty in Deep Learning · Jensen's Inequality · Bipartite Graphs and Bipartite Matching · The Optimal Transport Problem and the Wasserstein Distance · The Change of Variable Theorem for Random Variables · The Reparameterization Trick for Probability Distributions · The von Mises-Fisher (vMF) Distribution on the Hypersphere · Computing π (the Ratio of Circumference to Diameter) · Smoothing of Functions · Pareto Optimality in Multi-Objective Optimization · The Taylor Formula · Mitchell's Approximation for Binary Multiplication · Hypothesis Testing in Machine Learning · The Rayleigh Quotient and the Generalized Rayleigh Quotient · Distance Metrics · Well-posed and Ill-posed Problems · Tensor Decomposition · Divergences between Probability Distributions · Parameter Estimation · Sampling Distributions · Basic Concepts of Mathematical Statistics
- 「English」 Mastering 3,500 Words through 40 Short Passages · English Word Formation · The International Phonetic Alphabet for English · A Brief History of English
- 「Python」 Fine-tuning Grounding DINO with Label Studio for Semi-Automatic Object Detection Annotation · Analyzing Object Detection Datasets · Tensor Operations with einops · Differentiating with Respect to Inputs via torch.autograd.grad · Building Conditional Random Fields with pydensecrf · Curve Smoothing Methods · Camera Calibration with opencv-python (cv2) · Image Augmentation with torchvision.transforms · Solving Equations with sympy.solve · Nonlinear Programming with scipy.optimize.minimize · Plotting Confusion Matrices · MMDetection User Notes · Albumentations: An Image Data Augmentation Library · Computing Model Parameters (Params) and FLOPs · Argmax and SoftArgmax · Converting OpenPose Output Keypoints (25→18) with the json Library · The Hook Mechanism in PyTorch · 3D Plotting with Mayavi · Setting Up a Deep Learning Environment for the RTX 3090 · Building Custom Datasets in PyTorch · Processing TIFF Images with tifffile · Batch-Editing the name Field in XML Files (for Object Detection) · Plotting Training Curves with Matplotlib · ECCV 2020 Tutorial: PyTorch Performance Tuning Guide · Progress Bars with tqdm · Computing Confusion Matrices with numpy.bincount (see the sketch after this list) · Batch-Processing Images in a Folder · Handling Matlab .mat Files · A LeetCode Problem-Solving Guide (Python) · Data Structures and Algorithms (Python) · Python User Notes
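One of the Python notes above covers computing a confusion matrix with numpy.bincount. Purely as an illustrative sketch of that trick (the helper name `confusion_matrix` and the toy labels are my own, not taken from the post): encode each (true, predicted) pair as a single integer code, count the codes with `np.bincount`, and reshape.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predicted labels."""
    # Encode each (true, pred) pair as a single index in [0, num_classes**2).
    codes = y_true.astype(np.int64) * num_classes + y_pred.astype(np.int64)
    # bincount tallies each code; reshape recovers the num_classes x num_classes matrix.
    return np.bincount(codes, minlength=num_classes ** 2).reshape(num_classes, num_classes)

# Toy example with 3 classes (hypothetical labels for illustration only).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 1, 1, 0])
print(confusion_matrix(y_true, y_pred, num_classes=3))
```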
- 「Essays」 The Strongest Pokémon Fan, ag, Will Appear at the Pokémon Beijing Masters! · Twenty-Four Years to Sharpen One Sword: A Short Note on Graduation · A Brief Review of Echo: A Spin-off of a Spin-off, with Hardly an Echo · A Brief Review of Loki (Season 2): Free Will and the Price of Becoming a God · A Brief Review of The Marvels: Higher Bar, Faster Disappointment, Stronger Easter Eggs · On the Tension Relations in Belt Drives · The Equivalent Resistance of a Resistor Cube · Does Storing Files Increase a Phone's Mass? · What Lies at the End of a Rainbow? · A Study of the "Moth to a Flame" Problem
- 「Paper Reading」 DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding DODA: Diffusion for Object-detection Domain Adaptation in Agriculture CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale Adapting the Segment Anything Model for Plant Recognition and Automated Phenotypic Parameter Measurement Adapting Vision Foundation Models for Plant Phenotyping Pan-Mamba: Effective pan-sharpening with State Space Model PanFlowNet: A Flow-Based Deep Network for Pan-sharpening Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion Deep Gradient Projection Networks for Pan-sharpening Mutual Information-driven Pan-sharpening Spatial-Frequency Domain Information Integration for Pan-Sharpening PanFormer: a Transformer Based Model for Pan-sharpening Pan-Sharpening with Customized Transformer and Invertible Neural Network Super-Resolution-Guided Progressive Pansharpening based on a Deep Convolutional Neural Network A Multi-Scale and Multi-Depth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening PanNet: A Deep Network Architecture for Pan-Sharpening Pansharpening by Convolutional Neural Networks A universal SNP and small-indel variant caller using deep neural networks LoRA-GA: Low-Rank Adaptation with Gradient Approximation Recovering Human Pose and Shape From Through-the-Wall Radar Images Unsupervised Human Contour Extraction From Through-Wall Radar Images Using Dual UNet Through-Wall Human Pose Reconstruction Based on Cross-Modal Learning and Self-Supervised Learning Through-Wall Human Pose Estimation by Mutual Information Maximizing Deeply Supervised Nets RadarFormer: End-to-End Human Perception With Through-Wall Radar and Transformers YOLOv10: Real-Time End-to-End Object Detection Learning Spatial Similarity Distribution for Few-shot Object Counting KAN: Kolmogorov-Arnold Networks DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation OmniCount: Multi-label Object Counting with Semantic-Geometric Priors InstanceDiffusion: Instance-level Control for Image Generation LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation Adding Conditional Control to Text-to-Image Diffusion Models ReCo: Region-Controlled Text-to-Image Generation LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation GLIGEN: Open-Set Grounded Text-to-Image Generation YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information LoRA+: Efficient Low Rank Adaptation of Large Models Video generation models as world simulators Enhancing Zero-shot Counting via Language-guided Exemplar Learning VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting Semantic Generative Augmentations for Few-Shot Counting DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection Learning Object-Language Alignments for Open-Vocabulary Object Detection RegionCLIP: Region-based Language-Image Pretraining Exploiting Unlabeled Data with Vision and Language Models for Object Detection Detecting Twenty-thousand Classes 
using Image-level Supervision Open Vocabulary Object Detection with Pseudo Bounding-Box Labels Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Open-Vocabulary Object Detection Using Captions MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Towards Open-Set Object Detection and Discovery Grounded Language-Image Pre-training Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection Finite Scalar Quantization: VQ-VAE Made Simple ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Open-world Text-specified Object Counting One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning Faster sorting algorithms discovered using deep reinforcement learning Towards Partial Supervision for Generic Object Counting in Natural Scenes Object Counting and Instance Segmentation with Image-level Supervision Class-aware Object Counting MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting QLoRA: Efficient Finetuning of Quantized LLMs Zero-shot Improvement of Object Counting with CLIP Teaching CLIP to Count to Ten Can SAM Count Anything? An Empirical Study on SAM Counting Vicinal Counting Networks Class-agnostic Few-shot Object Counting GCNet: Probing Self-Similarity Learning for Generalized Counting Network Mimetic Initialization of Self-Attention Layers Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting Zero-shot Object Counting Scale-Prior Deformable Convolution for Exemplar-Guided Class-Agnostic Counting CLIP-Count: Towards Text-Guided Zero-Shot Object Counting A Low-Shot Object Counting Network With Iterative Prototype Adaptation CounTR: Transformer-based Generalised Visual Counting Exemplar Free Class Agnostic Counting Few-shot Object Counting and Detection Learning to Count Anything: Reference-less Class-agnostic Counting with Weak Supervision Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting Few-shot Object Counting with Similarity-Aware Feature Enhancement Object Counting: You Only Need to Look at One Learning To Count Everything Class-Agnostic Counting PCT: Point cloud transformer PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Dynamic Graph CNN for Learning on Point Clouds PointCNN: Convolution On X-Transformed Points Segment Anything PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation OctNet: Learning Deep 3D Representations at High Resolutions VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition Multi-view Convolutional Neural Networks for 3D Shape Recognition Human Pose as Compositional Tokens Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning DeepMIM: Deep Supervision for Masked Image Modeling Masked Image Modeling with Local Multi-Scale Reconstruction Symbolic Discovery of Optimization Algorithms Visual Prompt Tuning AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning UniPELT: A Unified Framework for Parameter-Efficient Language Model 
Tuning Towards a Unified View of Parameter-Efficient Transfer Learning Parameter-Efficient Transfer Learning with Diff Pruning DensePose From WiFi LoRA: Low-Rank Adaptation of Large Language Models AdapterDrop: On the Efficiency of Adapters in Transformers AdapterFusion: Non-Destructive Task Composition for Transfer Learning P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks GPT Understands, Too The Power of Scale for Parameter-Efficient Prompt Tuning Prefix-Tuning: Optimizing Continuous Prompts for Generation BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models Parameter-Efficient Transfer Learning for NLP Ultralytics YOLOv8 Refiner: Refining Self-attention for Vision Transformers Improve Vision Transformers Training by Suppressing Over-smoothing Twins: Revisiting the Design of Spatial Attention in Vision Transformers All Tokens Matter: Token Labeling for Training Better Vision Transformers Incorporating Convolution Designs into Visual Transformers CvT: Introducing Convolutions to Vision Transformers Per-Pixel Classification is Not All You Need for Semantic Segmentation Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions Segmenter: Transformer for Semantic Segmentation Rethinking Spatial Dimensions of Vision Transformers SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Do We Really Need Explicit Position Encodings for Vision Transformers? Visual Transformers: Token-based Image Representation and Processing for Computer Vision LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference Escaping the Big Data Paradigm with Compact Transformers Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Going deeper with Image Transformers ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders DeepViT: Towards Deeper Vision Transformer Training data-efficient image transformers & distillation through attention Better plain ViT baselines for ImageNet-1k Position Prediction as an Effective Pretraining Strategy Pooling Revisited: Your Receptive Field is Suboptimal A ConvNet for the 2020s Deformable Convolutional Networks Deformable ConvNets v2: More Deformable, Better Results CondConv: Conditionally Parameterized Convolutions for Efficient Inference Dynamic Convolution: Attention over Convolution Kernels DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks Omni-Dimensional Dynamic Convolution Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution Dynamic Region-Aware Convolution An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution Inception Convolution with Efficient Dilation Search Heatmap Distribution Matching for Human Pose Estimation AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time RTMDet: An Empirical Study of Designing Real-Time Object Detectors Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation Parametric Instance Classification for Unsupervised Visual Feature Learning Self-Supervised Learning 
based on Heat Equation Revealing the Dark Secrets of Masked Image Modeling On Data Scaling in Masked Image Modeling ConvMAE: Masked Convolution Meets Masked Autoencoders SimMIM: A Simple Framework for Masked Image Modeling iBOT: Image BERT Pre-Training with Online Tokenizer BEiT: BERT Pre-Training of Image Transformers Discovering faster matrix multiplication algorithms with reinforcement learning Analyzing and Improving Representations with the Soft Nearest Neighbor Loss Circle Loss: A Unified Perspective of Pair Similarity Optimization Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning Person re-identification by multi-channel parts-based CNN with improved triplet loss function In Defense of the Triplet Loss for Person Re-Identification Ranked List Loss for Deep Metric Learning ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis Proxy Anchor Loss for Deep Metric Learning Deep Metric Learning for Practical Person Re-Identification No Fuss Distance Metric Learning using Proxies Deep Metric Learning with Hierarchical Triplet Loss Deep Metric Learning via Facility Location Beyond triplet loss: a deep quadruplet network for person re-identification Deep Metric Learning with Angular Loss Metric Learning with Adaptive Density Discrimination Improved Deep Metric Learning with Multi-class N-pair Loss Objective Learning Deep Embeddings with Histogram Loss Deep Metric Learning via Lifted Structured Feature Embedding FaceNet: A Unified Embedding for Face Recognition and Clustering Jigsaw Clustering for Unsupervised Visual Representation Learning Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning Emerging Properties in Self-Supervised Vision Transformers Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations Evolving Losses for Unsupervised Video Representation Learning Exploring Simple Siamese Representation Learning CURL: Contrastive Unsupervised Representations for Reinforcement Learning Unsupervised Learning of Visual Features by Contrasting Cluster Assignments An Empirical Study of Training Self-Supervised Vision Transformers Improved Baselines with Momentum Contrastive Learning Momentum Contrast for Unsupervised Visual Representation Learning Contrastive Multiview Coding Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination Unsupervised Embedding Learning via Invariant and Spreading Instance Feature Bootstrap your own latent: A new approach to self-supervised Learning Barlow Twins: Self-Supervised Learning via Redundancy Reduction A Simple Framework for Contrastive Learning of Visual Representations Contrastive Learning with Hard Negative Samples Debiased Contrastive Learning Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere Data-Efficient Image Recognition with Contrastive Predictive Coding Representation Learning with Contrastive Predictive Coding Colorful Image Colorization Representation Learning by Learning to Count Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles Unsupervised Visual Representation Learning by Context Prediction Unsupervised Representation Learning by Predicting Image Rotations Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow 
Big Self-Supervised Models are Strong Semi-Supervised Learners FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence DivideMix: Learning with Noisy Labels as Semi-supervised Learning ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring MixMatch: A Holistic Approach to Semi-Supervised Learning Meta Pseudo Labels Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning Label Propagation for Deep Semi-supervised Learning Unsupervised Data Augmentation for Consistency Training Interpolation Consistency Training for Semi-Supervised Learning Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results Temporal Ensembling for Semi-Supervised Learning Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning Deep Active Learning: Unified and Principled Method for Query and Training Discriminative Active Learning BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning Batch Active Learning Using Determinantal Point Processes Batch Active Learning at Scale Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation Cost-Effective Active Learning for Deep Image Classification Diverse mini-batch Active Learning Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation Bayesian Generative Active Deep Learning Generative Adversarial Active Learning Adversarial Active Learning for Deep Networks: a Margin Based Approach When Deep Learners Change Their Mind: Learning Dynamics for Active Learning Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds Active Learning for Convolutional Neural Networks: A Core-Set Approach Active Learning by Acquiring Contrastive Examples Minimax Active Learning Weight Uncertainty in Neural Networks Learning Loss for Active Learning Deep Bayesian Active Learning with Image Data SIoU Loss: More Powerful Learning for Bounding Box Regression YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors Comprehensive Guide to Ultralytics YOLOv5 ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions RoFormer: Enhanced Transformer with Rotary Position Embedding Your Transformer May Not be as Powerful as You Expect Encoding word order in complex embeddings Self-Attention with Relative Position Representations Learning to Encode Position for Transformer with Continuous Dynamical Model Dual Contrastive Learning for Unsupervised Image-to-Image Translation Making the Invisible Visible: Action Recognition Through Walls and Occlusions Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Hierarchical Text-Conditional Image Generation with CLIP Latents Learning Longterm Representations for Person Re-Identification Using Radio Signals Variational Diffusion Models Poisson Flow Generative Models Noether Networks: Meta-Learning Useful Conserved Quantities High-Resolution Image Synthesis with Latent Diffusion Models Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks Alias-Free Generative 
Adversarial Networks Semi-Supervised Learning with Generative Adversarial Networks ClusterGAN : Latent Space Clustering in Generative Adversarial Networks Deep Symbolic Regression for Recurrent Sequences Training Generative Adversarial Networks with Limited Data Classifier-Free Diffusion Guidance More Control for Free! Image Synthesis with Semantic Diffusion Guidance Diffusion Models Beat GANs on Image Synthesis Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models Score-Based Generative Modeling through Stochastic Differential Equations Denoising Diffusion Implicit Models Improved Denoising Diffusion Probabilistic Models Denoising Diffusion Probabilistic Models Analyzing and Improving the Image Quality of StyleGAN A Style-Based Generator Architecture for Generative Adversarial Networks Do As I Can, Not As I Say: Grounding Language in Robotic Affordances On Self Modulation for Generative Adversarial Networks Large Scale GAN Training for High Fidelity Natural Image Synthesis Taming Transformers for High-Resolution Image Synthesis Self-Attention Generative Adversarial Networks cGANs with Projection Discriminator StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks SinGAN: Learning a Generative Model from a Single Natural Image Progressive Growing of GANs for Improved Quality, Stability, and Variation Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks Context Encoders: Feature Learning by Inpainting Semantic Image Synthesis with Spatially-Adaptive Normalization Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation Softmax GAN On Convergence and Stability of GANs Invertible Residual Networks Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design Improving Variational Inference with Inverse Autoregressive Flow Contrastive Learning for Unpaired Image-to-Image Translation TFPose: Direct Human Pose Estimation with Transformers Masked Autoregressive Flow for Density Estimation Variational Inference with Normalizing Flows Unsupervised Learning for Human Sensing Using Radio Signals Glow: Generative Flow with Invertible 1x1 Convolutions Density estimation using Real NVP NICE: Non-linear Independent Components Estimation GANILLA: Generative Adversarial Networks for Image to Illustration Translation Rethinking the Truly Unsupervised Image-to-Image Translation High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network Multimodal Unsupervised Image-to-Image Translation Simple yet Effective Way for Improving the Performance of GAN f-VAEs: Improve VAEs with Conditional Flows Temporal Difference Variational Auto-Encoder NVAE: A Deep Hierarchical Variational Autoencoder Hyperspherical Variational Auto-Encoders A Batch Normalized Inference Network Keeps the KL Vanishing Away Variational Inference of Disentangled Latent Concepts from Unlabeled Observations Structured Disentangled Representations Disentangling by Factorising Sliced-Wasserstein Autoencoder: An Embarrassingly Simple Generative Model Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder Learning to Generate Images with Perceptual Similarity Metrics Learning Disentangled Joint Continuous and Discrete Representations Categorical Reparameterization with Gumbel-Softmax Deep Feature Consistent Variational Autoencoder 
Tighter Variational Bounds are Not Necessarily Better Importance Weighted Autoencoders Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB Isolating Sources of Disentanglement in Variational Autoencoders Wasserstein Auto-Encoders Variational methods for Conditional Multimodal Learning: Generating Human Faces from Attributes Learning Structured Output Representation using Deep Conditional Generative Models Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling Demystifying MMD GANs Fisher GAN MMD GAN: Towards Deeper Understanding of Moment Matching Network Generative Moment Matching Networks McGan: Mean and Covariance Feature Matching GAN A Note on the Inception Score GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium Which Training Methods for GANs do actually Converge? Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks Unsupervised Image-to-Image Translation Networks StarGAN v2: Diverse Image Synthesis for Multiple Domains StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation Toward Multimodal Image-to-Image Translation DualGAN: Unsupervised Dual Learning for Image-to-Image Translation Learning to Discover Cross-Domain Relations with Generative Adversarial Networks Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks Competition-Level Code Generation with AlphaCode Evaluating Large Language Models Trained on Code Self-Correction for Human Parsing Image-to-Image Translation with Conditional Adversarial Networks Gradientless Descent: High-Dimensional Zeroth-Order Optimization Coupled Generative Adversarial Networks Designing GANs: A Likelihood Ratio Approach InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets Transformer Quality in Linear Time Step-size Adaptation Using Exponentiated Gradient Updates Conditional Image Synthesis With Auxiliary Classifier GANs Conditional Generative Adversarial Nets Boundary-Seeking Generative Adversarial Networks BEGAN: Boundary Equilibrium Generative Adversarial Networks Efficient Through-wall Human Pose Reconstruction Using UWB MIMO Radar Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities MAGAN: Margin Adaptation for Generative Adversarial Networks Maximum Entropy Generators for Energy-Based Models GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint The relativistic discriminator: a key element missing from standard GAN Adversarial Autoencoders Gradients without Backpropagation Adversarial Feature Learning Autoencoding beyond pixels using a learned similarity metric Energy-based Generative Adversarial Network Least Squares Generative Adversarial Networks Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks Wasserstein GANs Work Because They Fail (to Approximate the Wasserstein Distance) How Well Do WGANs Estimate the Wasserstein Metric? 
GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial Networks Gradient Normalization for Generative Adversarial Networks Wasserstein Divergence for GANs Spectral Normalization for Generative Adversarial Networks f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization Improved Training of Wasserstein GANs Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Wasserstein GAN Towards Principled Methods for Training Generative Adversarial Networks Improved Techniques for Training GANs Advancing mathematics by guiding human intuition with AI Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Masked Autoencoders Are Scalable Vision Learners Variational Adversarial Active Learning Robust and Generalizable Visual Representation Learning via Random Convolutions TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation RandAugment: Practical automated data augmentation with a reduced search space AutoAugment: Learning Augmentation Policies from Data Random Erasing Data Augmentation Squareplus: A Softplus-Like Algebraic Rectifier Activate or Not: Learning Customized Activation SMU: smooth activation function for deep networks using smoothing maximum technique MicroNet: Towards Image Recognition with Extremely Low FLOPs GhostNet: More Features from Cheap Operations SAU: Smooth activation function using convolution with approximate identities ZerO Initialization: Initializing Neural Networks with only Zeros and Ones Dynamic ReLU Learning Activation Functions to Improve Deep Neural Networks Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks Maxout Networks Learning specialized activation functions with the Piecewise Linear Unit TOOD: Task-aligned One-stage Object Detection Unifying Nonlocal Blocks for Neural Networks Attention Augmented Convolutional Networks Dynamic Task Prioritization for Multitask Learning Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer Region-based Non-local Operation for Video Classification Exploring Self-attention for Image Recognition Image Super-Resolution with Non-Local Sparse Attention Polarized Self-Attention: Towards High-quality Pixel-wise Regression DMSANet: Dual Multi Scale Attention Network SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks Residual Attention: A Simple but Effective Method for Multi-Label Recognition Sluice networks: Learning what to share between loosely related tasks Cross-stitch Networks for Multi-task Learning Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification Learning Multiple Tasks with Multilinear Relationship Networks IGCV2: Interleaved Structured Sparse Convolutional Neural Networks Interleaved Group Convolutions for Deep Neural Networks ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices SqueezeNext: Hardware-Aware Neural Network Design SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size Searching for MobileNetV3 MobileNetV2: Inverted Residuals and Linear Bottlenecks MobileNets: Efficient Convolutional Neural Networks for 
Mobile Vision Applications EfficientNetV2: Smaller Models and Faster Training EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks Loss-Balanced Task Weighting to Reduce Negative Transfer in Multi-Task Learning End-to-End Multi-Task Learning with Attention Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics Searching for Activation Functions The Quest for the Golden Activation Function Self-Normalizing Neural Networks Empirical Evaluation of Rectified Activations in Convolutional Network Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Rectifier Nonlinearities Improve Neural Network Acoustic Models Training Deeper Convolutional Networks with Deep Supervision Deeply-Supervised Nets Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units Continuously Differentiable Exponential Linear Units Mish: A Self Regularized Non-Monotonic Activation Function XLNet: Generalized Autoregressive Pretraining for Language Understanding MASS: Masked Sequence to Sequence Pre-training for Language Generation Unified Language Model Pre-training for Natural Language Understanding and Generation RoBERTa: A Robustly Optimized BERT Pretraining Approach Efficient Attention: Attention with Linear Complexities Longformer: The Long-Document Transformer Linformer: Self-Attention with Linear Complexity Rethinking Attention with Performers Reformer: The Efficient Transformer Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks ResMLP: Feedforward networks for image classification with data-efficient training Do You Even Need Attention? 
A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet MLP-Mixer: An all-MLP Architecture for Vision Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE CompConv: A Compact Convolution Module for Efficient Feature Learning Integrating Circle Kernels into Convolutional Neural Networks YOLOX: Exceeding YOLO Series in 2021 BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Actions as Moving Points MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions Equalization Loss v2: A New Gradient Balance Approach for Long-tailed Object Detection Generating Long Sequences with Sparse Transformers R-Drop: Regularized Dropout for Neural Networks Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers Addressing Some Limitations of Transformers with Feedback Memory Language Models are Open Knowledge Graphs Supermasks in Superposition UNet++: A Nested U-Net Architecture for Medical Image Segmentation Fourier Neural Operator for Parametric Partial Differential Equations Gradient Centralization: A New Optimization Technique for Deep Neural Networks PeCLR: Self-Supervised 3D Hand Pose Estimation from monocular RGB via Equivariant Contrastive Learning GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers Learning Spatial Fusion for Single-Shot Object Detection Region Proposal by Guided Anchoring Gradient Harmonized Single-stage Detector HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax Deformable DETR: Deformable Transformers for End-to-End Object Detection Segmented convolutional gated recurrent neural networks for human activity recognition in ultra-wideband radar Human Motion Recognition With Limited Radar Micro-Doppler Signatures Object Detection from Video Tubelets with Convolutional Neural Networks Action Tubelet Detector for Spatio-Temporal Action Localization Y-Net: Joint Segmentation and Classification for Diagnosis of Breast Biopsy Images W-Net: A Deep Model for Fully Unsupervised Image Segmentation M-Net: A Convolutional Neural Network for Deep Brain Structure Segmentation V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation Human motion recognition exploiting radar with stacked recurrent neural network Extracting Training Data from Large Language Models EfficientDet: Scalable and Efficient Object Detection Learning from Noisy Anchors for One-stage Object Detection Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3 Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training Sparse R-CNN: End-to-End Object Detection with Learnable Proposals RepPoints: Point Set Representation for Object Detection AutoAssign: Differentiable Label Assignment for Dense Object Detection Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection VarifocalNet: An IoU-aware Dense Object Detector Soft Anchor-Point Object Detection Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection Libra R-CNN: Towards Balanced Learning for Object Detection Generalized Focal Loss: Learning 
Qualified and Distributed Bounding Boxes for Dense Object Detection Every Model Learned by Gradient Descent Is Approximately a Kernel Machine Unsupervised Adversarial Domain Adaptation for Micro-Doppler Based Human Activity Classification Unsupervised Domain Adaptation for Micro-Doppler Human Motion Classification via Feature Fusion Through-Wall Human Motion Recognition Based on Transfer Learning and Ensemble Learning Cross-regional oil palm tree counting and detection via a multi-level attention domain adaptation network Cross-Regional Oil Palm Tree Detection Radar-Based Human Activity Recognition With 1-D Dense Attention Network Lite-HRNet: A Lightweight High-Resolution Network Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild Numerical Coordinate Regression with Convolutional Neural Networks Removing the Bias of Integral Pose Regression Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation TokenPose: Learning Keypoint Tokens for Human Pose Estimation Online Knowledge Distillation for Efficient Pose Estimation Integral Human Pose Regression Distribution-Aware Coordinate Representation for Human Pose Estimation The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation High-Performance Large-Scale Image Recognition Without Normalization PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation 3D Human Pose Estimation = 2D Pose Estimation + Matching Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera 3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network Deep High-Resolution Representation Learning for Human Pose Estimation Rethinking on Multi-Stage Networks for Human Pose Estimation Cascaded Pyramid Network for Multi-Person Pose Estimation DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation Associative Embedding: End-to-End Learning for Joint Detection and Grouping RMPE: Regional Multi-person Pose Estimation Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields Learning Feature Pyramids for Human Pose Estimation Multi-Context Attention for Human Pose Estimation Chained Predictions Using Convolutional Neural Networks Convolutional Pose Machines Stacked Hourglass Networks for Human Pose Estimation DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeepPose: Human Pose Estimation via Deep Neural Networks FCOS: A Simple and Strong Anchor-free Object Detector AMASS: Archive of Motion Capture as Surface Shapes SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation Object-Contextual Representations for Semantic Segmentation Boundary loss for highly unbalanced segmentation Objects as Points Focal Loss for Dense Object Detection SSD: Single Shot MultiBox Detector YOLOv3: An Incremental Improvement Revisiting ResNets: Improved Training and Scaling Strategies YOLO9000: Better, Faster, Stronger You Only Look 
Once: Unified, Real-Time Object Detection KeepAugment: A Simple Information-Preserving Data Augmentation Approach Involution: Inverting the Inherence of Convolution for Visual Recognition Rethinking the Inception Architecture for Computer Vision Cascade R-CNN: Delving into High Quality Object Detection Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Transformer in Transformer Fast R-CNN Coordinate Attention for Efficient Mobile Network Design Towards 3D Human Pose Construction Using WiFi Person-in-WiFi: Fine-grained Person Perception using WiFi TransGAN: Two Transformers Can Make One Strong GAN Rich feature hierarchies for accurate object detection and semantic segmentation Unified Perceptual Parsing for Scene Understanding mm-Pose: Real-Time Human Skeletal Posture Estimation using mmWave Radars and CNNs PSANet: Point-wise Spatial Attention Network for Scene Parsing Through-Wall Human Pose Reconstruction via UWB MIMO Radar and 3D CNN Adaptive Pyramid Context Network for Semantic Segmentation Dynamic Multi-Scale Filters for Semantic Segmentation DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation Context Encoding for Semantic Segmentation Attention U-Net: Learning Where to Look for the Pancreas RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation Pyramid Scene Parsing Network Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation Rethinking Atrous Convolution for Semantic Image Segmentation DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs U-Net: Convolutional Networks for Biomedical Image Segmentation RF-Based 3D Skeletons SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation A Survey on Visual Transformer Pre-Trained Image Processing Transformer Fully Convolutional Networks for Semantic Segmentation Augmentation for small object detection A Survey of Handy See-Through Wall Technology RepVGG: Making VGG-style ConvNets Great Again Image Transformer Through-Wall Pose Imaging in Real-Time with a Many-to-Many Encoder/Decoder Paradigm Expressive Body Capture: 3D Hands, Face, and Body from a Single Image Bottleneck Transformers for Visual Recognition SA-Net: Shuffle Attention for Deep Convolutional Neural Networks 3D Imaging of Moving Targets for Ultra-wideband MIMO Through-wall Radar System Panoptic Feature Pyramid Networks BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation GRUU-Net: Integrated convolutional and gated recurrent neural network for cell segmentation PointRend: Image Segmentation as Rendering K-Net: Towards Unified Image Segmentation Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks Balanced Meta-Softmax for Long-Tailed Visual Recognition Seesaw Loss for Long-Tailed Instance Segmentation Equalization Loss for Long-Tailed Object Recognition Class-Balanced Loss Based on Effective Number of Samples Decoupling Representation and Classifier for Long-Tailed Recognition ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from 
a Single Image Language Models are Unsupervised Multitask Learners mT5: A massively multilingual pre-trained text-to-text transformer GLU Variants Improve Transformer Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer SMPL: A Skinned Multi-Person Linear Model Learning Transferable Visual Models From Natural Language Supervision Long-tail learning via logit adjustment On the Relationship between Self-Attention and Convolutional Layers Improving Language Understanding by Generative Pre-Training BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Deep contextualized word representations Deformable DETR: Deformable Transformers for End-to-End Object Detection An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Generative Pretraining from Pixels Do We Need Zero Training Loss After Achieving Zero Training Error? REALM: Retrieval-Augmented Language Model Pre-Training OneNet: Towards End-to-End One-Stage Object Detection Implicit Gradient Regularization Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour Multimodal Machine Learning: A Survey and Taxonomy Learning Continuous Image Representation with Local Implicit Image Function AdaX: Adaptive Gradient Descent with Exponential Long Term Memory Adafactor: Adaptive Learning Rates with Sublinear Memory Cost Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Large Batch Training of Convolutional Networks Lookahead Optimizer: k steps forward, 1 step back On the Variance of the Adaptive Learning Rate and Beyond Incorporating Nesterov Momentum into Adam On the Convergence of Adam and Beyond Adam: A Method for Stochastic Optimization On the importance of initialization and momentum in deep learning A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm ADADELTA: An Adaptive Learning Rate Method Don’t Decay the Learning Rate, Increase the Batch Size InfoVAE: Balancing Learning and Inference in Variational Autoencoders Understanding disentangling in β-VAE β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework Attentional Feature Fusion Memory-Efficient Adaptive Optimization Averaging Weights Leads to Wider Optima and Better Generalization Decoupled Weight Decay Regularization ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks A^2-Nets: Double Attention Networks Neural Architecture Search for Lightweight Non-Local Networks Interlaced Sparse Self-Attention for Semantic Segmentation Through-Wall Human Mesh Recovery Using Radio Signals A simple yet effective baseline for 3d human pose estimation Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods Asymmetric Non-local Neural Networks for Semantic Segmentation Expectation-Maximization Attention Networks for Semantic Segmentation Dual Attention Network for Scene Segmentation Generating Diverse High-Fidelity Images with VQ-VAE-2 Neural Discrete Representation Learning CCNet: Criss-Cross Attention for Semantic Segmentation GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond Non-Local Neural Networks Through-Wall Human Pose Estimation Using Radio Signals Competitive Inner-Imaging Squeeze and Excitation for Residual Network SRM: A Style-based Recalibration Module for Convolutional Neural Networks Tiled 
Squeeze-and-Excite: Channel Attention With Local Spatial Context An Attention Module for Convolutional Neural Networks NAM: Normalization-based Attention Module Residual Attention Network for Image Classification Attention as Activation Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism Spanet: Spatial Pyramid Attention Network for Enhanced Image Recognition EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network End-to-End Adversarial Text-to-Speech Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery BA^2M: A Batch Aware Attention Module for Image Classification Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network DIANet: Dense-and-Implicit Attention Network On the Measure of Intelligence DCANet: Learning Connected Attentions for Convolutional Neural Networks Rotate to Attend: Convolutional Triplet Attention Module Improving Convolutional Networks with Self-calibrated Convolutions You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery The Hardware Lottery Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks FcaNet: Frequency Channel Attention Networks Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks CBAM: Convolutional Block Attention Module BAM: Bottleneck Attention Module Global Second-order Pooling Convolutional Networks Selective Kernel Networks Squeeze-and-Excitation Networks Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning Why gradient clipping accelerates training: A theoretical justification for adaptivity Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow AdderNet: Do We Really Need Multiplications in Deep Learning? 
Deep Variational Information Bottleneck Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations The Geometric Occam’s Razor Implicit in Deep Learning Implicit Gradient Regularization Spectral Norm Regularization for Improving the Generalizability of Deep Learning Understanding the Role of Individual Units in a Deep Neural Network High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks Rethinking Pre-training and Self-training Rethinking ImageNet Pre-training Unsupervised Translation of Programming Languages Neural Architecture Search without Training ResNeSt: Split-Attention Networks Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution Selective Kernel Networks DropBlock: A regularization method for convolutional networks Funnel Activation for Visual Recognition Learning in the Frequency Domain Simple Regret Minimization for Contextual Bandits Learning Sparse Neural Networks through L0 Regularization Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network Second-order Attention Network for Single Image Super-Resolution Big Bird: Transformers for Longer Sequences Self-training with Noisy Student improves ImageNet classification Enhanced Deep Residual Networks for Single Image Super-Resolution Accurate Image Super-Resolution Using Very Deep Convolutional Networks Accelerating the Super-Resolution Convolutional Neural Network Image Super-Resolution Using Deep Convolutional Networks Deep Back-Projection Networks For Super-Resolution Image Super-Resolution Using Very Deep Residual Channel Attention Networks CornerNet: Detecting Objects as Paired Keypoints Movement Pruning: Adaptive Sparsity by Fine-Tuning SCAN: Learning to Classify Images without Labels Synthesizer: Rethinking Self-Attention in Transformer Models Language Models are Few-Shot Learners Deep Ensembles: A Loss Landscape Perspective When BERT Plays the Lottery, All Tickets Are Winning Deep image reconstruction from human brain activity Investigating Human Priors for Playing Video Games Meta-Learning with Implicit Gradients A critical analysis of self-supervision, or what we can learn from a single image Faster Neural Network Training with Data Echoing Concept Learning with Energy-Based Models Big Transfer (BiT): General Visual Representation Learning Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning Reinforcement Learning with Augmented Data TAPAS: Weakly Supervised Table Parsing via Pre-training Jukebox: A Generative Model for Music Do ImageNet Classifiers Generalize to ImageNet? Group Normalization Weight Standardization mixup: Beyond Empirical Risk Minimization DETR:End-to-End Object Detection with Transformers YOLOv4: Optimal Speed and Accuracy of Object Detection Recent Advances in Deep Learning for Object Detection DeepFace: Closing the Gap to Human-Level Performance in Face Verification Convolutional Neural Networks for Sentence Classification MMDetection: Open MMLab Detection Toolbox and Benchmark Bag of Tricks for Image Classification with Convolutional Neural Networks
- 「Travel Notes」 (Zhejiang) Anji: Lucid Waters and Lush Mountains Are Invaluable Assets · (Japan) Fuji: Who Can Claim Mount Fuji as Their Own by Love Alone · (Japan) Kamakura: The Shogunate, Enoshima, and Slam Dunk · (Hebei) Tangshan: Tangshan Is Very "Tang" · (Liaoning) Shenyang: Dragons Soaring, North of the Shen River · (Hebei) Baoding: Pushing Open the Gate to the Capital Region · (UK) Berkshire: An Adventure at Windsor Castle · (UK) Greenwich: The Origin of Timekeeping and Longitude · (UK) Oxfordshire: Seats of Learning and Ancient Sites