- Machine Learning · Deep Learning · Mathematics · English · Python · Essays · Paper Reading · Travel Notes
- 「Machine Learning」 Categorical Boosting (CatBoost) · Light Gradient Boosting Machine (LightGBM) · eXtreme Gradient Boosting (XGBoost) · Self-Organizing Map (SOM) · Locality Preserving Projection (LPP) · Uniform Manifold Approximation and Projection (UMAP) · t-distributed Stochastic Neighbor Embedding (t-SNE) · Locally Linear Embedding (LLE) · Isometric Mapping (ISOMAP) · Multiclass and Multilabel Classification · Multiple Dimensional Scaling (MDS) · Kernelized Principal Component Analysis (KPCA) · Kernel Methods · Error-Ambiguity Decomposition in Ensemble Learning · Diversity Measures in Ensemble Learning · Maximum Entropy Model · Multi-Armed Bandit (MAB) · Gaussian Mixture Model (GMM) · Transfer Learning · Lifelong Learning · Meta Learning · Anomaly Detection · Mean-Shift · Spectral Clustering · Hierarchical Clustering · K-Means Clustering · Radial Basis Function (RBF) Network · Feedforward Neural Network · Deep Belief Network · Restricted Boltzmann Machine · Boltzmann Machine · Hopfield Neural Network · Energy-based Model · Principal Component Analysis (PCA) · Autoencoder · Sparse Coding · Manifold Learning · Partial Least Squares (PLS) Regression · Forward Stagewise Regression · Locally Weighted Linear Regression · Ridge and LASSO Regression · Tube Regression · Naive Bayes · Recommender Systems · Expectation Maximization (EM) · Variational Inference · Linear Discriminant Analysis (LDA) · k-Nearest Neighbors (kNN) · Boosting Trees · Gradient Boosted Decision Trees (GBDT) · Random Forest · Decision Trees · Boosting in Ensemble Learning · Bagging (Bootstrap Aggregation) in Ensemble Learning · Blending Strategies in Ensemble Learning · Support Vector Regression (SVR) · Support Vector Machine (SVM) · Logistic Regression · Linear Regression · Perceptron · Model Evaluation Methods · Model Complexity Theory · Some Theorems in Machine Learning · An Overview of Machine Learning
- 「Deep Learning」 Phenotypic Image Analysis · Panchromatic Sharpening · RF-based Human Perception · Layout-to-Image Generation · Open-Set Object Detection · Object Counting · Point Cloud Classification · Parameter-Efficient Fine-Tuning of Large Models · Vision Transformer · Metric Learning · Self-Supervised Learning · Semi-Supervised Learning · Active Learning · Position Encoding in Transformers · Diffusion Models · Flow-based Models · Variational Autoencoder · Generative Adversarial Network · Lightweight Convolutional Neural Networks · Multi-Task Learning · Spatio-Temporal Action Detection · Reducing the Computational Complexity of Transformers · Pooling Layers in Convolutional Neural Networks · The Long-Tail Distribution Problem in Images · Visualizing Convolutional Neural Networks · Self-Attention Mechanisms in Convolutional Neural Networks · Attention Mechanisms in Convolutional Neural Networks · Music Generation · Image Super-Resolution · Adversarial Training: Attacks and Defenses · Connectionist Temporal Classification · Human Pose Estimation · Image-to-Image Translation · Reading Comprehension · Text Detection and Recognition · Image Captioning · Text Summarization · Pedestrian Detection and Attribute Recognition · Face Detection, Recognition, and Verification · Object Detection · Image Segmentation · Image Recognition · Network Compression · Word Embeddings · Interpretability of Deep Learning · Pretrained Language Models · Transformer · Self-Attention Mechanism · Memory Augmented Neural Network · Attention Mechanisms in Sequence-to-Sequence Models · Sequence-to-Sequence Models · Capsule Networks · Graph Neural Network · Recursive Neural Network · Recurrent Neural Network · Convolutional Neural Network · Initialization Methods in Deep Learning · Normalization Methods in Deep Learning · Regularization Methods in Deep Learning · Optimization Algorithms in Deep Learning · Activation Functions in Deep Learning · An Overview of Deep Learning
- 「Mathematics」 Euler Paths and de Bruijn Graphs · Spline Curves · Locality Sensitive Hashing · Submodularity of Set Functions and the Lovász Extension · Distance Transform of Binary Images · Integral Probability Metrics · The Lipschitz Continuity Condition · Constrained Optimization and the Dual Problem · Duality Theory of Linear Programming · Determinantal Point Processes · Uncertainty in Deep Learning · Jensen's Inequality · Bipartite Graphs and Bipartite Matching · The Optimal Transport Problem and the Wasserstein Distance · The Change of Variable Theorem for Random Variables · The Reparameterization Trick for Probability Distributions · The von Mises-Fisher (vMF) Distribution on the Hypersphere · Computing π (the Ratio of Circumference to Diameter) · Smoothing of Functions · Pareto Optimality in Multi-Objective Optimization · The Taylor Formula · Mitchell's Approximation for Binary Multiplication · Hypothesis Testing in Machine Learning · The Rayleigh Quotient and the Generalized Rayleigh Quotient · Distance Metrics · Well-posed and Ill-posed Problems · Tensor Decomposition · Divergences between Probability Distributions · Parameter Estimation · Sampling Distributions · Basic Concepts of Mathematical Statistics
- 「English」 Mastering 3,500 Words through 40 Short Passages · English Word Formation · The International Phonetic Alphabet for English · A Brief History of English
- 「Python」 Fine-tuning Grounding DINO with Label Studio for Semi-Automatic Object Detection Annotation · Analyzing Object Detection Datasets · Tensor Operations with einops · Differentiating with Respect to Inputs via torch.autograd.grad · Building Conditional Random Fields with pydensecrf · Curve Smoothing Methods · Camera Calibration with opencv-python (cv2) · Image Augmentation with torchvision.transforms · Solving Equations with sympy.solve · Nonlinear Programming with scipy.optimize.minimize · Plotting Confusion Matrices · MMDetection User Notes · Albumentations: An Image Data Augmentation Library · Computing Model Parameters (Params) and FLOPs · Argmax and SoftArgmax · Converting OpenPose Output Keypoints (25→18) with the json Library · The Hook Mechanism in PyTorch · 3D Plotting with Mayavi · Setting Up a Deep Learning Environment for the RTX 3090 · Building Custom Datasets in PyTorch · Processing TIFF Images with tifffile · Batch-Editing the name Field in XML Files (for Object Detection) · Plotting Training Curves with Matplotlib · ECCV 2020 Tutorial: PyTorch Performance Tuning Guide · Progress Bars with tqdm · Computing Confusion Matrices with numpy.bincount (see the sketch after this list) · Batch-Processing Images in a Folder · Handling Matlab .mat Files · A LeetCode Problem-Solving Guide (Python) · Data Structures and Algorithms (Python) · Python User Notes
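One of the Python notes above covers computing a confusion matrix with numpy.bincount. Purely as an illustrative sketch of that trick (the helper name `confusion_matrix` and the toy labels are my own, not taken from the post): encode each (true, predicted) pair as a single integer code, count the codes with `np.bincount`, and reshape.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true labels, columns are predicted labels."""
    # Encode each (true, pred) pair as a single index in [0, num_classes**2).
    codes = y_true.astype(np.int64) * num_classes + y_pred.astype(np.int64)
    # bincount tallies each code; reshape recovers the num_classes x num_classes matrix.
    return np.bincount(codes, minlength=num_classes ** 2).reshape(num_classes, num_classes)

# Toy example with 3 classes (hypothetical labels for illustration only).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 1, 1, 0])
print(confusion_matrix(y_true, y_pred, num_classes=3))
```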
- 「Essays」 The Strongest Pokémon Fan, ag, Will Appear at the Pokémon Beijing Masters! · Twenty-Four Years to Sharpen One Sword: A Short Note on Graduation · A Brief Review of Echo: A Spin-off of a Spin-off, with Hardly an Echo · A Brief Review of Loki (Season 2): Free Will and the Price of Becoming a God · A Brief Review of The Marvels: Higher Bar, Faster Disappointment, Stronger Easter Eggs · On the Tension Relations in Belt Drives · The Equivalent Resistance of a Resistor Cube · Does Storing Files Increase a Phone's Mass? · What Lies at the End of a Rainbow? · A Study of the "Moth to a Flame" Problem
- 「Paper Reading」 DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding DODA: Diffusion for Object-detection Domain Adaptation in Agriculture CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale Adapting the Segment Anything Model for Plant Recognition and Automated Phenotypic Parameter Measurement Adapting Vision Foundation Models for Plant Phenotyping Pan-Mamba: Effective pan-sharpening with State Space Model PanFlowNet: A Flow-Based Deep Network for Pan-sharpening Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion Deep Gradient Projection Networks for Pan-sharpening Mutual Information-driven Pan-sharpening Spatial-Frequency Domain Information Integration for Pan-Sharpening PanFormer: a Transformer Based Model for Pan-sharpening Pan-Sharpening with Customized Transformer and Invertible Neural Network Super-Resolution-Guided Progressive Pansharpening based on a Deep Convolutional Neural Network A Multi-Scale and Multi-Depth Convolutional Neural Network for Remote Sensing Imagery Pan-Sharpening PanNet: A Deep Network Architecture for Pan-Sharpening Pansharpening by Convolutional Neural Networks A universal SNP and small-indel variant caller using deep neural networks LoRA-GA: Low-Rank Adaptation with Gradient Approximation Recovering Human Pose and Shape From Through-the-Wall Radar Images Unsupervised Human Contour Extraction From Through-Wall Radar Images Using Dual UNet Through-Wall Human Pose Reconstruction Based on Cross-Modal Learning and Self-Supervised Learning Through-Wall Human Pose Estimation by Mutual Information Maximizing Deeply Supervised Nets RadarFormer: End-to-End Human Perception With Through-Wall Radar and Transformers YOLOv10: Real-Time End-to-End Object Detection Learning Spatial Similarity Distribution for Few-shot Object Counting KAN: Kolmogorov-Arnold Networks DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation OmniCount: Multi-label Object Counting with Semantic-Geometric Priors InstanceDiffusion: Instance-level Control for Image Generation LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation Adding Conditional Control to Text-to-Image Diffusion Models ReCo: Region-Controlled Text-to-Image Generation LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation GLIGEN: Open-Set Grounded Text-to-Image Generation YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information LoRA+: Efficient Low Rank Adaptation of Large Models Video generation models as world simulators Enhancing Zero-shot Counting via Language-guided Exemplar Learning VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting Semantic Generative Augmentations for Few-Shot Counting DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection Learning Object-Language Alignments for Open-Vocabulary Object Detection RegionCLIP: Region-based Language-Image Pretraining Exploiting Unlabeled Data with Vision and Language Models for Object Detection Detecting Twenty-thousand Classes 
using Image-level Supervision Open Vocabulary Object Detection with Pseudo Bounding-Box Labels Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Open-Vocabulary Object Detection Using Captions MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Towards Open-Set Object Detection and Discovery Grounded Language-Image Pre-training Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection Finite Scalar Quantization: VQ-VAE Made Simple ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Open-world Text-specified Object Counting One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning Faster sorting algorithms discovered using deep reinforcement learning Towards Partial Supervision for Generic Object Counting in Natural Scenes Object Counting and Instance Segmentation with Image-level Supervision Class-aware Object Counting MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond Dilated-Scale-Aware Attention ConvNet For Multi-Class Object Counting QLoRA: Efficient Finetuning of Quantized LLMs Zero-shot Improvement of Object Counting with CLIP Teaching CLIP to Count to Ten Can SAM Count Anything? An Empirical Study on SAM Counting Vicinal Counting Networks Class-agnostic Few-shot Object Counting GCNet: Probing Self-Similarity Learning for Generalized Counting Network Mimetic Initialization of Self-Attention Layers Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting Zero-shot Object Counting Scale-Prior Deformable Convolution for Exemplar-Guided Class-Agnostic Counting CLIP-Count: Towards Text-Guided Zero-Shot Object Counting A Low-Shot Object Counting Network With Iterative Prototype Adaptation CounTR: Transformer-based Generalised Visual Counting Exemplar Free Class Agnostic Counting Few-shot Object Counting and Detection Learning to Count Anything: Reference-less Class-agnostic Counting with Weak Supervision Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting Few-shot Object Counting with Similarity-Aware Feature Enhancement Object Counting: You Only Need to Look at One Learning To Count Everything Class-Agnostic Counting PCT: Point cloud transformer PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space Dynamic Graph CNN for Learning on Point Clouds PointCNN: Convolution On X-Transformed Points Segment Anything PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation OctNet: Learning Deep 3D Representations at High Resolutions VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition Multi-view Convolutional Neural Networks for 3D Shape Recognition Human Pose as Compositional Tokens Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning DeepMIM: Deep Supervision for Masked Image Modeling Masked Image Modeling with Local Multi-Scale Reconstruction Symbolic Discovery of Optimization Algorithms Visual Prompt Tuning AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning UniPELT: A Unified Framework for Parameter-Efficient Language Model 
Tuning Towards a Unified View of Parameter-Efficient Transfer Learning Parameter-Efficient Transfer Learning with Diff Pruning DensePose From WiFi LoRA: Low-Rank Adaptation of Large Language Models AdapterDrop: On the Efficiency of Adapters in Transformers AdapterFusion: Non-Destructive Task Composition for Transfer Learning P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks GPT Understands, Too The Power of Scale for Parameter-Efficient Prompt Tuning Prefix-Tuning: Optimizing Continuous Prompts for Generation BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models Parameter-Efficient Transfer Learning for NLP Ultralytics YOLOv8 Refiner: Refining Self-attention for Vision Transformers Improve Vision Transformers Training by Suppressing Over-smoothing Twins: Revisiting the Design of Spatial Attention in Vision Transformers All Tokens Matter: Token Labeling for Training Better Vision Transformers Incorporating Convolution Designs into Visual Transformers CvT: Introducing Convolutions to Vision Transformers Per-Pixel Classification is Not All You Need for Semantic Segmentation Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions Segmenter: Transformer for Semantic Segmentation Rethinking Spatial Dimensions of Vision Transformers SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification Do We Really Need Explicit Position Encodings for Vision Transformers? Visual Transformers: Token-based Image Representation and Processing for Computer Vision LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference Escaping the Big Data Paradigm with Compact Transformers Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Going deeper with Image Transformers ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders DeepViT: Towards Deeper Vision Transformer Training data-efficient image transformers & distillation through attention Better plain ViT baselines for ImageNet-1k Position Prediction as an Effective Pretraining Strategy Pooling Revisited: Your Receptive Field is Suboptimal A ConvNet for the 2020s Deformable Convolutional Networks Deformable ConvNets v2: More Deformable, Better Results CondConv: Conditionally Parameterized Convolutions for Efficient Inference Dynamic Convolution: Attention over Convolution Kernels DyNet: Dynamic Convolution for Accelerating Convolutional Neural Networks Omni-Dimensional Dynamic Convolution Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution Dynamic Region-Aware Convolution An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution Inception Convolution with Efficient Dilation Search Heatmap Distribution Matching for Human Pose Estimation AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time RTMDet: An Empirical Study of Designing Real-Time Object Detectors Amos: An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation Parametric Instance Classification for Unsupervised Visual Feature Learning Self-Supervised Learning 
based on Heat Equation Revealing the Dark Secrets of Masked Image Modeling On Data Scaling in Masked Image Modeling ConvMAE: Masked Convolution Meets Masked Autoencoders SimMIM: A Simple Framework for Masked Image Modeling iBOT: Image BERT Pre-Training with Online Tokenizer BEiT: BERT Pre-Training of Image Transformers Discovering faster matrix multiplication algorithms with reinforcement learning Analyzing and Improving Representations with the Soft Nearest Neighbor Loss Circle Loss: A Unified Perspective of Pair Similarity Optimization Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning Person re-identification by multi-channel parts-based CNN with improved triplet loss function In Defense of the Triplet Loss for Person Re-Identification Ranked List Loss for Deep Metric Learning ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis Proxy Anchor Loss for Deep Metric Learning Deep Metric Learning for Practical Person Re-Identification No Fuss Distance Metric Learning using Proxies Deep Metric Learning with Hierarchical Triplet Loss Deep Metric Learning via Facility Location Beyond triplet loss: a deep quadruplet network for person re-identification Deep Metric Learning with Angular Loss Metric Learning with Adaptive Density Discrimination Improved Deep Metric Learning with Multi-class N-pair Loss Objective Learning Deep Embeddings with Histogram Loss Deep Metric Learning via Lifted Structured Feature Embedding FaceNet: A Unified Embedding for Face Recognition and Clustering Jigsaw Clustering for Unsupervised Visual Representation Learning Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning Emerging Properties in Self-Supervised Vision Transformers Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations Evolving Losses for Unsupervised Video Representation Learning Exploring Simple Siamese Representation Learning CURL: Contrastive Unsupervised Representations for Reinforcement Learning Unsupervised Learning of Visual Features by Contrasting Cluster Assignments An Empirical Study of Training Self-Supervised Vision Transformers Improved Baselines with Momentum Contrastive Learning Momentum Contrast for Unsupervised Visual Representation Learning Contrastive Multiview Coding Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination Unsupervised Embedding Learning via Invariant and Spreading Instance Feature Bootstrap your own latent: A new approach to self-supervised Learning Barlow Twins: Self-Supervised Learning via Redundancy Reduction A Simple Framework for Contrastive Learning of Visual Representations Contrastive Learning with Hard Negative Samples Debiased Contrastive Learning Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere Data-Efficient Image Recognition with Contrastive Predictive Coding Representation Learning with Contrastive Predictive Coding Colorful Image Colorization Representation Learning by Learning to Count Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles Unsupervised Visual Representation Learning by Context Prediction Unsupervised Representation Learning by Predicting Image Rotations Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow 
Big Self-Supervised Models are Strong Semi-Supervised Learners FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence DivideMix: Learning with Noisy Labels as Semi-supervised Learning ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring MixMatch: A Holistic Approach to Semi-Supervised Learning Meta Pseudo Labels Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning Label Propagation for Deep Semi-supervised Learning Unsupervised Data Augmentation for Consistency Training Interpolation Consistency Training for Semi-Supervised Learning Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results Temporal Ensembling for Semi-Supervised Learning Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning Deep Active Learning: Unified and Principled Method for Query and Training Discriminative Active Learning BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning Batch Active Learning Using Determinantal Point Processes Batch Active Learning at Scale Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation Cost-Effective Active Learning for Deep Image Classification Diverse mini-batch Active Learning Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation Bayesian Generative Active Deep Learning Generative Adversarial Active Learning Adversarial Active Learning for Deep Networks: a Margin Based Approach When Deep Learners Change Their Mind: Learning Dynamics for Active Learning Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds Active Learning for Convolutional Neural Networks: A Core-Set Approach Active Learning by Acquiring Contrastive Examples Minimax Active Learning Weight Uncertainty in Neural Networks Learning Loss for Active Learning Deep Bayesian Active Learning with Image Data SIoU Loss: More Powerful Learning for Bounding Box Regression YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors Comprehensive Guide to Ultralytics YOLOv5 ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions RoFormer: Enhanced Transformer with Rotary Position Embedding Your Transformer May Not be as Powerful as You Expect Encoding word order in complex embeddings Self-Attention with Relative Position Representations Learning to Encode Position for Transformer with Continuous Dynamical Model Dual Contrastive Learning for Unsupervised Image-to-Image Translation Making the Invisible Visible: Action Recognition Through Walls and Occlusions Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Hierarchical Text-Conditional Image Generation with CLIP Latents Learning Longterm Representations for Person Re-Identification Using Radio Signals Variational Diffusion Models Poisson Flow Generative Models Noether Networks: Meta-Learning Useful Conserved Quantities High-Resolution Image Synthesis with Latent Diffusion Models Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks Alias-Free Generative 
Adversarial Networks Semi-Supervised Learning with Generative Adversarial Networks ClusterGAN : Latent Space Clustering in Generative Adversarial Networks Deep Symbolic Regression for Recurrent Sequences Training Generative Adversarial Networks with Limited Data Classifier-Free Diffusion Guidance More Control for Free! Image Synthesis with Semantic Diffusion Guidance Diffusion Models Beat GANs on Image Synthesis Estimating the Optimal Covariance with Imperfect Mean in Diffusion Probabilistic Models Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models Score-Based Generative Modeling through Stochastic Differential Equations Denoising Diffusion Implicit Models Improved Denoising Diffusion Probabilistic Models Denoising Diffusion Probabilistic Models Analyzing and Improving the Image Quality of StyleGAN A Style-Based Generator Architecture for Generative Adversarial Networks Do As I Can, Not As I Say: Grounding Language in Robotic Affordances On Self Modulation for Generative Adversarial Networks Large Scale GAN Training for High Fidelity Natural Image Synthesis Taming Transformers for High-Resolution Image Synthesis Self-Attention Generative Adversarial Networks cGANs with Projection Discriminator StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks SinGAN: Learning a Generative Model from a Single Natural Image Progressive Growing of GANs for Improved Quality, Stability, and Variation Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks Context Encoders: Feature Learning by Inpainting Semantic Image Synthesis with Spatially-Adaptive Normalization Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation Softmax GAN On Convergence and Stability of GANs Invertible Residual Networks Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design Improving Variational Inference with Inverse Autoregressive Flow Contrastive Learning for Unpaired Image-to-Image Translation TFPose: Direct Human Pose Estimation with Transformers Masked Autoregressive Flow for Density Estimation Variational Inference with Normalizing Flows Unsupervised Learning for Human Sensing Using Radio Signals Glow: Generative Flow with Invertible 1x1 Convolutions Density estimation using Real NVP NICE: Non-linear Independent Components Estimation GANILLA: Generative Adversarial Networks for Image to Illustration Translation Rethinking the Truly Unsupervised Image-to-Image Translation High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network Multimodal Unsupervised Image-to-Image Translation Simple yet Effective Way for Improving the Performance of GAN f-VAEs: Improve VAEs with Conditional Flows Temporal Difference Variational Auto-Encoder NVAE: A Deep Hierarchical Variational Autoencoder Hyperspherical Variational Auto-Encoders A Batch Normalized Inference Network Keeps the KL Vanishing Away Variational Inference of Disentangled Latent Concepts from Unlabeled Observations Structured Disentangled Representations Disentangling by Factorising Sliced-Wasserstein Autoencoder: An Embarrassingly Simple Generative Model Log Hyperbolic Cosine Loss Improves Variational Auto-Encoder Learning to Generate Images with Perceptual Similarity Metrics Learning Disentangled Joint Continuous and Discrete Representations Categorical Reparameterization with Gumbel-Softmax Deep Feature Consistent Variational Autoencoder 
Tighter Variational Bounds are Not Necessarily Better Importance Weighted Autoencoders Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB Isolating Sources of Disentanglement in Variational Autoencoders Wasserstein Auto-Encoders Variational methods for Conditional Multimodal Learning: Generating Human Faces from Attributes Learning Structured Output Representation using Deep Conditional Generative Models Exploring and Exploiting Hubness Priors for High-Quality GAN Latent Sampling Demystifying MMD GANs Fisher GAN MMD GAN: Towards Deeper Understanding of Moment Matching Network Generative Moment Matching Networks McGan: Mean and Covariance Feature Matching GAN A Note on the Inception Score GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium Which Training Methods for GANs do actually Converge? Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks Unsupervised Image-to-Image Translation Networks StarGAN v2: Diverse Image Synthesis for Multiple Domains StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation Toward Multimodal Image-to-Image Translation DualGAN: Unsupervised Dual Learning for Image-to-Image Translation Learning to Discover Cross-Domain Relations with Generative Adversarial Networks Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks Competition-Level Code Generation with AlphaCode Evaluating Large Language Models Trained on Code Self-Correction for Human Parsing Image-to-Image Translation with Conditional Adversarial Networks Gradientless Descent: High-Dimensional Zeroth-Order Optimization Coupled Generative Adversarial Networks Designing GANs: A Likelihood Ratio Approach InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets Transformer Quality in Linear Time Step-size Adaptation Using Exponentiated Gradient Updates Conditional Image Synthesis With Auxiliary Classifier GANs Conditional Generative Adversarial Nets Boundary-Seeking Generative Adversarial Networks BEGAN: Boundary Equilibrium Generative Adversarial Networks Efficient Through-wall Human Pose Reconstruction Using UWB MIMO Radar Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities MAGAN: Margin Adaptation for Generative Adversarial Networks Maximum Entropy Generators for Energy-Based Models GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint The relativistic discriminator: a key element missing from standard GAN Adversarial Autoencoders Gradients without Backpropagation Adversarial Feature Learning Autoencoding beyond pixels using a learned similarity metric Energy-based Generative Adversarial Network Least Squares Generative Adversarial Networks Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks Wasserstein GANs Work Because They Fail (to Approximate the Wasserstein Distance) How Well Do WGANs Estimate the Wasserstein Metric? 
GraN-GAN: Piecewise Gradient Normalization for Generative Adversarial Networks Gradient Normalization for Generative Adversarial Networks Wasserstein Divergence for GANs Spectral Normalization for Generative Adversarial Networks f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization Improved Training of Wasserstein GANs Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Wasserstein GAN Towards Principled Methods for Training Generative Adversarial Networks Improved Techniques for Training GANs Advancing mathematics by guiding human intuition with AI Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Masked Autoencoders Are Scalable Vision Learners Variational Adversarial Active Learning Robust and Generalizable Visual Representation Learning via Random Convolutions TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation RandAugment: Practical automated data augmentation with a reduced search space AutoAugment: Learning Augmentation Policies from Data Random Erasing Data Augmentation Squareplus: A Softplus-Like Algebraic Rectifier Activate or Not: Learning Customized Activation SMU: smooth activation function for deep networks using smoothing maximum technique MicroNet: Towards Image Recognition with Extremely Low FLOPs GhostNet: More Features from Cheap Operations SAU: Smooth activation function using convolution with approximate identities ZerO Initialization: Initializing Neural Networks with only Zeros and Ones Dynamic ReLU Learning Activation Functions to Improve Deep Neural Networks Orthogonal-Padé Activation Functions: Trainable Activation functions for smooth and faster convergence in deep networks Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks Maxout Networks Learning specialized activation functions with the Piecewise Linear Unit TOOD: Task-aligned One-stage Object Detection Unifying Nonlocal Blocks for Neural Networks Attention Augmented Convolutional Networks Dynamic Task Prioritization for Multitask Learning Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer Region-based Non-local Operation for Video Classification Exploring Self-attention for Image Recognition Image Super-Resolution with Non-Local Sparse Attention Polarized Self-Attention: Towards High-quality Pixel-wise Regression DMSANet: Dual Multi Scale Attention Network SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks Residual Attention: A Simple but Effective Method for Multi-Label Recognition Sluice networks: Learning what to share between loosely related tasks Cross-stitch Networks for Multi-task Learning Fully-adaptive Feature Sharing in Multi-Task Networks with Applications in Person Attribute Classification Learning Multiple Tasks with Multilinear Relationship Networks IGCV2: Interleaved Structured Sparse Convolutional Neural Networks Interleaved Group Convolutions for Deep Neural Networks ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices SqueezeNext: Hardware-Aware Neural Network Design SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size Searching for MobileNetV3 MobileNetV2: Inverted Residuals and Linear Bottlenecks MobileNets: Efficient Convolutional Neural Networks for 
Mobile Vision Applications EfficientNetV2: Smaller Models and Faster Training EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks Loss-Balanced Task Weighting to Reduce Negative Transfer in Multi-Task Learning End-to-End Multi-Task Learning with Attention Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics Searching for Activation Functions The Quest for the Golden Activation Function Self-Normalizing Neural Networks Empirical Evaluation of Rectified Activations in Convolutional Network Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Rectifier Nonlinearities Improve Neural Network Acoustic Models Training Deeper Convolutional Networks with Deep Supervision Deeply-Supervised Nets Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units Continuously Differentiable Exponential Linear Units Mish: A Self Regularized Non-Monotonic Activation Function XLNet: Generalized Autoregressive Pretraining for Language Understanding MASS: Masked Sequence to Sequence Pre-training for Language Generation Unified Language Model Pre-training for Natural Language Understanding and Generation RoBERTa: A Robustly Optimized BERT Pretraining Approach Efficient Attention: Attention with Linear Complexities Longformer: The Long-Document Transformer Linformer: Self-Attention with Linear Complexity Rethinking Attention with Performers Reformer: The Efficient Transformer Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks ResMLP: Feedforward networks for image classification with data-efficient training Do You Even Need Attention? 
A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet MLP-Mixer: An all-MLP Architecture for Vision Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE CompConv: A Compact Convolution Module for Efficient Feature Learning Integrating Circle Kernels into Convolutional Neural Networks YOLOX: Exceeding YOLO Series in 2021 BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Actions as Moving Points MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions Equalization Loss v2: A New Gradient Balance Approach for Long-tailed Object Detection Generating Long Sequences with Sparse Transformers R-Drop: Regularized Dropout for Neural Networks Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers Addressing Some Limitations of Transformers with Feedback Memory Language Models are Open Knowledge Graphs Supermasks in Superposition UNet++: A Nested U-Net Architecture for Medical Image Segmentation Fourier Neural Operator for Parametric Partial Differential Equations Gradient Centralization: A New Optimization Technique for Deep Neural Networks PeCLR: Self-Supervised 3D Hand Pose Estimation from monocular RGB via Equivariant Contrastive Learning GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers Learning Spatial Fusion for Single-Shot Object Detection Region Proposal by Guided Anchoring Gradient Harmonized Single-stage Detector HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax Deformable DETR: Deformable Transformers for End-to-End Object Detection Segmented convolutional gated recurrent neural networks for human activity recognition in ultra-wideband radar Human Motion Recognition With Limited Radar Micro-Doppler Signatures Object Detection from Video Tubelets with Convolutional Neural Networks Action Tubelet Detector for Spatio-Temporal Action Localization Y-Net: Joint Segmentation and Classification for Diagnosis of Breast Biopsy Images W-Net: A Deep Model for Fully Unsupervised Image Segmentation M-Net: A Convolutional Neural Network for Deep Brain Structure Segmentation V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation Human motion recognition exploiting radar with stacked recurrent neural network Extracting Training Data from Large Language Models EfficientDet: Scalable and Efficient Object Detection Learning from Noisy Anchors for One-stage Object Detection Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3 Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training Sparse R-CNN: End-to-End Object Detection with Learnable Proposals RepPoints: Point Set Representation for Object Detection AutoAssign: Differentiable Label Assignment for Dense Object Detection Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection VarifocalNet: An IoU-aware Dense Object Detector Soft Anchor-Point Object Detection Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection Libra R-CNN: Towards Balanced Learning for Object Detection Generalized Focal Loss: Learning 
Qualified and Distributed Bounding Boxes for Dense Object Detection Every Model Learned by Gradient Descent Is Approximately a Kernel Machine Unsupervised Adversarial Domain Adaptation for Micro-Doppler Based Human Activity Classification Unsupervised Domain Adaptation for Micro-Doppler Human Motion Classification via Feature Fusion Through-Wall Human Motion Recognition Based on Transfer Learning and Ensemble Learning Cross-regional oil palm tree counting and detection via a multi-level attention domain adaptation network Cross-Regional Oil Palm Tree Detection Radar-Based Human Activity Recognition With 1-D Dense Attention Network Lite-HRNet: A Lightweight High-Resolution Network Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild Numerical Coordinate Regression with Convolutional Neural Networks Removing the Bias of Integral Pose Regression Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the Wild Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention SimCC: a Simple Coordinate Classification Perspective for Human Pose Estimation TokenPose: Learning Keypoint Tokens for Human Pose Estimation Online Knowledge Distillation for Efficient Pose Estimation Integral Human Pose Regression Distribution-Aware Coordinate Representation for Human Pose Estimation The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation AID: Pushing the Performance Boundary of Human Pose Estimation with Information Dropping Augmentation High-Performance Large-Scale Image Recognition Without Normalization PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation 3D Human Pose Estimation = 2D Pose Estimation + Matching Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera 3D Human Pose Estimation from Monocular Images with Deep Convolutional Neural Network Deep High-Resolution Representation Learning for Human Pose Estimation Rethinking on Multi-Stage Networks for Human Pose Estimation Cascaded Pyramid Network for Multi-Person Pose Estimation DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation Associative Embedding: End-to-End Learning for Joint Detection and Grouping RMPE: Regional Multi-person Pose Estimation Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields Learning Feature Pyramids for Human Pose Estimation Multi-Context Attention for Human Pose Estimation Chained Predictions Using Convolutional Neural Networks Convolutional Pose Machines Stacked Hourglass Networks for Human Pose Estimation DeBERTa: Decoding-enhanced BERT with Disentangled Attention DeepPose: Human Pose Estimation via Deep Neural Networks FCOS: A Simple and Strong Anchor-free Object Detector AMASS: Archive of Motion Capture as Surface Shapes SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation Object-Contextual Representations for Semantic Segmentation Boundary loss for highly unbalanced segmentation Objects as Points Focal Loss for Dense Object Detection SSD: Single Shot MultiBox Detector YOLOv3: An Incremental Improvement Revisiting ResNets: Improved Training and Scaling Strategies YOLO9000: Better, Faster, Stronger You Only Look 
Once: Unified, Real-Time Object Detection KeepAugment: A Simple Information-Preserving Data Augmentation Approach Involution: Inverting the Inherence of Convolution for Visual Recognition Rethinking the Inception Architecture for Computer Vision Cascade R-CNN: Delving into High Quality Object Detection Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Transformer in Transformer Fast R-CNN Coordinate Attention for Efficient Mobile Network Design Towards 3D Human Pose Construction Using WiFi Person-in-WiFi: Fine-grained Person Perception using WiFi TransGAN: Two Transformers Can Make One Strong GAN Rich feature hierarchies for accurate object detection and semantic segmentation Unified Perceptual Parsing for Scene Understanding mm-Pose: Real-Time Human Skeletal Posture Estimation using mmWave Radars and CNNs PSANet: Point-wise Spatial Attention Network for Scene Parsing Through-Wall Human Pose Reconstruction via UWB MIMO Radar and 3D CNN Adaptive Pyramid Context Network for Semantic Segmentation Dynamic Multi-Scale Filters for Semantic Segmentation DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation Context Encoding for Semantic Segmentation Attention U-Net: Learning Where to Look for the Pancreas RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation Pyramid Scene Parsing Network Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation Rethinking Atrous Convolution for Semantic Image Segmentation DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs U-Net: Convolutional Networks for Biomedical Image Segmentation RF-Based 3D Skeletons SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation A Survey on Visual Transformer Pre-Trained Image Processing Transformer Fully Convolutional Networks for Semantic Segmentation Augmentation for small object detection A Survey of Handy See-Through Wall Technology RepVGG: Making VGG-style ConvNets Great Again Image Transformer Through-Wall Pose Imaging in Real-Time with a Many-to-Many Encoder/Decoder Paradigm Expressive Body Capture: 3D Hands, Face, and Body from a Single Image Bottleneck Transformers for Visual Recognition SA-Net: Shuffle Attention for Deep Convolutional Neural Networks 3D Imaging of Moving Targets for Ultra-wideband MIMO Through-wall Radar System Panoptic Feature Pyramid Networks BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation GRUU-Net: Integrated convolutional and gated recurrent neural network for cell segmentation PointRend: Image Segmentation as Rendering K-Net: Towards Unified Image Segmentation Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks Balanced Meta-Softmax for Long-Tailed Visual Recognition Seesaw Loss for Long-Tailed Instance Segmentation Equalization Loss for Long-Tailed Object Recognition Class-Balanced Loss Based on Effective Number of Samples Decoupling Representation and Classifier for Long-Tailed Recognition ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks ALBERT: A Lite BERT for Self-supervised Learning of Language Representations Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from 
a Single Image Language Models are Unsupervised Multitask Learners mT5: A massively multilingual pre-trained text-to-text transformer GLU Variants Improve Transformer Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer SMPL: A Skinned Multi-Person Linear Model Learning Transferable Visual Models From Natural Language Supervision Long-tail learning via logit adjustment On the Relationship between Self-Attention and Convolutional Layers Improving Language Understanding by Generative Pre-Training BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Deep contextualized word representations Deformable DETR: Deformable Transformers for End-to-End Object Detection An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Generative Pretraining from Pixels Do We Need Zero Training Loss After Achieving Zero Training Error? REALM: Retrieval-Augmented Language Model Pre-Training OneNet: Towards End-to-End One-Stage Object Detection Implicit Gradient Regularization Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour Multimodal Machine Learning: A Survey and Taxonomy Learning Continuous Image Representation with Local Implicit Image Function AdaX: Adaptive Gradient Descent with Exponential Long Term Memory Adafactor: Adaptive Learning Rates with Sublinear Memory Cost Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation Large Batch Optimization for Deep Learning: Training BERT in 76 minutes Large Batch Training of Convolutional Networks Lookahead Optimizer: k steps forward, 1 step back On the Variance of the Adaptive Learning Rate and Beyond Incorporating Nesterov Momentum into Adam On the Convergence of Adam and Beyond Adam: A Method for Stochastic Optimization On the importance of initialization and momentum in deep learning A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm ADADELTA: An Adaptive Learning Rate Method Don’t Decay the Learning Rate, Increase the Batch Size InfoVAE: Balancing Learning and Inference in Variational Autoencoders Understanding disentangling in β-VAE β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework Attentional Feature Fusion Memory-Efficient Adaptive Optimization Averaging Weights Leads to Wider Optima and Better Generalization Decoupled Weight Decay Regularization ULSAM: Ultra-Lightweight Subspace Attention Module for Compact Convolutional Neural Networks A^2-Nets: Double Attention Networks Neural Architecture Search for Lightweight Non-Local Networks Interlaced Sparse Self-Attention for Semantic Segmentation Through-Wall Human Mesh Recovery Using Radio Signals A simple yet effective baseline for 3d human pose estimation Monocular Human Pose Estimation: A Survey of Deep Learning-based Methods Asymmetric Non-local Neural Networks for Semantic Segmentation Expectation-Maximization Attention Networks for Semantic Segmentation Dual Attention Network for Scene Segmentation Generating Diverse High-Fidelity Images with VQ-VAE-2 Neural Discrete Representation Learning CCNet: Criss-Cross Attention for Semantic Segmentation GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond Non-Local Neural Networks Through-Wall Human Pose Estimation Using Radio Signals Competitive Inner-Imaging Squeeze and Excitation for Residual Network SRM: A Style-based Recalibration Module for Convolutional Neural Networks Tiled 
Squeeze-and-Excite: Channel Attention With Local Spatial Context An Attention Module for Convolutional Neural Networks NAM: Normalization-based Attention Module Residual Attention Network for Image Classification Attention as Activation Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism Spanet: Spatial Pyramid Attention Network for Enhanced Image Recognition EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network End-to-End Adversarial Text-to-Speech Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery BA^2M: A Batch Aware Attention Module for Image Classification Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks Weighted Channel Dropout for Regularization of Deep Convolutional Neural Network DIANet: Dense-and-Implicit Attention Network On the Measure of Intelligence DCANet: Learning Connected Attentions for Convolutional Neural Networks Rotate to Attend: Convolutional Triplet Attention Module Improving Convolutional Networks with Self-calibrated Convolutions You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery The Hardware Lottery Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks FcaNet: Frequency Channel Attention Networks Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks CBAM: Convolutional Block Attention Module BAM: Bottleneck Attention Module Global Second-order Pooling Convolutional Networks Selective Kernel Networks Squeeze-and-Excitation Networks Self-Orthogonality Module: A Network Architecture Plug-in for Learning Orthogonal Filters Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning Why gradient clipping accelerates training: A theoretical justification for adaptivity Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow AdderNet: Do We Really Need Multiplications in Deep Learning? 
Deep Variational Information Bottleneck Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations The Geometric Occam’s Razor Implicit in Deep Learning Implicit Gradient Regularization Spectral Norm Regularization for Improving the Generalizability of Deep Learning Understanding the Role of Individual Units in a Deep Neural Network High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks Rethinking Pre-training and Self-training Rethinking ImageNet Pre-training Unsupervised Translation of Programming Languages Neural Architecture Search without Training ResNeSt: Split-Attention Networks Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution Selective Kernel Networks DropBlock: A regularization method for convolutional networks Funnel Activation for Visual Recognition Learning in the Frequency Domain Simple Regret Minimization for Contextual Bandits Learning Sparse Neural Networks through L0 Regularization Closed-loop Matters: Dual Regression Networks for Single Image Super-Resolution ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network Second-order Attention Network for Single Image Super-Resolution Big Bird: Transformers for Longer Sequences Self-training with Noisy Student improves ImageNet classification Enhanced Deep Residual Networks for Single Image Super-Resolution Accurate Image Super-Resolution Using Very Deep Convolutional Networks Accelerating the Super-Resolution Convolutional Neural Network Image Super-Resolution Using Deep Convolutional Networks Deep Back-Projection Networks For Super-Resolution Image Super-Resolution Using Very Deep Residual Channel Attention Networks CornerNet: Detecting Objects as Paired Keypoints Movement Pruning: Adaptive Sparsity by Fine-Tuning SCAN: Learning to Classify Images without Labels Synthesizer: Rethinking Self-Attention in Transformer Models Language Models are Few-Shot Learners Deep Ensembles: A Loss Landscape Perspective When BERT Plays the Lottery, All Tickets Are Winning Deep image reconstruction from human brain activity Investigating Human Priors for Playing Video Games Meta-Learning with Implicit Gradients A critical analysis of self-supervision, or what we can learn from a single image Faster Neural Network Training with Data Echoing Concept Learning with Energy-Based Models Big Transfer (BiT): General Visual Representation Learning Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning Reinforcement Learning with Augmented Data TAPAS: Weakly Supervised Table Parsing via Pre-training Jukebox: A Generative Model for Music Do ImageNet Classifiers Generalize to ImageNet? Group Normalization Weight Standardization mixup: Beyond Empirical Risk Minimization DETR:End-to-End Object Detection with Transformers YOLOv4: Optimal Speed and Accuracy of Object Detection Recent Advances in Deep Learning for Object Detection DeepFace: Closing the Gap to Human-Level Performance in Face Verification Convolutional Neural Networks for Sentence Classification MMDetection: Open MMLab Detection Toolbox and Benchmark Bag of Tricks for Image Classification with Convolutional Neural Networks
- 「Travel Notes」 (Zhejiang) Anji: Lucid Waters and Lush Mountains Are Invaluable Assets · (Japan) Fuji: Who Can Claim Mount Fuji as Their Own by Love Alone · (Japan) Kamakura: The Shogunate, Enoshima, and Slam Dunk · (Hebei) Tangshan: Tangshan Is Very "Tang" · (Liaoning) Shenyang: Dragons Soaring, North of the Shen River · (Hebei) Baoding: Pushing Open the Gate to the Capital Region · (UK) Berkshire: An Adventure at Windsor Castle · (UK) Greenwich: The Origin of Timekeeping and Longitude · (UK) Oxfordshire: Seats of Learning and Ancient Sites