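YOLOv26 Improvements | Attention Mechanisms | RCS-OSA as a Secondary Innovation on C2PSA, Helping YOLOv26 Gain Accuracy (original work, with original network-structure diagrams)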
Before we start, a word about my column. It covers classification, detection, segmentation, tracking, and keypoint detection, and is currently on a limited-time discount. It is updated with 5-7 new mechanisms every week, includes the files for all of my improvements, and comes with a discussion group where I regularly share methods and experience for publishing papers.

1. Introduction

This article covers the RCS-OSA module proposed in RCS-YOLO. RCS-OSA stands for RCS-based One-Shot Aggregation, where RCS means Reparameterized Convolution based on channel Shuffle. The module stacks RCS blocks and aggregates their outputs only once, which keeps computation and memory cost low while still extracting rich features, improving both processing efficiency and detection accuracy. In my own tests it gave clear gains on both small-object and large-object datasets (mAP rose by roughly 0.06). The article also analyzes how the RCS-OSA framework works, with network-structure diagrams I drew myself, so that you can not only add the module to your own model but also have a reference when writing a paper. Finally, it walks you step by step through adding RCS-OSA, C3k2-RCSOSA, and C2PSA-RCSOSA to the network.

Column link: YOLOv26 effective-gains column (covers Conv, attention mechanisms, backbone, loss functions, optimizers, post-processing, and other improvements)

Contents
1. Introduction
2. How the RCS-OSA module works
   2.1 Basic principles of RCS-OSA
   2.2 RCS
   2.3 The RCS module
   2.4 OSA
   2.5 Feature cascade
3. RCS-OSA core code
4. Adding the RCS-OSA module step by step
   4.1 Change 1
   4.2 Change 2
   4.3 Change 3
   4.4 Change 4
   4.5 Change 5
   4.6 Change 6
5. Training
   5.1 YAML files
      5.1.1 YAML file 1
      5.1.2 YAML file 2
   5.2 Training code
   5.3 Training screenshot
6. Summary

2. How the RCS-OSA module works

Paper: official paper link
Code: official code link

2.1 Basic principles of RCS-OSA

RCS-OSA (RCS-based One-Shot Aggregation) is a structure proposed in RCS-YOLO. Its main ideas can be summarized as follows:

1. RCS (Reparameterized Convolution based on channel Shuffle): combines channel shuffle with reparameterized convolution to strengthen the network's feature extraction.
2. RCS module: uses a multi-branch structure during training to learn rich feature representations, and is simplified to a single branch at inference through structural reparameterization, reducing memory consumption.
3. OSA (One-Shot Aggregation): aggregates several cascaded features in a single step, reducing the computational burden and improving efficiency.
4. Feature cascade: the RCS-OSA module stacks RCS blocks to ensure feature reuse and to strengthen information flow between layers.

2.2 RCS

RCS (Reparameterized Convolution based on channel Shuffle) is the core component of RCS-YOLO. It learns rich feature information through a multi-branch structure during training and collapses to a single branch at inference, reducing memory consumption and enabling fast inference. RCS also uses channel split and channel shuffle to lower computational complexity while preserving information exchange between channels, so that at inference it roughly halves the computational complexity compared with an ordinary 3×3 convolution. Through structural reparameterization, RCS learns deep representations from the input features during training and delivers fast, memory-efficient inference.

2.3 The RCS module

During training, the RCS module uses several branches: a 3x3 convolution, a 1x1 convolution, and a direct identity connection, which together learn a rich feature representation. At inference, this structure is reparameterized into a single 3x3 convolution, reducing computational complexity and memory consumption while keeping the representational power learned during training. This matches the design goal of RCS: higher computational efficiency without sacrificing performance.

The RCS structure figure shows the training stage (part a) and the inference stage (part b). During training, the input is split along the channel dimension; one part passes through a RepVGG block (with its 1x1 and 3x3 convolution branches) and the other part is left unchanged. The RepVGG output is then concatenated with the untouched part and channel-shuffled. At inference, the multi-branch structure is simplified to a single 3x3 RepConv block. This design allows complex features to be learned during training while reducing computation at inference. In the figure, rectangles with black borders denote module operations, gradient-colored rectangles denote tensor features, and the width of a rectangle denotes the number of channels.

2.4 OSA

OSA (One-Shot Aggregation) is a key module aimed at making densely connected networks more efficient. It represents diverse features with multiple receptive fields and aggregates all of them only once, in the final feature map, which overcomes the inefficiency of dense connections in DenseNet. OSA serves two main purposes:

1. Richer feature representations: by aggregating features with different receptive fields, OSA makes the network more sensitive to different scales, which helps it detect objects of different sizes.
2. Higher efficiency: by aggregating features only once at the end, OSA reduces repeated feature computation and storage, improving computational and energy efficiency.

In RCS-YOLO, OSA is further combined with RCS to form the RCS-OSA module. This combination keeps memory cost low while still extracting semantic information effectively, which matters for building both lightweight and large-scale object detectors.

The RCS-OSA (One-Shot Aggregation of RCS) structure figure shows how the module works: the input is divided into two parts, one passed through directly and the other processed by stacked RCS modules. The processed and pass-through features are merged after a channel shuffle. This structure improves feature extraction and utilization and is a key component of the RCS-YOLO architecture, using one-shot aggregation to strengthen feature processing while remaining computationally efficient.

2.5 Feature cascade

Feature cascade is a technique that keeps only a limited number of cascaded features on the one-shot aggregation path. In RCS-YOLO, and specifically in the RCS-OSA (RCS-Based One-Shot Aggregation) module, only three feature cascades are kept. The goal is to lighten the computational burden and reduce memory usage. This aggregates features from different levels effectively and improves semantic information extraction while avoiding the inefficiency and resource cost of an overly complex network.

The overall RCS-YOLO architecture figure includes the RCS-OSA module. RCS-OSA stacks RCS modules to ensure feature reuse and strengthen information flow between layers. The arrangement of the multi-layer RCS-OSA modules in the figure shows how they work together to optimize feature propagation and improve detection performance.

In summary, RCS-YOLO is mainly built from RCS-OSA (blue modules) and RepVGG (orange modules). Here n is the number of stacked RCS modules and n_cls is the number of object classes. IDetect in the figure is borrowed from YOLOv7 and denotes a detection layer built on 2D convolutions. The architecture performs object detection with stacked RCS modules, RepVGG modules, and two types of detection layers.

3. RCS-OSA core code

See section 4 for how to use the core code. It includes the secondary innovations on C3k2 and C2PSA.

    import math

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    __all__ = ['C3k2_RCSOSA', 'RCSOSA', 'C2PSA_RCSOSA']


    # build RepVGG block
    # -----------------------------
    def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
        # Conv2d (no bias) followed by BatchNorm2d
        result = nn.Sequential()
        result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                            kernel_size=kernel_size, stride=stride, padding=padding,
                                            groups=groups, bias=False))
        result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
        return result


    class SEBlock(nn.Module):
        def __init__(self, input_channels):
            super(SEBlock, self).__init__()
            internal_neurons = input_channels // 8
            self.down = nn.Conv2d(in_channels=input_channels, out_channels=internal_neurons,
                                  kernel_size=1, stride=1, bias=True)
            self.up = nn.Conv2d(in_channels=internal_neurons, out_channels=input_channels,
                                kernel_size=1, stride=1, bias=True)
            self.input_channels = input_channels

        def forward(self, inputs):
            x = F.avg_pool2d(inputs, kernel_size=inputs.size(3))
            x = self.down(x)
            x = F.relu(x)
            x = self.up(x)
            x = torch.sigmoid(x)
            x = x.view(-1, self.input_channels, 1, 1)
            return inputs * x


    class RepVGG(nn.Module):
        def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1,
                     dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
            super(RepVGG, self).__init__()
            self.deploy = deploy
            self.groups = groups
            self.in_channels = in_channels
            padding_11 = padding - kernel_size // 2
            self.nonlinearity = nn.SiLU()
            # self.nonlinearity = nn.ReLU()
            if use_se:
                self.se = SEBlock(out_channels)
            else:
                self.se = nn.Identity()
            if deploy:
                # inference: a single reparameterized convolution
                self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                             kernel_size=kernel_size, stride=stride, padding=padding,
                                             dilation=dilation, groups=groups, bias=True,
                                             padding_mode=padding_mode)
            else:
                # training: identity (BN), 3x3 and 1x1 branches
                self.rbr_identity = nn.BatchNorm2d(
                    num_features=in_channels) if out_channels == in_channels and stride == 1 else None
                self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels,
                                         kernel_size=kernel_size, stride=stride, padding=padding,
                                         groups=groups)
                self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels,
                                       kernel_size=1, stride=stride, padding=padding_11, groups=groups)
                # print('RepVGG Block, identity = ', self.rbr_identity)

        def get_equivalent_kernel_bias(self):
            kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
            kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
            kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
            return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

        def _pad_1x1_to_3x3_tensor(self, kernel1x1):
            if kernel1x1 is None:
                return 0
            else:
                return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

        def _fuse_bn_tensor(self, branch):
            if branch is None:
                return 0, 0
            if isinstance(branch, nn.Sequential):
                kernel = branch.conv.weight
                running_mean = branch.bn.running_mean
                running_var = branch.bn.running_var
                gamma = branch.bn.weight
                beta = branch.bn.bias
                eps = branch.bn.eps
            else:
                assert isinstance(branch, nn.BatchNorm2d)
                if not hasattr(self, 'id_tensor'):
                    # build a 3x3 kernel that acts as the identity mapping
                    input_dim = self.in_channels // self.groups
                    kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                    for i in range(self.in_channels):
                        kernel_value[i, i % input_dim, 1, 1] = 1
                    self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
                kernel = self.id_tensor
                running_mean = branch.running_mean
                running_var = branch.running_var
                gamma = branch.weight
                beta = branch.bias
                eps = branch.eps
            std = (running_var + eps).sqrt()
            t = (gamma / std).reshape(-1, 1, 1, 1)
            return kernel * t, beta - running_mean * gamma / std

        def forward(self, inputs):
            if hasattr(self, 'rbr_reparam'):
                return self.nonlinearity(self.se(self.rbr_reparam(inputs)))
            if self.rbr_identity is None:
                id_out = 0
            else:
                id_out = self.rbr_identity(inputs)
            return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))

        def fusevggforward(self, x):
            return self.nonlinearity(self.rbr_dense(x))


    # RepVGG block end
    # -----------------------------

    class SR(nn.Module):
        # Shuffle RepVGG
        def __init__(self, c1, c2):
            super().__init__()
            c1_ = int(c1 // 2)
            c2_ = int(c2 // 2)
            self.repconv = RepVGG(c1_, c2_)

        def forward(self, x):
            # split channels in half, process one half, then concatenate and shuffle
            x1, x2 = x.chunk(2, dim=1)
            out = torch.cat((x1, self.repconv(x2)), dim=1)
            out = self.channel_shuffle(out, 2)
            return out

        def channel_shuffle(self, x, groups):
            batchsize, num_channels, height, width = x.data.size()
            channels_per_group = num_channels // groups
            x = x.view(batchsize, groups, channels_per_group, height, width)
            x = torch.transpose(x, 1, 2).contiguous()
            x = x.view(batchsize, -1, height, width)
            return x


    def make_divisible(x, divisor):
        # Returns the smallest multiple of divisor that is >= x
        if isinstance(divisor, torch.Tensor):
            divisor = int(divisor.max())  # to int
        return math.ceil(x / divisor) * divisor


    class RCSOSA(nn.Module):
        # VoVNet with Res Shuffle RepVGG
        def __init__(self, c1, c2, n=1, se=False, e=0.5, head=8):
            super().__init__()
            n_ = n // 2  # note: with the default n=1 this is 0, so sr1/sr2 are empty (identity)
            c_ = make_divisible(int(c1 * e), head)
            # self.conv1 = Conv(c1, c_)
            self.conv1 = RepVGG(c1, c_)
            self.conv3 = RepVGG(int(c_ * 3), c2)
            self.sr1 = nn.Sequential(*[SR(c_, c_) for _ in range(n_)])
            self.sr2 = nn.Sequential(*[SR(c_, c_) for _ in range(n_)])
            self.se = None
            if se:
                self.se = SEBlock(c2)

        def forward(self, x):
            x1 = self.conv1(x)
            x2 = self.sr1(x1)
            x3 = self.sr2(x2)
            x = torch.cat((x1, x2, x3), 1)  # one-shot aggregation of three feature cascades
            return self.conv3(x) if self.se is None else self.se(self.conv3(x))


    class Bottleneck(nn.Module):
        """Standard bottleneck."""

        def __init__(
            self, c1: int, c2: int, shortcut: bool = True, g: int = 1, k: tuple[int, int] = (3, 3), e: float = 0.5
        ):
            """
            Initialize a standard bottleneck module.

            Args:
                c1 (int): Input channels.
                c2 (int): Output channels.
                shortcut (bool): Whether to use shortcut connection.
                g (int): Groups for convolutions.
                k (tuple): Kernel sizes for convolutions.
                e (float): Expansion ratio.
            """
            super().__init__()
            c_ = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, k[0], 1)
            self.cv2 = Conv(c_, c2, k[1], 1, g=g)
            self.add = shortcut and c1 == c2

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            """Apply bottleneck with optional shortcut connection."""
            return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


    def autopad(k, p=None, d=1):  # kernel, padding, dilation
        """Pad to 'same' shape outputs."""
        if d > 1:
            k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
        if p is None:
            p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
        return p


    class Conv(nn.Module):
        """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

        default_act = nn.SiLU()  # default activation

        def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
            """Initialize Conv layer with given arguments including activation."""
            super().__init__()
            self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
            self.bn = nn.BatchNorm2d(c2)
            self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

        def forward(self, x):
            """Apply convolution, batch normalization and activation to input tensor."""
            return self.act(self.bn(self.conv(x)))

        def forward_fuse(self, x):
            """Apply convolution and activation only (used once BatchNorm has been fused into the conv)."""
            return self.act(self.conv(x))


    class C2f(nn.Module):
        """Faster Implementation of CSP Bottleneck with 2 convolutions."""

        def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
            """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
            super().__init__()
            self.c = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, 2 * self.c, 1, 1)
            self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
            self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

        def forward(self, x):
            """Forward pass through C2f layer."""
            y = list(self.cv1(x).chunk(2, 1))
            y.extend(m(y[-1]) for m in self.m)
            return self.cv2(torch.cat(y, 1))

        def forward_split(self, x):
            """Forward pass using split() instead of chunk()."""
            y = list(self.cv1(x).split((self.c, self.c), 1))
            y.extend(m(y[-1]) for m in self.m)
            return self.cv2(torch.cat(y, 1))


    class C3(nn.Module):
        """CSP Bottleneck with 3 convolutions."""

        def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
            """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
            super().__init__()
            c_ = int(c2 * e)  # hidden channels
            self.cv1 = Conv(c1, c_, 1, 1)
            self.cv2 = Conv(c1, c_, 1, 1)
            self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
            self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

        def forward(self, x):
            """Forward pass through the CSP bottleneck with 3 convolutions."""
            return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


    class PSABlock_RCSOSA(nn.Module):
        """
        PSABlock variant in which the multi-head attention is replaced by an RCSOSA module.

        Attributes:
            attn (RCSOSA): RCS-OSA module used in place of attention.
            ffn (nn.Sequential): Feed-forward neural network module.
            add (bool): Flag indicating whether to add shortcut connections.

        Methods:
            forward: Performs a forward pass through the block, applying RCSOSA and feed-forward layers.
        """

        def __init__(self, c: int, attn_ratio: float = 0.5, num_heads: int = 4, shortcut: bool = True) -> None:
            """
            Initialize the PSABlock.

            Args:
                c (int): Input and output channels.
                attn_ratio (float): Passed to RCSOSA as its expansion ratio e.
                num_heads (int): Passed to RCSOSA as the divisor for its hidden channels.
                shortcut (bool): Whether to use shortcut connections.
            """
            super().__init__()
            self.attn = RCSOSA(c, c, se=False, e=attn_ratio, head=num_heads)
            self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
            self.add = shortcut

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            """Execute a forward pass through the block (RCSOSA, then feed-forward)."""
            x = x + self.attn(x) if self.add else self.attn(x)
            x = x + self.ffn(x) if self.add else self.ffn(x)
            return x


    class C3k(C3):
        """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

        def __init__(self, c1: int, c2: int, n: int = 1, shortcut: bool = True, g: int = 1, e: float = 0.5, k: int = 3):
            """
            Initialize C3k module.

            Args:
                c1 (int): Input channels.
                c2 (int): Output channels.
                n (int): Number of Bottleneck blocks.
                shortcut (bool): Whether to use shortcut connections.
                g (int): Groups for convolutions.
                e (float): Expansion ratio.
                k (int): Kernel size.
            """
            super().__init__(c1, c2, n, shortcut, g, e)
            c_ = int(c2 * e)  # hidden channels
            # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
            self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


    class C3k2_RCSOSA(C2f):
        """Faster Implementation of CSP Bottleneck with 2 convolutions."""

        def __init__(
            self,
            c1: int,
            c2: int,
            n: int = 1,
            c3k: bool = False,
            e: float = 0.5,
            attn: bool = False,
            g: int = 1,
            shortcut: bool = True,
        ):
            """
            Initialize C3k2 module.

            Args:
                c1 (int): Input channels.
                c2 (int): Output channels.
                n (int): Number of blocks.
                c3k (bool): Whether to use C3k blocks.
                e (float): Expansion ratio.
                attn (bool): Whether to use attention blocks.
                g (int): Groups for convolutions.
                shortcut (bool): Whether to use shortcut connections.
            """
            super().__init__(c1, c2, n, shortcut, g, e)
            self.m = nn.ModuleList(
                nn.Sequential(
                    Bottleneck(self.c, self.c, shortcut, g),
                    PSABlock_RCSOSA(self.c, attn_ratio=0.5, num_heads=max(self.c // 64, 1)),
                )
                if attn
                else C3k(self.c, self.c, 2, shortcut, g)
                if c3k
                else Bottleneck(self.c, self.c, shortcut, g)
                for _ in range(n)
            )


    class C2PSA_RCSOSA(nn.Module):
        """
        C2PSA_RCSOSA module: C2PSA with its PSABlocks replaced by PSABlock_RCSOSA.

        Attributes:
            c (int): Number of hidden channels.
            cv1 (Conv): 1x1 convolution layer mapping the input channels to 2*c.
            cv2 (Conv): 1x1 convolution layer mapping 2*c channels back to c1.
            m (nn.Sequential): Sequential container of PSABlock_RCSOSA modules.

        Methods:
            forward: Performs a forward pass through the module.

        Notes:
            This module is essentially the same as the PSA module, refactored to allow stacking more blocks.
        """

        def __init__(self, c1: int, c2: int, n: int = 1, e: float = 0.5):
            """
            Initialize C2PSA module.

            Args:
                c1 (int): Input channels.
                c2 (int): Output channels.
                n (int): Number of PSABlock modules.
                e (float): Expansion ratio.
            """
            super().__init__()
            assert c1 == c2
            self.c = int(c1 * e)
            self.cv1 = Conv(c1, 2 * self.c, 1, 1)
            self.cv2 = Conv(2 * self.c, c1, 1)
            # max(..., 1) guards against num_heads == 0 (division by zero in make_divisible) when self.c < 64
            self.m = nn.Sequential(
                *(PSABlock_RCSOSA(self.c, attn_ratio=0.5, num_heads=max(self.c // 64, 1)) for _ in range(n))
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            """
            Process the input tensor through a series of PSA blocks.

            Args:
                x (torch.Tensor): Input tensor.

            Returns:
                (torch.Tensor): Output tensor after processing.
            """
            a, b = self.cv1(x).split((self.c, self.c), dim=1)
            b = self.m(b)
            return self.cv2(torch.cat((a, b), 1))


    if __name__ == '__main__':
        # Generate a sample input
        image_size = (1, 64, 240, 240)
        image = torch.rand(*image_size)

        # Build the module and run a forward pass
        model = C3k2_RCSOSA(64, 64, attn=True)
        out = model(image)
        print(out.size())

4. Adding the RCS-OSA module step by step

If you cannot follow the steps below, or would rather skip them, you can contact the author for the column's source code with all project files already added, ready to train.

4.1 Change 1

First, create the directory. Under the ultralytics/nn folder, create a new directory named Addmodules.

4.2 Change 2

Inside the Addmodules folder, create a new .py file and paste in the core code from section 3.

4.3 Change 3

In the same directory, create another .py file named __init__.py and import our file inside it, as shown in the figure.

4.4 Change 4

Open ultralytics/nn/tasks.py, then import and register our module there. This only needs to be done once; if you use my other improvements, you do not need to repeat this step.

4.5 Change 5

In the parse_model function in ultralytics/nn/tasks.py, at around line 1500, add the module at the position shown in the figure. This requires some judgment on your part; if you are unsure, you can contact the author for a video tutorial.

4.6 Change 6

In the parse_model function in ultralytics/nn/tasks.py, at around line 1600, replace the code at the position shown in the figure. If you skip this and every C3k2 in your YAML file has been renamed, the detection head falls back to the legacy v8 head; the parameter count increases substantially, although the model still runs. Many people overlook this step.

    if "C3k2" in getattr(m, "__name__", str(m)):
        legacy = False
        if scale in "mlx":
            args[3] = True

The modifications are now complete. You can copy the YAML files below and run them. This article only lists the common usages; for more, you can contact the author for a usage video.

5. Training

5.1 YAML files

5.1.1 YAML file 1

Training info: YOLO26-C2PSA-RCSOSA summary: 266 layers, 2,849,400 parameters, 2,849,400 gradients, 6.3 GFLOPs

    # Ultralytics AGPL-3.0 License - https://ultralytics.com/license

    # Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
    # Model docs: https://docs.ultralytics.com/models/yolo26
    # Task docs: https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    end2end: True # whether to use end-to-end mode
    reg_max: 1 # DFL bins
    scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
      m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
      l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
      x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

    # YOLO26n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2, [1024, True]]
      - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
      - [-1, 2, C2PSA_RCSOSA, [1024]] # 10

    # YOLO26n head
    head:
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 6], 1, Concat, [1]] # cat backbone P4
      - [-1, 2, C3k2, [512, True]] # 13

      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 4], 1, Concat, [1]] # cat backbone P3
      - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]]
      - [[-1, 13], 1, Concat, [1]] # cat head P4
      - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]]
      - [[-1, 10], 1, Concat, [1]] # cat head P5
      - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

      - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.1.2 YAML file 2

Training info: YOLO26-C3k2-RCSOSA summary: 261 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs

    # Ultralytics AGPL-3.0 License - https://ultralytics.com/license

    # Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
    # Model docs: https://docs.ultralytics.com/models/yolo26
    # Task docs: https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    end2end: True # whether to use end-to-end mode
    reg_max: 1 # DFL bins
    scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
      m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
      l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
      x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

    # YOLO26n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2_RCSOSA, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2_RCSOSA, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2_RCSOSA, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2_RCSOSA, [1024, True]]
      - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
      - [-1, 2, C2PSA, [1024]] # 10

    # YOLO26n head
    head:
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 6], 1, Concat, [1]] # cat backbone P4
      - [-1, 2, C3k2_RCSOSA, [512, True]] # 13

      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 4], 1, Concat, [1]] # cat backbone P3
      - [-1, 2, C3k2_RCSOSA, [256, True]] # 16 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]]
      - [[-1, 13], 1, Concat, [1]] # cat head P4
      - [-1, 2, C3k2_RCSOSA, [512, True]] # 19 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]]
      - [[-1, 10], 1, Concat, [1]] # cat head P5
      # Every C3k2 has been renamed, but because this variant only modifies the PSABlock,
      # the RCSOSA code actually runs only in the layer below (attn=True); the rest behave like plain C3k2.
      - [-1, 1, C3k2_RCSOSA, [1024, True, 0.5, True]] # 22 (P5/32-large)

      - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

5.2 Training code

Create a .py file, paste in the code below, and set your own file paths; it is then ready to run.

    import warnings

    warnings.filterwarnings('ignore')
    from ultralytics import YOLO

    if __name__ == '__main__':
        # Path to the model config file, i.e. where you saved the YAML from section 5.1
        model = YOLO('path to the model config YAML')
        # To switch model size, rename the YAML path above: yolo26s.yaml uses the 26s scale.
        # Likewise, for an improved config named yolo26-XXX.yaml, use yolo26l-XXX.yaml to select the l scale.
        # Change the name passed to YOLO() above, not the name of the config file itself.
        # model.load('yolo26n.pt')  # load pretrained weights; not recommended for research, as gains become hard to show
        model.train(
            data=r'path to the dataset YAML',
            # For other tasks, edit task in ultralytics/cfg/default.yaml: detect, segment, classify, pose
            cache=False,
            imgsz=640,
            epochs=20,
            single_cls=False,  # whether this is single-class detection
            batch=16,
            close_mosaic=0,
            workers=0,
            device='0',
            optimizer='MuSGD',  # using SGD/MuSGD
            # resume='',  # path to last.pt when resuming
            amp=True,  # turn off amp if the training loss becomes NaN
            project='runs/train',
            name='exp',
        )

5.3 Training screenshot

(Screenshot of the training run.)

6. Summary

That concludes this article. I recommend my YOLOv26 effective-gains column to you: it is newly opened, with an average quality score of 98. I will keep reproducing papers from the latest top conferences and supplementing some of the older improvement mechanisms. If this article helped you, subscribe to the column and follow for further updates.
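
As a small supplement to the `channel_shuffle` method of the `SR` class in the core code: the sketch below reproduces the same view, transpose, flatten reordering on a plain Python list, so the resulting channel order can be inspected without PyTorch. The function name is reused for readability, but this list-based version and the labels `a0`, `b0`, and so on are illustrative assumptions, not part of the RCS-YOLO code.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channels the way view -> transpose -> view does on a tensor."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # view(groups, per_group): split the channels into `groups` consecutive chunks
    grouped = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # transpose + flatten: take the i-th channel of every group in turn
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]


if __name__ == "__main__":
    untouched = ["a0", "a1", "a2"]   # hypothetical labels for the half that skips RepVGG
    processed = ["b0", "b1", "b2"]   # hypothetical labels for the half that went through RepVGG
    print(channel_shuffle(untouched + processed, groups=2))
    # prints ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

With `groups=2`, as used in `SR.forward`, the untouched and processed halves end up interleaved, so the next `chunk(2, dim=1)` in a stacked SR block receives channels from both halves.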