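Adapter Configuration

Classes representing the architectures of adapter modules and fusion layers.

Single (bottleneck) adapters

class adapters.AdapterConfig

Base class for all adaptation methods. This class does not define specific configuration keys. It only provides some common helper methods.

Parameters: architecture (str, optional) – The type of adaptation method defined by the configuration.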

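classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.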

class adapters.BnConfig(mh_adapter: bool, output_adapter: bool, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping], non_linearity: str, original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

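Base class that models the architecture of a bottleneck adapter.

Parameters

mh_adapter (bool) – If True, add adapter modules after the multi-head attention block of each layer.
output_adapter (bool) – If True, add adapter modules after the output FFN of each layer.
reduction_factor (float or Mapping) – Either a scalar float (> 0) specifying the reduction factor for all layers, or a mapping from layer ID (starting at 0) to values specifying the reduction factor for individual layers. If not all layers are represented in the mapping, a default value should be given, e.g. {'1': 8, '6': 32, 'default': 16}. Specifying a reduction factor < 1 results in an up-projection layer.
non_linearity (str) – The activation function to use in the adapter bottleneck.
original_ln_before (bool, optional) – If True, apply the pre-trained layer normalization and residual connection before the adapter modules. Defaults to False. Only applicable if is_parallel is False.
original_ln_after (bool, optional) – If True, apply the pre-trained layer normalization and residual connection after the adapter modules. Defaults to True.
ln_before (bool, optional) – If True, add a new layer normalization before the adapter bottleneck. Defaults to False.
ln_after (bool, optional) – If True, add a new layer normalization after the adapter bottleneck. Defaults to False.
init_weights (str, optional) – Initialization method for the weights of the adapter modules. Currently, this can be either "bert" (default), "mam_adapter" or "houlsby".
init_weights_seed (int, optional) – The seed to use for the initialization of the adapter weights per layer. Important: if set, the seed is reset for all adapter modules, meaning that all adapter modules will have the same initialization. If not set, the seed is set once and each adapter module gets a random weight initialization. Defaults to None.
is_parallel (bool, optional) – If True, apply adapter transformations in parallel. By default (False), sequential application is used.
scaling (float or str, optional) – Scaling factor to use for the scaled addition of adapter outputs, as done by He et al. (2021). Can be either a constant factor (float), the string "learned", in which case the scaling factor is learned, or the string "channel", in which case a scaling vector of the channel shape is initialized and then learned. Defaults to 1.0.
use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is used e.g. for UniPELT. Defaults to False.
residual_before_ln (bool or str, optional) – If True, take the residual connection around the adapter bottleneck before the layer normalization. If set to "post_add", take the residual connection around the adapter bottleneck after the previous residual connection. Only applicable if original_ln_before is True.
adapter_residual_before_ln (bool, optional) – If True, apply the residual connection around the adapter modules before the new layer normalization within the adapter. Only applicable if ln_after is True and is_parallel is False.
inv_adapter (str, optional) – If not None (default), add invertible adapter modules after the model embedding layer. Currently, this can be either "nice" or "glow".
inv_adapter_reduction_factor (float, optional) – The reduction factor to use within the invertible adapter modules. Only applicable if inv_adapter is not None.
cross_adapter (bool, optional) – If True, add adapter modules after the cross-attention block of each decoder layer in an encoder-decoder model. Defaults to False.
leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.
dropout (float, optional) – The dropout rate used in the adapter layer. Defaults to 0.0.
phm_layer (bool, optional) – If True, the down- and up-projection layers are PHMLayers. Defaults to False.
phm_dim (int, optional) – The dimension of the phm matrix. Only applicable if phm_layer is set to True. Defaults to 4.
shared_phm_rule (bool, optional) – Whether the phm matrix is shared across all layers. Defaults to True.
factorized_phm_rule (bool, optional) – Whether the phm matrix is factorized into a left and a right matrix. Defaults to False.
learn_phm (bool, optional) – Whether the phm matrix should be learned during training. Defaults to True.
factorized_phm_W (bool, optional) – Whether the weight matrix is factorized into a left and a right matrix. Defaults to True.
shared_W_phm (bool, optional) – Whether the weight matrix is shared across all layers. Defaults to False.
phm_c_init (str, optional) – The initialization function for the weights of the phm matrix. The possible values are ["normal", "uniform"]. Defaults to "normal".
phm_init_range (float, optional) – Standard deviation for initializing the phm weights if phm_c_init="normal". Defaults to 0.0001.
hypercomplex_nonlinearity (str, optional) – The distribution from which the weights of the phm layer are drawn. Defaults to "glorot-uniform".
phm_rank (int, optional) – If the weight matrix is factorized, this specifies the rank of the matrix. E.g. the left matrix of the down projection has the shape (phm_dim, _in_feats_per_axis, phm_rank) and the right matrix has the shape (phm_dim, phm_rank, _out_feats_per_axis). Defaults to 1.
phm_bias (bool, optional) – If True, the down- and up-projection PHMLayers have a bias term. Ignored if phm_layer is False. Defaults to True.
stochastic_depth (float, optional) – The probability of dropping entire layers during training. This parameter should only be used for vision-based tasks involving residual networks.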
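The per-layer form of reduction_factor can be read as follows. This sketch assumes the bottleneck width is computed as hidden_size // reduction_factor and that layers listed in leave_out are skipped; bottleneck_sizes is a hypothetical helper for illustration, not part of the adapters API.

```python
import collections.abc
from typing import Dict, Mapping, Sequence, Union


def bottleneck_sizes(
    hidden_size: int,
    num_layers: int,
    reduction_factor: Union[float, Mapping[str, float]],
    leave_out: Sequence[int] = (),
) -> Dict[int, int]:
    """Bottleneck width per layer implied by reduction_factor (illustrative sketch)."""
    sizes = {}
    for layer_id in range(num_layers):
        if layer_id in leave_out:
            continue  # no adapter module is added to this layer
        if isinstance(reduction_factor, collections.abc.Mapping):
            # Layer IDs are string keys; layers not listed fall back to "default".
            factor = reduction_factor.get(str(layer_id), reduction_factor.get("default"))
            if factor is None:
                raise KeyError(f"no reduction factor for layer {layer_id} and no 'default'")
        else:
            factor = reduction_factor
        # A factor below 1 gives a width larger than hidden_size (an up-projection).
        sizes[layer_id] = int(hidden_size // factor)
    return sizes
```

With hidden_size=768 and the mapping {'1': 8, '6': 32, 'default': 16}, layer 1 gets a width of 96, layer 6 a width of 24, and every other layer a width of 48.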

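classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.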

class adapters.SeqBnConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The adapter architecture proposed by Pfeiffer et al. (2020). See https://arxiv.org/pdf/2005.00247.pdf.

class adapters.SeqBnInvConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = 'nice', inv_adapter_reduction_factor: ~typing.Optional[float] = 2, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The adapter architecture proposed by Pfeiffer et al. (2020), with invertible adapters (inv_adapter='nice'). See https://arxiv.org/pdf/2005.00247.pdf.

class adapters.DoubleSeqBnConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The adapter architecture proposed by Houlsby et al. (2019). See https://arxiv.org/pdf/1902.00751.pdf.

class adapters.DoubleSeqBnInvConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = 'nice', inv_adapter_reduction_factor: ~typing.Optional[float] = 2, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The adapter architecture proposed by Houlsby et al. (2019), with invertible adapters (inv_adapter='nice'). See https://arxiv.org/pdf/1902.00751.pdf.

class adapters.ParBnConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 2, non_linearity: str = 'relu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'mam_adapter', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = True, scaling: ~typing.Union[float, str] = 4.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The parallel adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.
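The practical difference between is_parallel=False (as in SeqBnConfig) and is_parallel=True (as in ParBnConfig, whose default scaling is 4.0) is which hidden state the adapter reads. The sketch below uses nested lists as toy weight matrices and omits layer normalization, dropout and init_weights; the function names are hypothetical and this is not the library's implementation.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def bottleneck(h: Vector, down: Matrix, up: Matrix) -> Vector:
    """Down-projection, ReLU, up-projection (the adapter module itself)."""
    z = [max(0.0, x) for x in matvec(down, h)]
    return matvec(up, z)


def sequential_adapter(x: Vector, sublayer: Callable[[Vector], Vector],
                       down: Matrix, up: Matrix, scaling: float = 1.0) -> Vector:
    # is_parallel=False: the adapter reads the sublayer OUTPUT, with a residual around it.
    y = sublayer(x)
    return [a + scaling * b for a, b in zip(y, bottleneck(y, down, up))]


def parallel_adapter(x: Vector, sublayer: Callable[[Vector], Vector],
                     down: Matrix, up: Matrix, scaling: float = 4.0) -> Vector:
    # is_parallel=True: the adapter reads the sublayer INPUT; its scaled output is added.
    y = sublayer(x)
    return [a + scaling * b for a, b in zip(y, bottleneck(x, down, up))]
```

Because the parallel branch does not depend on the sublayer output, the adapter and the frozen sublayer can in principle be computed side by side.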

class adapters.CompacterConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The Compacter architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.

class adapters.CompacterPlusPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0): The Compacter++ architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
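With phm_layer=True, each projection weight is built as a sum of Kronecker products W = Σ_i A_i ⊗ B_i, following the parameterized hypercomplex multiplication (PHM) layer used by Compacter (Mahabadi et al., 2021). The sketch below shows that construction and a parameter count for the plain case with phm_dim = n. It does not model factorized_phm_W, phm_rank, or sharing across layers (shared_phm_rule, shared_W_phm), and all names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two matrices given as nested lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]


def phm_weight(rules: List[Matrix], blocks: List[Matrix]) -> Matrix:
    """W = sum_i A_i (x) B_i, with one n x n rule A_i per block B_i."""
    if len(rules) != len(blocks):
        raise ValueError("need one rule matrix per block")
    terms = [kron(a, b) for a, b in zip(rules, blocks)]
    return [[sum(t[r][c] for t in terms) for c in range(len(terms[0][0]))]
            for r in range(len(terms[0]))]


def phm_param_count(d_in: int, d_out: int, n: int) -> int:
    """n rules of size n x n plus n blocks of size (d_in/n) x (d_out/n), no factorization."""
    return n ** 3 + (d_in * d_out) // n
```

For d_in=768, d_out=24 (reduction_factor=32) and phm_dim=4, this count is 4,672 parameters, compared with 18,432 for a dense 768 × 24 projection.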

class adapters.AdapterPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 96, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = False, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'houlsby', init_weights_seed: ~typing.Optional[int] = None, is_parallel: bool = False, scaling: ~typing.Union[float, str] = 'channel', use_gating: bool = False, residual_before_ln: bool = False, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: float = 0.1)

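The AdapterPlus config architecture proposed by Jan-Martin O. Steitz and Stefan Roth. See https://arxiv.org/pdf/2406.06820

Be aware that some configurations of the adapter parameters original_ln_after, original_ln_before, and residual_before_ln may cause performance issues during training.

In the general case:

At least one of original_ln_before or original_ln_after should be set to True to ensure that the original residual connection from pre-training is preserved.
If original_ln_after is set to False, residual_before_ln must also be set to False to ensure convergence during training.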
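The two rules above can be expressed as a small check on a bottleneck config dict. residual_warnings is a hypothetical helper. Missing keys fall back to the BnConfig defaults listed earlier (original_ln_before=False, original_ln_after=True, residual_before_ln=True).

```python
from typing import Dict, List


def residual_warnings(cfg: Dict[str, object]) -> List[str]:
    """Return the rules of thumb violated by a bottleneck config dict (sketch)."""
    ln_before = cfg.get("original_ln_before", False)
    ln_after = cfg.get("original_ln_after", True)
    res_before = cfg.get("residual_before_ln", True)
    warnings = []
    if not (ln_before or ln_after):
        warnings.append("set original_ln_before or original_ln_after to True "
                        "to keep the pre-trained residual connection")
    # "post_add" is also not False, so it triggers the second rule as well.
    if not ln_after and res_before is not False:
        warnings.append("original_ln_after=False requires residual_before_ln=False")
    return warnings
```

The AdapterPlusConfig defaults (original_ln_before=True, original_ln_after=False, residual_before_ln=False) satisfy both rules.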

Prefix Tuning

class adapters.PrefixTuningConfig(architecture: ~typing.Optional[str] = 'prefix_tuning', encoder_prefix: bool = True, cross_prefix: bool = True, leave_out: ~typing.List[int] = <factory>, flat: bool = False, prefix_length: int = 30, bottleneck_size: int = 512, non_linearity: str = 'tanh', dropout: float = 0.0, use_gating: bool = False, shared_gating: bool = True, init_weights_seed: ~typing.Optional[int] = None)

The Prefix Tuning architecture proposed by Li & Liang (2021). See https://arxiv.org/pdf/2101.00190.pdf.

Parameters

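encoder_prefix (bool) – If True, add prefixes to the encoder of an encoder-decoder model.
cross_prefix (bool) – If True, add prefixes to the cross-attention of an encoder-decoder model.
flat (bool) – If True, train the prefix parameters directly. Otherwise, reparametrize them using a bottleneck MLP.
prefix_length (int) – The length of the prefix.
bottleneck_size (int) – If flat=False, the size of the bottleneck MLP.
non_linearity (str) – If flat=False, the non-linearity used in the bottleneck MLP.
dropout (float) – The dropout rate used in the prefix tuning layer.
leave_out (List[int]) – The IDs of the layers (starting at 0) where NO prefix should be added.
use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is used e.g. for UniPELT. Defaults to False.
shared_gating (bool, optional) – Whether to use a shared gate for the prefixes of all attention matrices. Only applicable if use_gating=True. Defaults to True.
init_weights_seed (int, optional) – The seed to use for the initialization of the adapter weights per layer. Important: if set, the seed is reset for all adapter modules, meaning that all adapter modules will have the same initialization. If not set, the seed is set once and each adapter module gets a random weight initialization. Defaults to None.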
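To compare flat=True with the bottleneck-MLP reparametrization, the following sketch counts trainable parameters for one prefix module. It assumes one key prefix and one value prefix in every layer, an input embedding of shape (prefix_length, hidden_size), and an MLP of the form Linear(hidden_size, bottleneck_size), non-linearity, Linear(bottleneck_size, n_layers * 2 * hidden_size), as in Li & Liang's reparametrization; the library's exact layer shapes may differ, and prefix_param_count is a hypothetical helper.

```python
def prefix_param_count(n_layers: int, hidden_size: int, prefix_length: int,
                       flat: bool, bottleneck_size: int = 512) -> int:
    """Trainable parameters of one prefix-tuning module under the stated assumptions."""
    out_dim = n_layers * 2 * hidden_size  # a key and a value prefix vector per layer
    if flat:
        return prefix_length * out_dim  # the prefixes themselves are the parameters
    embedding = prefix_length * hidden_size  # one input vector per prefix position
    first = hidden_size * bottleneck_size + bottleneck_size  # Linear(hidden -> bottleneck)
    second = bottleneck_size * out_dim + out_dim  # Linear(bottleneck -> all layers' prefixes)
    return embedding + first + second
```

For a 12-layer model with hidden_size=768 and the defaults prefix_length=30 and bottleneck_size=512, this gives 552,960 parameters with flat=True and 9,872,384 with flat=False.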

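classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.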

LoRA Configuration

class adapters.LoRAConfig(architecture: ~typing.Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = False, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 8, alpha: int = 8, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'add', init_weights: str = 'lora', init_weights_seed: ~typing.Optional[int] = None, use_gating: bool = False, vera_d: ~typing.Optional[float] = None, vera_b: ~typing.Optional[float] = None, dtype: ~typing.Optional[str] = None)

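The Low-Rank Adaptation (LoRA) architecture proposed by Hu et al. (2021). See https://arxiv.org/pdf/2106.09685.pdf. LoRA adapts a model by reparametrizing the weights of a layer matrix. You can merge the additional weights with the original layer weights using model.merge_adapter("lora_name").

Parameters

selfattn_lora (bool, optional) – If True, add LoRA to the self-attention weights of the model. Defaults to True.
intermediate_lora (bool, optional) – If True, add LoRA to the intermediate MLP weights of the model. Defaults to False.
output_lora (bool, optional) – If True, add LoRA to the output MLP weights of the model. Defaults to False.
leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.
r (int, optional) – The rank of the LoRA layer. Defaults to 8.
alpha (int, optional) – The hyperparameter used for scaling the LoRA reparametrization. Defaults to 8.
dropout (float, optional) – The dropout rate used in the LoRA layer. Defaults to 0.0.
attn_matrices (List[str], optional) – Determines which matrices of the self-attention module to adapt. A list that may contain the strings "q" (query), "k" (key) and "v" (value). Defaults to ["q", "v"].
composition_mode (str, optional) – Defines how the injected weights are composed with the original model weights. Can be either "add" (addition of the decomposed matrix, as in LoRA) or "scale" (element-wise multiplication by a vector, as in (IA)^3). "scale" can only be used together with r=1. Defaults to "add".
init_weights (str, optional) – Initialization method for the weights of the LoRA modules. Currently, this can be either "lora" (default), "bert" or "vera".
init_weights_seed (int, optional) – The seed to use for the initialization of the adapter weights per layer. Important: if set, the seed is reset for all adapter modules, meaning that all adapter modules will have the same initialization. If not set, the seed is set once and each adapter module gets a random weight initialization. Defaults to None.
use_gating (bool, optional) – Place a trainable gating module beside the added parameter module to control module activation. This is used e.g. for UniPELT. Defaults to False. Note that modules with use_gating=True cannot be merged using merge_adapter().
vera_d (float, optional) – The value of d used in VeraConfig. Defaults to None. Places a trainable scaling parameter d before the decomposition matrix A to allow scaling of the internal weights.
vera_b (float, optional) – The value of b used in VeraConfig. Defaults to None. Places a trainable scaling parameter b before the decomposition matrix B to allow scaling of the internal weights.
dtype (str, optional) – The torch dtype for the reparametrization tensors. Defaults to None.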
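merge_adapter() folds the low-rank update into the original weight. This sketch assumes the usual LoRA update W + (alpha / r) · B A, with A of shape (r, d_in) and B of shape (d_out, r). merge_lora is a hypothetical name, and dropout and gating are ignored (gated modules cannot be merged, as noted above).

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Merged weight W + (alpha / r) * B @ A.

    Shapes: w (d_out, d_in), lora_a (r, d_in), lora_b (d_out, r).
    """
    scaling = alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out, d_in) low-rank update
    return [[w[i][j] + scaling * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]
```

With B initialized to zeros, as in the standard LoRA initialization, the merged weight equals the original weight before any training.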

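classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.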

IA3 Configuration

class adapters.IA3Config(architecture: ~typing.Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = True, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 1, alpha: int = 1, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'scale', init_weights: str = 'ia3', init_weights_seed: ~typing.Optional[int] = None, use_gating: bool = False, vera_d: ~typing.Optional[float] = None, vera_b: ~typing.Optional[float] = None, dtype: ~typing.Optional[str] = None)

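The "Infused Adapter by Inhibiting and Amplifying Inner Activations" ((IA)^3) architecture proposed by Liu et al. (2022). See https://arxiv.org/pdf/2205.05638.pdf. (IA)^3 builds on top of LoRA, but unlike the additive composition of LoRA, it scales the weights of a layer using an injected vector.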
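With composition_mode="scale", the injected vector multiplies activations element-wise instead of adding a low-rank matrix. The sketch below shows that scaling the output of W x by a vector l equals scaling the rows of W by l, which is why such a vector can in principle be folded into the weight. Both function names are hypothetical.

```python
from typing import List


def ia3_scale(activations: List[float], scaling_vector: List[float]) -> List[float]:
    """Element-wise rescaling as in composition_mode="scale": h * l."""
    if len(activations) != len(scaling_vector):
        raise ValueError("scaling vector must match the hidden dimension")
    return [h * l for h, l in zip(activations, scaling_vector)]


def merge_ia3(weight: List[List[float]], scaling_vector: List[float]) -> List[List[float]]:
    """Fold the vector into the weight: scaling output unit i scales row i of W."""
    return [[w * l for w in row] for row, l in zip(weight, scaling_vector)]
```

In the (IA)^3 paper the vectors are initialized to ones, so a freshly added module leaves the model's outputs unchanged.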

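classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.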

PromptTuning Configuration

class adapters.PromptTuningConfig(architecture: str = 'prompt_tuning', prompt_length: int = 10, prompt_init: str = 'random_uniform', prompt_init_text: Optional[str] = None, combine: str = 'prefix', init_weights_seed: Optional[int] = None)

The Prompt Tuning architecture proposed by Lester et al. (2021). See https://arxiv.org/pdf/2104.08691.pdf

Parameters

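prompt_length (int) – The number of tokens in the prompt. Defaults to 10.
prompt_init (str) – The initialization method for the prompt. Can be either "random_uniform" or "from_string". Defaults to "random_uniform".
prompt_init_text (str) – The text to use for prompt initialization if prompt_init="from_string".
random_uniform_scale (float) – The scale of the random uniform initialization if prompt_init="random_uniform". Defaults to 0.5, as used in the paper.
combine (str) – The method used to combine the prompt with the input. Can be either "prefix" or "prefix_after_bos". Defaults to "prefix".
init_weights_seed (int, optional) – The seed to use for the initialization of the adapter weights per layer. Important: if set, the seed is reset for all adapter modules, meaning that all adapter modules will have the same initialization. If not set, the seed is set once and each adapter module gets a random weight initialization. Defaults to None.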
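The two combine modes and the random_uniform initialization can be illustrated on token strings. The sketch assumes that "prefix_after_bos" keeps the first (BOS) position in front of the prompt and that random_uniform draws from [-scale, scale]. combine_prompt and init_prompt are hypothetical helpers, and the real module operates on embeddings, not strings.

```python
import random
from typing import List, Optional


def combine_prompt(tokens: List[str], prompt: List[str], combine: str = "prefix") -> List[str]:
    """Where the prompt positions go relative to the input (shown on token strings)."""
    if combine == "prefix":
        return prompt + tokens
    if combine == "prefix_after_bos":
        # Assumes the first position holds the BOS token, which stays in front.
        return tokens[:1] + prompt + tokens[1:]
    raise ValueError(f"unknown combine mode: {combine!r}")


def init_prompt(prompt_length: int, dim: int, scale: float = 0.5,
                seed: Optional[int] = None) -> List[List[float]]:
    """random_uniform-style init: each value drawn from [-scale, scale] (assumed range)."""
    rng = random.Random(seed)  # a fixed seed gives a reproducible prompt
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(prompt_length)]
```

Passing the same seed twice yields identical prompt values, which is the reproducibility that init_weights_seed is meant to provide.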

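classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.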

ReFT

class adapters.ReftConfig(layers: Union[Literal['all'], List[int]], prefix_positions: int, suffix_positions: int, r: int, orthogonality: bool, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True, init_weights_seed: Optional[int] = None)

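Base class for Representation Fine-Tuning (ReFT) methods as proposed by Wu et al. (2024). See https://arxiv.org/pdf/2404.03592. ReFT methods have in common that they add "interventions" after selected model layers and at selected sequence positions to adapt the representations produced by module outputs.

Parameters

layers (Union[Literal["all"], List[int]]) – The IDs of the layers where interventions should be added. If "all", interventions are added after all layers (default).
prefix_positions (int) – The number of prefix positions to add interventions to.
suffix_positions (int) – The number of suffix positions to add interventions to.
r (int) – The rank of the intervention layer.
orthogonality (bool) – If True, enforce an orthogonality constraint on the projection matrix.
tied_weights (bool) – If True, share the intervention parameters between prefix and suffix positions in each layer.
subtract_projection (bool) – If True, subtract the projection of the input.
dropout (float) – The dropout rate used in the intervention layer.
non_linearity (str) – The activation function used in the intervention layer.
dtype (str, optional) – The torch dtype for the intervention tensors. Defaults to None.
init_weights_seed (int, optional) – The seed to use for the initialization of the adapter weights per layer. Important: if set, the seed is reset for all adapter modules, meaning that all adapter modules will have the same initialization. If not set, the seed is set once and each adapter module gets a random weight initialization. Defaults to None.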
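The intervention that LoReftConfig configures can be written as h + Rᵀ(W h + b − R h), following Wu et al. (2024), where R (shape r × d) has orthonormal rows when orthogonality=True. DiReftConfig drops the orthogonality constraint and the subtraction of R h, giving h + W₂ᵀ(W₁ h + b). The sketch below implements both formulas on plain lists, plus an assumed reading of prefix_positions and suffix_positions as the first and last positions of the sequence. Dropout, non_linearity and tied_weights are not modeled, and all names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rmatvec(m: Matrix, v: Vector) -> Vector:
    """m^T v for m of shape (r, d) and v of length r."""
    return [sum(m[k][j] * v[k] for k in range(len(m))) for j in range(len(m[0]))]


def loreft(h: Vector, R: Matrix, W: Matrix, b: Vector) -> Vector:
    """LoReFT: h + R^T (W h + b - R h)."""
    target = [wh + bias - rh for wh, bias, rh in zip(matvec(W, h), b, matvec(R, h))]
    return [x + d for x, d in zip(h, rmatvec(R, target))]


def direft(h: Vector, W2: Matrix, W1: Matrix, b: Vector) -> Vector:
    """DiReFT: h + W2^T (W1 h + b), no orthogonality and no projection subtraction."""
    inner = [wh + bias for wh, bias in zip(matvec(W1, h), b)]
    return [x + d for x, d in zip(h, rmatvec(W2, inner))]


def intervened_positions(seq_len: int, prefix_positions: int, suffix_positions: int) -> List[int]:
    """Assumed reading: the first prefix_positions and last suffix_positions tokens."""
    first = range(min(prefix_positions, seq_len))
    last = range(max(seq_len - suffix_positions, 0), seq_len)
    return sorted(set(first) | set(last))
```

When R has orthonormal rows, the LoReFT update replaces the component of h inside the subspace spanned by R with W h + b and leaves the orthogonal complement untouched.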

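classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.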

class adapters.LoReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = True, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True, init_weights_seed: Optional[int] = None): The Low-Rank Linear Subspace ReFT method proposed by Wu et al. (2024). See https://arxiv.org/pdf/2404.03592.

class adapters.NoReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = False, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True, init_weights_seed: Optional[int] = None): A variant of LoReft without the orthogonality constraint.

class adapters.DiReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = False, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True, init_weights_seed: Optional[int] = None): A variant of LoReft without the orthogonality constraint and without projection subtraction, as proposed by Wu et al. (2024). See https://arxiv.org/pdf/2404.03592.

Combined configurations

class adapters.ConfigUnion(*configs: List[AdapterConfig])

Composes multiple adaptation method configurations into one. This class can be used to define complex adaptation method setups.

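classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.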

static validate(configs)

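Performs simple validations on a list of configurations to check whether they can be combined into a common setup.

Parameters

configs (List[AdapterConfig]) – The list of configurations to check.

Raises

TypeError – One of the configurations has a wrong type.
ValueError – At least two of the given configurations conflict.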
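A simplified sketch of the kind of pairwise check validate() performs. SketchConfig is a hypothetical stand-in for AdapterConfig carrying only an architecture name and two placement flags. The rule that two bottleneck configs are compatible only when they sit at different positions is an assumption of this sketch, and the real method may check more cases (for example leave_out).

```python
from typing import List, Optional


class SketchConfig:
    """Hypothetical stand-in for AdapterConfig: an architecture name plus placement flags."""

    def __init__(self, architecture: Optional[str], mh_adapter: bool = False,
                 output_adapter: bool = False):
        self.architecture = architecture
        self.mh_adapter = mh_adapter
        self.output_adapter = output_adapter


def validate(configs: List[object]) -> None:
    """Raise TypeError for a wrong type and ValueError for two conflicting configs."""
    for config in configs:
        if not isinstance(config, SketchConfig):
            raise TypeError(f"{config!r} is not a config instance")
    for i, a in enumerate(configs):
        for b in configs[:i]:  # every unordered pair exactly once
            if a.architecture != b.architecture:
                continue  # different adaptation methods can be combined
            if (a.architecture == "bottleneck"
                    and a.mh_adapter != b.mh_adapter
                    and a.output_adapter != b.output_adapter):
                continue  # bottleneck adapters at different positions do not clash
            raise ValueError(f"two configs with architecture {a.architecture!r} conflict")
```

Under this sketch, a combination like MAMConfig (a prefix tuning config plus a bottleneck config) passes because the two architectures differ.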

class adapters.MAMConfig(prefix_tuning: Optional[PrefixTuningConfig] = None, adapter: Optional[BnConfig] = None): The Mix-And-Match adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.

class adapters.UniPELTConfig(prefix_tuning: Optional[PrefixTuningConfig] = None, adapter: Optional[BnConfig] = None, lora: Optional[LoRAConfig] = None): The UniPELT adapter architecture proposed by Mao et al. (2022). See https://arxiv.org/pdf/2110.07577.pdf.

Adapter Fusion

class adapters.AdapterFusionConfig(key: bool, query: bool, value: bool, query_before_ln: bool, regularization: bool, residual_before: bool, temperature: bool, value_before_softmax: bool, value_initialized: str, dropout_prob: float)

Base class that models the architecture of an adapter fusion layer.

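classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], **kwargs)

Loads a given adapter fusion configuration specifier into a full AdapterFusionConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTERFUSION_CONFIG_MAP
the path to a file containing a full adapter fusion configuration

Returns

The resolved adapter fusion configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.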

class adapters.StaticAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = False, query_before_ln: bool = False, regularization: bool = False, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = False, dropout_prob: Optional[float] = None): Static version of adapter fusion without a value matrix. See https://arxiv.org/pdf/2005.00247.pdf.

class adapters.DynamicAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = True, query_before_ln: bool = False, regularization: bool = True, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = True, dropout_prob: Optional[float] = None): Dynamic version of adapter fusion with a value matrix and regularization. See https://arxiv.org/pdf/2005.00247.pdf.
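AdapterFusion combines the outputs of several adapters with an attention step: a query computed from the layer's hidden state is compared with a key computed from each adapter output, and a softmax over the scores weights the (optionally value-projected) outputs. The sketch follows that description, and value=None corresponds to the static variant without a value matrix. Note that the temperature argument here is a plain float divisor, whereas the temperature field of AdapterFusionConfig is a bool. Residual connections, regularization and dropout are omitted, and all names are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(hidden: Vector, adapter_outputs: List[Vector], query: Matrix, key: Matrix,
         value: Optional[Matrix] = None, temperature: float = 1.0) -> Vector:
    """Attention over adapter outputs; value=None mimics the static variant."""
    q = matvec(query, hidden)
    scores = [sum(a * b for a, b in zip(q, matvec(key, out))) for out in adapter_outputs]
    weights = softmax(scores, temperature)  # one weight per adapter
    values = adapter_outputs if value is None else [matvec(value, out) for out in adapter_outputs]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
```

A large temperature flattens the weights toward a uniform average of the adapters, while a small one approaches picking the single best-matching adapter.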

Adapter Setup

class adapters.AdapterSetup(adapter_setup, head_setup=None, ignore_empty: bool = False)

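Represents an adapter setup of a model, including active adapters and active heads. This class is intended to be used as a context manager with the with statement. The setup defined by the AdapterSetup context overrides the static adapter setup defined in the model (i.e. the setup specified via active_adapters).

Example:

with AdapterSetup(Stack("a", "b")):
    # will use the adapter stack "a" and "b"
    outputs = model(**inputs)

Note that the context manager is thread-local, i.e. it can be used with different setups in a multi-threaded environment.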
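Thread-local overriding can be sketched with threading.local and contextlib.contextmanager. adapter_setup and current_setup are hypothetical helpers that only mimic the described behavior (the innermost context wins, and other threads still see the static setup); they are not the AdapterSetup implementation.

```python
import threading
from contextlib import contextmanager

_local = threading.local()  # each thread gets its own attribute namespace


@contextmanager
def adapter_setup(setup):
    """Push a setup for the duration of the with block, in this thread only."""
    stack = getattr(_local, "stack", None)
    if stack is None:
        stack = _local.stack = []
    stack.append(setup)
    try:
        yield setup
    finally:
        stack.pop()  # restore the previous setup even if the block raises


def current_setup(static_setup):
    """Innermost context setup of the calling thread, else the model's static setup."""
    stack = getattr(_local, "stack", None)
    return stack[-1] if stack else static_setup
```

Because the stack lives in thread-local storage, a worker thread started inside a with block does not inherit the caller's setup.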

Multi-Task Configuration

class adapters.MultiTaskConfig

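Flag class for all multi-task adaptation methods. This class does not define specific configuration keys. It only provides some common helper methods.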

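classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.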

class adapters.MTLLoRAConfig(architecture: ~typing.Optional[str] = 'mtl_lora', selfattn_lora: bool = True, intermediate_lora: bool = False, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 8, alpha: int = 8, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'add', init_weights: str = 'lora', init_weights_seed: ~typing.Optional[int] = None, use_gating: bool = False, vera_d: ~typing.Optional[float] = None, vera_b: ~typing.Optional[float] = None, dtype: ~typing.Optional[str] = None, n_up_projection: int = 1, task_specific_matrix_type: ~typing.Literal['singular_values', 'linear'] = 'singular_values', weights_sharpness: float = 0.05)

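The MTL-LoRA architecture proposed by Yang et al. (2024), which combines LoRA with multi-task learning. See https://arxiv.org/pdf/2410.09437.pdf. This configuration extends LoRA to support multi-task adaptation, enabling parameter-efficient fine-tuning across multiple tasks while using low-rank reparametrization.

Parameters

n_up_projection (int, optional) – The number of additional projection layers used for task-specific adaptation. Defaults to 1.
task_specific_matrix_type (Literal["singular_values", "linear"], optional) – The type of task-specific matrix used during adaptation. Either "singular_values" (adaptation through transformations based on singular value decomposition) or "linear" (a learned linear transformation). Defaults to "singular_values".
weights_sharpness (float, optional) – A scaling factor that controls the sharpness of the task-specific weight transformation, affecting how strongly the task adaptation is applied. Defaults to 0.05.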
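One possible reading of the three MTL-LoRA parameters, based on the MTL-LoRA paper rather than on the library's code: a shared down-projection A, a task-specific r × r transform Λ_t (a diagonal stored as a vector for "singular_values", or a full matrix for "linear"), and n_up_projection up-projections B_i mixed with softmax weights whose temperature is weights_sharpness. The exact formulation used by MTLLoRAConfig is an assumption here, and mtl_lora_delta is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Sequence[float], temperature: float) -> Vector:
    scaled = [x / temperature for x in xs]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mtl_lora_delta(x: Vector, A: Matrix, task_matrix, ups: List[Matrix],
                   mix_logits: Sequence[float], sharpness: float = 0.05,
                   diagonal: bool = True) -> Vector:
    """Assumed update: sum_i w_i * B_i (Lambda_t (A x)), w = softmax(logits / sharpness)."""
    z = matvec(A, x)  # shared down-projection to rank r
    if diagonal:
        # "singular_values": Lambda_t is diagonal, stored as a length-r vector
        z = [s * v for s, v in zip(task_matrix, z)]
    else:
        # "linear": Lambda_t is a full r x r matrix
        z = matvec(task_matrix, z)
    weights = softmax(mix_logits, sharpness)  # lower sharpness -> more peaked weights
    outs = [matvec(B, z) for B in ups]
    return [sum(w * o[j] for w, o in zip(weights, outs)) for j in range(len(outs[0]))]
```

Under this reading, a small weights_sharpness such as the default 0.05 makes the mixture close to selecting the up-projection with the largest logit.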

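classmethod from_dict(config): Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes): Returns a new instance of the config class with the specified changes applied.

to_dict(): Converts the config class to a Python dict.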