Llama3CausalLM class
keras_nlp.models.Llama3CausalLM(backbone, preprocessor=None, **kwargs)
An end-to-end Llama 3 model for causal language modeling.
A causal language model (LM) predicts the next token based on previous
tokens. This task setup can be used to train the model unsupervised on
plain text input, or to autoregressively generate plain text similar to
the data used for training. This task can be used for pre-training or
fine-tuning a LLaMA 3 model, simply by calling fit().
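Next-token training pairs are usually built by shifting one tokenized sequence by one position. The model sees tokens 0..n-2 and is trained to predict tokens 1..n-1, and padded label positions get zero weight. The plain-Python sketch below shows that shift on a hypothetical, already-tokenized sequence. The token ids and the helper make_next_token_pairs are made up for illustration and are not part of keras_nlp. When a Llama3CausalLMPreprocessor is attached, it is expected to perform this step for you.

```python
# Hypothetical token ids for one tokenized sequence, padded to length 7 with 0.
token_ids = [1, 5, 9, 7, 2, 0, 0]
padding_mask = [1, 1, 1, 1, 1, 0, 0]  # 1 = real token, 0 = padding


def make_next_token_pairs(token_ids, padding_mask):
    """Hypothetical helper: split one sequence into (x, y, sample_weight)."""
    x = {"token_ids": token_ids[:-1], "padding_mask": padding_mask[:-1]}
    y = token_ids[1:]  # label at position i is the token at position i + 1
    sample_weight = padding_mask[1:]  # no loss on padded label positions
    return x, y, sample_weight


x, y, sw = make_next_token_pairs(token_ids, padding_mask)
print(x["token_ids"])  # [1, 5, 9, 7, 2, 0]
print(y)               # [5, 9, 7, 2, 0, 0]
print(sw)              # [1, 1, 1, 1, 0, 0]
```

When a preprocessor is attached, fit() can be called directly on raw strings, for example causal_lm.fit(x=["The quick brown fox jumped."], batch_size=1).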
This model has a generate() method, which generates text based on a
prompt. The generation strategy used is controlled by an additional
sampler argument on compile(). You can recompile the model with
different keras_nlp.samplers objects to control the generation. By
default, "top_k" sampling will be used.
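Top-k sampling keeps only the k highest-scoring candidate tokens at each step, renormalizes their probabilities with a softmax, and samples from them. The NumPy sketch below illustrates one step over a single hypothetical logit vector. It is a conceptual stand-in, not the keras_nlp implementation, which operates on batched tensors inside the generation loop. To change the strategy on a real model, you would recompile it, for example causal_lm.compile(sampler="greedy") or causal_lm.compile(sampler=keras_nlp.samplers.TopKSampler(k=10)).

```python
import numpy as np


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest logits (conceptual sketch)."""
    logits = np.asarray(logits, dtype=np.float64)
    top_ids = np.argsort(logits)[-k:]  # indices of the k largest logits
    top_logits = logits[top_ids]
    probs = np.exp(top_logits - top_logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))


rng = np.random.default_rng(0)
logits = [2.0, 0.5, 3.0, -1.0, 1.5]  # hypothetical scores for a 5-token vocabulary
samples = [top_k_sample(logits, k=2, rng=rng) for _ in range(100)]
# With k=2, only ids 2 and 0 (the two highest logits) can ever be drawn.
print(sorted(set(samples)))
```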
Arguments
- backbone: A keras_nlp.models.Llama3Backbone instance.
- preprocessor: A keras_nlp.models.Llama3CausalLMPreprocessor or None.
If None, this model will not apply preprocessing, and inputs
should be preprocessed before calling the model.
from_preset method
Llama3CausalLM.from_preset(preset, load_weights=True, **kwargs)
Instantiate a keras_nlp.models.Task from a model preset.
A preset is a directory of configs, weights and other file assets used
to save and load a pre-trained model. The preset can be passed as
one of:
- a built-in preset identifier like 'bert_base_en'
- a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
- a Hugging Face handle like 'hf://user/bert_base_en'
- a path to a local preset directory like './bert_base_en'
For any Task subclass, you can run cls.presets.keys() to list all
built-in presets available on the class.
This constructor can be called in one of two ways: either from a
task-specific base class like keras_nlp.models.CausalLM.from_preset(), or
from a model class like keras_nlp.models.BertClassifier.from_preset().
If calling from a base class, the subclass of the returned object
will be inferred from the config in the preset directory.
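Base-class dispatch can be pictured as a registry lookup. The preset config records which concrete class it was saved from, and the base class instantiates that subclass. The toy sketch below models this in plain Python. Task, CausalLM, Llama3CausalLM and BertClassifier here are stand-ins, not the keras_nlp classes. The config dict with a class_name key is a hypothetical simplification of a real preset directory, not keras_nlp's actual file format or loading code. The issubclass check, which rejects a preset of the wrong task type, is a choice made in this sketch.

```python
# Hypothetical registry mapping a class name stored in a config to a class.
_REGISTRY = {}


def register(cls):
    """Record a class under its name so a config can refer to it."""
    _REGISTRY[cls.__name__] = cls
    return cls


class Task:
    @classmethod
    def from_preset(cls, config):
        """Build the class named in a (hypothetical) parsed preset config."""
        target = _REGISTRY[config["class_name"]]
        if not issubclass(target, cls):
            raise ValueError(
                f"Preset holds a {target.__name__}, not a {cls.__name__}."
            )
        return target()


@register
class CausalLM(Task):
    pass


@register
class Llama3CausalLM(CausalLM):
    pass


@register
class BertClassifier(Task):
    pass


config = {"class_name": "Llama3CausalLM"}
model = CausalLM.from_preset(config)  # called on the base class
print(type(model).__name__)  # Llama3CausalLM
```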
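Arguments
- preset: string. A built-in preset identifier, a Kaggle Models handle,
a Hugging Face handle, or a path to a local directory.
- load_weights: bool. If True, the weights will be loaded into the
model architecture. If False, the weights will be randomly
initialized.
Examples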
```python
import keras_nlp

# Load a Gemma generative task.
causal_lm = keras_nlp.models.CausalLM.from_preset(
    "gemma_2b_en",
)
# Load a Bert classification task.
model = keras_nlp.models.Classifier.from_preset(
    "bert_base_en",
    num_classes=2,
)
```
| Preset name | Parameters | Description |
|---|---|---|
| llama3_8b_en | 8.03B | LLaMA 3 8B Base model |
| llama3_instruct_8b_en | 8.03B | LLaMA 3 8B Instruct model |
generate method
Llama3CausalLM.generate(inputs, max_length=None, stop_token_ids="auto")
Generate text given prompt inputs.
This method generates text based on given inputs. The sampling method
used for generation can be set via the compile() method.
If inputs are a tf.data.Dataset, outputs will be generated
"batch-by-batch" and concatenated. Otherwise, all inputs will be handled
as a single batch.
If a preprocessor is attached to the model, inputs will be
preprocessed inside the generate() function and should match the
structure expected by the preprocessor layer (usually raw strings).
If a preprocessor is not attached, inputs should match the structure
expected by the backbone. See the example usage above for a
demonstration of each.
Arguments
- inputs: Python data, tensor data, or a tf.data.Dataset. If a
preprocessor is attached to the model, inputs should match
the structure expected by the preprocessor layer. If a
preprocessor is not attached, inputs should match the
structure expected by the backbone model.
- max_length: Optional. int. The maximum length of the generated
sequence. Defaults to the maximum configured sequence_length of the
preprocessor. If preprocessor is None, inputs should be
padded to the desired maximum length and this argument
will be ignored.
- stop_token_ids: Optional. None, "auto", or tuple of token ids. Defaults
to "auto", which uses the preprocessor.tokenizer.end_token_id.
Using "auto" without an attached preprocessor will produce an error.
None stops generation after generating max_length tokens. You may also
specify a list of token ids the model should stop on. Note that
sequences of tokens will each be interpreted as a stop token;
multi-token stop sequences are not supported.
backbone property
keras_nlp.models.Llama3CausalLM.backbone
A keras_nlp.models.Backbone model with the core architecture.
preprocessor property
keras_nlp.models.Llama3CausalLM.preprocessor
A keras_nlp.models.Preprocessor layer used to preprocess input.
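The stop_token_ids rules for generate() can be modeled on a plain list of generated ids. The helper below, apply_stop_tokens, is hypothetical and only mirrors the documented behavior. Each listed id is an independent single-token stop, and with None the output is cut only by the length limit. It does not model "auto". It also keeps the stop token itself in the output, which is a choice made in this sketch.

```python
def apply_stop_tokens(generated, stop_token_ids, max_length):
    """Hypothetical helper: truncate generated ids per the stop_token_ids rules.

    Each id in stop_token_ids is a separate single-token stop; a tuple such as
    (13, 99) is never matched as the two-token sequence 13 then 99.
    With stop_token_ids=None, only max_length ends the output.
    """
    stops = set(stop_token_ids or ())
    out = []
    for token in generated[:max_length]:
        out.append(token)
        if token in stops:
            break  # keep the stop token itself, then stop
    return out


stream = [11, 12, 99, 13, 14, 15]  # hypothetical generated ids
print(apply_stop_tokens(stream, stop_token_ids=(99,), max_length=5))     # [11, 12, 99]
print(apply_stop_tokens(stream, stop_token_ids=None, max_length=5))      # [11, 12, 99, 13, 14]
print(apply_stop_tokens(stream, stop_token_ids=(13, 99), max_length=5))  # [11, 12, 99]
```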