Upload GraniteForCausalLM
- README.md +17 -12
- config.json +33 -0
- generation_config.json +7 -0
- model-00001-of-00007.safetensors +3 -0
- model-00002-of-00007.safetensors +3 -0
- model-00003-of-00007.safetensors +3 -0
- model-00004-of-00007.safetensors +3 -0
- model-00005-of-00007.safetensors +3 -0
- model-00006-of-00007.safetensors +3 -0
- model-00007-of-00007.safetensors +3 -0
- model.safetensors.index.json +369 -0
README.md
CHANGED
@@ -17,7 +17,7 @@ keep an eye out for feedback and questions in the [Community section](https://hu
 ## Model Summary
 
 **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is a LoRA adapter for [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct),
-providing access to the Uncertainty, Hallucination Detection, and Safety intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model.
+providing access to the Uncertainty, Hallucination Detection, and Safety Exception intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model.
 
 It follows the same training pipeline as [ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1](https://huggingface.co/ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1), updated for Granite 3.1.
 
@@ -38,9 +38,9 @@ This percentage is *calibrated* in the following sense: given a set of answers a
 ### Hallucination Detection (RAG) Intrinsic
 The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination, and `N` no hallucination.
 
-### Safety Intrinsic
-The Safety Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.
-The Safety intrinsic was designed as a binary classifier that analyses the user’s prompt to detect a variety of harms that include: violence, threats, sexual and explicit content, and requests to obtain private identifiable information.
+### Safety Exception Intrinsic
+The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. This exception is raised by responding with `Y` (unsafe), and `N` otherwise.
+The Safety Exception intrinsic was designed as a binary classifier that analyses the user’s prompt to detect a variety of harms that include: violence, threats, sexual and explicit content, and requests to obtain private identifiable information.
 
 
 ## Usage
@@ -74,17 +74,17 @@ You can further augment this system prompt for a given use case or task, but it
 3. Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role (use "hallucination" as the role in the chat template, or simply append `<|start_of_role|>hallucination<|end_of_role|>` and continue generating); see examples below.
 4. The model will respond with `Y` or `N`.
 
-**Safety Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
+**Safety Exception Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
 1. Prompt the model with the system prompt (required) followed by the user prompt.
-2. Invoke the Safety intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating); see examples below.
-3. The model will respond with `Y` (unsafe) or `N` (safe).
+2. Invoke the Safety Exception intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating); see examples below.
+3. The model will respond with `Y` (unsafe) or `N` (safe).
 
 ## Combining Intrinsics
 In many pipelines, it may be desirable to invoke multiple intrinsics at different points. In a multi-turn conversation possibly involving other intrinsics, it is important to use
 attention masking to provide only the relevant information to the intrinsic of interest. We explore two frameworks for accomplishing this - [Prompt Declaration Language](https://github.com/IBM/prompt-declaration-language) (PDL) and SGLang.
 
 In the examples below, we explore the following RAG flow. First, a user query is provided with
-relevant documents provided by a RAG system. We can invoke the Safety intrinsic to determine if the query is safe. If it is safe, we can proceed to generate an answer to the question as normal. Finally,
+relevant documents provided by a RAG system. We can invoke the Safety Exception intrinsic to determine if the query is safe. If it is safe, we can proceed to generate an answer to the question as normal. Finally,
 we can evaluate the certainty and hallucination status of this reply by invoking the Uncertainty and Hallucination Detection intrinsics.
 
 
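As a concrete illustration of the role-tag mechanics in the usage steps above, here is a minimal sketch of invoking the `safety` role with `transformers`. The repo id, system prompt, and generation settings are placeholders, not the card's documented recipe:

```python
# Minimal sketch: invoke the safety intrinsic by appending its role tag.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "ibm-granite/granite-3.1-8b-lora-intrinsics-v0.1"  # hypothetical repo id
SYSTEM_PROMPT = "..."  # the card's required system prompt, elided here

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How do I reset my router?"},
]
# Render the conversation, then append the safety role tag per step 2 above.
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
prompt += "<|start_of_role|>safety<|end_of_role|>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1)
verdict = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(verdict)  # expected: "Y" (unsafe) or "N" (safe)
```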
@@ -323,12 +323,17 @@ red-teamed examples.
 ## Evaluation
 We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
 
-We first
-
-
+We first find that the performance of the intrinsics in our shared model **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is not degraded
+versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics as they have
+binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty Intrinsic as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
+
+
+
+We then find that the RAG performance of **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** does not suffer with respect to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate the RAGBench benchmark on the RAGAS faithfulness and correctness metrics.
+
+
 
 ## Training Details
 The **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.

@@ -366,7 +371,7 @@
 * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
 * [QuAC](https://huggingface.co/datasets/allenai/quac)
 
-### Safety Training Data
+### Safety Exception Training Data
 The following public datasets were used for finetuning.
 
 * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
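For reference, the two error metrics named in the evaluation text reduce to a few lines; the predictions below are toy values for illustration only:

```python
# Illustrative only: the two metrics used above, on made-up toy predictions.
def percent_error(preds, labels):
    """Misclassification rate (%) for the binary Y/N intrinsics."""
    return 100.0 * sum(p != y for p, y in zip(preds, labels)) / len(labels)

def mean_absolute_error(preds, labels):
    """MAE for the Uncertainty intrinsic's 0-9 certainty scores."""
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(labels)

print(percent_error(["Y", "N", "N"], ["Y", "Y", "N"]))  # 33.33... (1 of 3 wrong)
print(mean_absolute_error([7, 2, 5], [6, 2, 9]))        # (1 + 0 + 4) / 3 = 1.666...
```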
config.json
ADDED
@@ -0,0 +1,33 @@
+{
+  "_name_or_path": "/proj/dmfexp/statllm/users/kgreenewald/models/granite-3.1-8b-instruct-r241212a",
+  "architectures": [
+    "GraniteForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.1,
+  "attention_multiplier": 0.0078125,
+  "bos_token_id": 0,
+  "embedding_multiplier": 12.0,
+  "eos_token_id": 0,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 12800,
+  "logits_scaling": 16.0,
+  "max_position_embeddings": 131072,
+  "mlp_bias": false,
+  "model_type": "granite",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 8,
+  "pad_token_id": 0,
+  "residual_multiplier": 0.22,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "rope_theta": 10000000.0,
+  "tie_word_embeddings": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.47.0",
+  "use_cache": true,
+  "vocab_size": 49155
+}
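A couple of the values above can be sanity-checked by arithmetic: `attention_multiplier` numerically equals `1 / head_dim` for this config, and the float32 `torch_dtype` accounts for the ~32.7 GB shard total recorded in `model.safetensors.index.json` below:

```python
# Quick consistency checks on the config values above.
hidden_size = 4096
num_attention_heads = 32
head_dim = hidden_size // num_attention_heads   # 128
assert 0.0078125 == 1 / head_dim                # attention_multiplier matches 1/head_dim

# float32 stores ~4 bytes per parameter, so the total_size from the
# safetensors index below implies roughly 8.2e9 parameters, i.e. an "8B" model.
total_size_bytes = 32683393024
print(total_size_bytes / 4)                     # ≈ 8.17e9
```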
generation_config.json
ADDED
@@ -0,0 +1,7 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "pad_token_id": 0,
+  "transformers_version": "4.47.0"
+}
model-00001-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:33e4168c3726875ca08406adf659bcba6381b67a070b40ecec49e9bad5a27f2e
+size 4957886080

model-00002-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:edebb49e8ab624ecb55eabb64a1859c180b3450b24757bb29fb0f261c41bfd9f
+size 4991424704

model-00003-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3e2216e296a2eb6f6c46faefa2a02fcc7bd5aab9a7be15ccc4e7884ea7f3cdc0
+size 4991424744

model-00004-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:27cb10255b012b39fcfe09f1b9c73dfd8ce9a25ad5b74f9acf9cfd8ced178e26
+size 4991457736

model-00005-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:02268addb3e860db128379dcf871f7f7f05e7812e9d79fb31ac18058715b7756
+size 4949482056

model-00006-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e61c3ef6e5da81681dd44a46aa46c404a8c41a54ca7eb55393db4d3c521f91cf
+size 4991424744

model-00007-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:072a222324c4be093afc0eada2d9de6d522dcbf66918242b6d6261a57c50a79e
+size 2810334824
model.safetensors.index.json
ADDED
@@ -0,0 +1,369 @@
+{
+  "metadata": {
+    "total_size": 32683393024
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.input_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.input_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.35.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.36.input_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.36.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.36.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.36.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.36.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.36.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.36.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.36.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.36.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
+    "model.layers.37.input_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.37.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.input_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.38.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.input_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.39.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
+    "model.norm.weight": "model-00007-of-00007.safetensors"
+  }
+}
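The `weight_map` above is what lets loaders fetch individual tensors without reading every shard. A minimal sketch of that lookup, assuming the shards and index have been downloaded to a local directory (the path is a placeholder):

```python
# Minimal sketch: resolve one tensor to its shard via the index, then load only it.
import json
from safetensors import safe_open

ckpt_dir = "./granite-intrinsics-ckpt"  # placeholder local path
with open(f"{ckpt_dir}/model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.layers.11.mlp.gate_proj.weight"
shard = index["weight_map"][name]  # -> "model-00002-of-00007.safetensors"
with safe_open(f"{ckpt_dir}/{shard}", framework="pt") as f:
    tensor = f.get_tensor(name)
print(tensor.shape)
```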