kgreenewald committed on
Commit 6535b5b · verified · 1 Parent(s): 37c0383

Upload GraniteForCausalLM

README.md CHANGED
@@ -17,7 +17,7 @@ keep an eye out for feedback and questions in the [Community section](https://hu
17
  ## Model Summary
18
 
19
  **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is a LoRA adapter for [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct),
20
- providing access to the Uncertainty, Hallucination Detection, and Safety intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model.
21
 
22
  It follows the same training pipeline as [ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1](https://huggingface.co/ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1), updated for Granite 3.1.
23
 
@@ -38,9 +38,9 @@ This percentage is *calibrated* in the following sense: given a set of answers a
38
  ### Hallucination Detection (RAG) Intrinsic
39
  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination; `N` indicates no hallucination.
40
 
41
- ### Safety Intrinsic
42
- The Safety Intrinsic is designed to raise an exception when the user query is unsafe. This intrinsic responds with `Y` (safe), and `N` otherwise.
43
- The Safety intrinsic was designed as a binary classifier that analyses the user’s prompt to detect a variety of harms that include: violence, threats, sexual and explicit content and requests to obtain private identifiable information.
44
 
45
 
46
  ## Usage
@@ -74,17 +74,17 @@ You can further augment this system prompts for a given use case or task, but it
74
  3. Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role (use "hallucination" as the role in the chat template, or simply append `<|start_of_role|>hallucination<|end_of_role|>` and continue generating); see the examples below.
75
  4. The model will respond with `Y` or `N`.
76
 
77
- **Safety Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
78
  1. Prompt the model with the system prompt (required) followed by the user prompt.
79
- 2. Invoke the Safety intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating), see examples below.
80
- 3. The model will respond with `Y` (safe) or `N` (unsafe).
81
 
82
  ## Combining Intrinsics
83
  In many pipelines, it may be desirable to invoke multiple intrinsics at different points. In a multi-turn conversation possibly involving other intrinsics, it is important to use
84
  attention masking to provide only the relevant information to the intrinsic of interest. We explore two frameworks for accomplishing this - [Prompt Declaration Language](https://github.com/IBM/prompt-declaration-language) (PDL) and SGLang.
85
 
86
  In the examples below, we explore the following RAG flow. First, a user query is provided with
87
- relevant documents provided by a RAG system. We can invoke the Safety intrinsic to determine if the query is safe. If it is safe, we can proceed to generate an answer to the question as normal. Finally,
88
  we can evaluate the certainty and hallucination status of this reply by invoking the Uncertainty and Hallucination Detection intrinsics.
89
 
90
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)
@@ -323,12 +323,17 @@ red-teamed examples.
323
  ## Evaluation
324
  We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
325
 
326
- We first benchmark the performance of the intrinsics in our shared model **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1**. Here, percent error is shown for the Hallucination Detection and Safety intrinsics as they have
327
- binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty Intrinsic as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 100 sample validation set from each intrinsic's dataset.
 
328
 
329
 
330
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/gGrwuUiYiePVWnJYhmM6u.png)
331
 
332
 
333
  ## Training Details
334
  The **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
@@ -366,7 +371,7 @@ For creating the hallucination labels for responses, the technique available at
366
  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
367
  * [QuAC](https://huggingface.co/datasets/allenai/quac)
368
 
369
- ### Safety Training Data
370
  The following public datasets were used for finetuning.
371
 
372
  * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
 
17
  ## Model Summary
18
 
19
  **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is a LoRA adapter for [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct),
20
+ providing access to the Uncertainty, Hallucination Detection, and Safety Exception intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model.
21
 
22
  It follows the same training pipeline as [ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1](https://huggingface.co/ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1), updated for Granite 3.1.
23
 
 
38
  ### Hallucination Detection (RAG) Intrinsic
39
  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination; `N` indicates no hallucination.
40
 
41
+ ### Safety Exception Intrinsic
42
+ The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. The exception is raised by responding with `Y` when the query is unsafe and `N` otherwise.
43
+ The Safety Exception intrinsic was designed as a binary classifier that analyses the user’s prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.
44
 
45
 
46
  ## Usage
 
74
  3. Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role (use "hallucination" as the role in the chat template, or simply append `<|start_of_role|>hallucination<|end_of_role|>` and continue generating); see the examples below.
75
  4. The model will respond with `Y` or `N`.
76
 
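To make the append-and-generate pattern above concrete, here is a minimal sketch assuming a standard `transformers` setup; the checkpoint path, system prompt text, and the question/answer strings are placeholders rather than values taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/this/checkpoint"  # placeholder: point at the uploaded model
SYSTEM_PROMPT = "..."                   # placeholder: the required system prompt from the Usage section

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", torch_dtype=torch.bfloat16)

# Conversation whose last assistant turn we want to check for hallucination.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "User question plus the supporting documents."},
    {"role": "assistant", "content": "Assistant answer to be checked."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role.
prompt += "<|start_of_role|>hallucination<|end_of_role|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1)
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))  # "Y" or "N"
```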
77
+ **Safety Exception Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
78
  1. Prompt the model with the system prompt (required) followed by the user prompt.
79
+ 2. Invoke the Safety Exception intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating); see the examples below.
80
+ 3. The model will respond with `Y` (unsafe) or `N` (safe).
81
 
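A corresponding sketch for the safety check, reusing the `model`, `tokenizer`, and placeholder `SYSTEM_PROMPT` from the snippet above.

```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},  # placeholder system prompt
    {"role": "user", "content": "User query to screen."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Invoke the Safety Exception intrinsic by generating in the `safety` role.
prompt += "<|start_of_role|>safety<|end_of_role|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1)
flag = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
print("unsafe" if flag == "Y" else "safe")  # Y means unsafe, N means safe
```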
82
  ## Combining Intrinsics
83
  In many pipelines, it may be desirable to invoke multiple intrinsics at different points. In a multi-turn conversation possibly involving other intrinsics, it is important to use
84
  attention masking to provide only the relevant information to the intrinsic of interest. We explore two frameworks for accomplishing this - [Prompt Declaration Language](https://github.com/IBM/prompt-declaration-language) (PDL) and SGLang.
85
 
86
  In the examples below, we explore the following RAG flow. First, a user query is provided with
87
+ relevant documents provided by a RAG system. We can invoke the Safety Exception intrinsic to determine if the query is safe. If it is safe, we can proceed to generate an answer to the question as normal. Finally,
88
  we can evaluate the certainty and hallucination status of this reply by invoking the Uncertainty and Hallucination Detection intrinsics.
89
 
90
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)
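As a rough illustration of this flow in plain `transformers` (the PDL and SGLang examples that follow handle the attention masking more carefully), the sketch below simply rebuilds the prompt for each intrinsic call. The `generate_role` helper and the `certainty` role name are assumptions made for illustration, not taken verbatim from this excerpt; `model`, `tokenizer`, and `SYSTEM_PROMPT` are as in the earlier snippets.

```python
def generate_role(messages, role, max_new_tokens=1):
    # Render the chat, append the requested role header, and generate that role's output.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += f"<|start_of_role|>{role}<|end_of_role|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

convo = [
    {"role": "system", "content": SYSTEM_PROMPT},  # placeholder
    {"role": "user", "content": "User query plus retrieved documents."},
]

# 1. Check whether the query is safe before doing anything else.
if generate_role(convo, "safety") == "Y":
    raise ValueError("query flagged as unsafe")

# 2. Generate the normal assistant answer.
answer = generate_role(convo, "assistant", max_new_tokens=512)
convo.append({"role": "assistant", "content": answer})

# 3. Score the answer: certainty (0-9) and hallucination (Y/N).
certainty = generate_role(convo, "certainty")          # assumed role name for the Uncertainty intrinsic
hallucination = generate_role(convo, "hallucination")
```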
 
323
  ## Evaluation
324
  We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
325
 
326
+ We first find that the performance of the intrinsics in our shared model **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is not degraded
327
+ versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics as they have
328
+ binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty Intrinsic as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
329
 
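For clarity, the two error measures can be read as in this generic sketch (not the evaluation code used for these numbers): percent error counts Y/N disagreements, and MAE averages the absolute gap between predicted and reference certainty scores.

```python
def percent_error(preds, labels):
    # Percentage of binary (Y/N) predictions that disagree with the reference labels.
    return 100.0 * sum(p != l for p, l in zip(preds, labels)) / len(labels)

def mean_absolute_error(preds, labels):
    # Mean absolute gap between predicted and reference certainty scores (0-9).
    return sum(abs(p - l) for p, l in zip(preds, labels)) / len(labels)
```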
330
 
331
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)
332
 
333
+ We then find that the RAG performance of **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** does not suffer relative to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark using the RAGAS faithfulness and correctness metrics.
334
+
335
+
336
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)
337
 
338
  ## Training Details
339
  The **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
 
371
  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
372
  * [QuAC](https://huggingface.co/datasets/allenai/quac)
373
 
374
+ ### Safety Exception Training Data
375
  The following public datasets were used for finetuning.
376
 
377
  * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "_name_or_path": "/proj/dmfexp/statllm/users/kgreenewald/models/granite-3.1-8b-instruct-r241212a",
3
+ "architectures": [
4
+ "GraniteForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.1,
8
+ "attention_multiplier": 0.0078125,
9
+ "bos_token_id": 0,
10
+ "embedding_multiplier": 12.0,
11
+ "eos_token_id": 0,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 4096,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 12800,
16
+ "logits_scaling": 16.0,
17
+ "max_position_embeddings": 131072,
18
+ "mlp_bias": false,
19
+ "model_type": "granite",
20
+ "num_attention_heads": 32,
21
+ "num_hidden_layers": 40,
22
+ "num_key_value_heads": 8,
23
+ "pad_token_id": 0,
24
+ "residual_multiplier": 0.22,
25
+ "rms_norm_eps": 1e-05,
26
+ "rope_scaling": null,
27
+ "rope_theta": 10000000.0,
28
+ "tie_word_embeddings": true,
29
+ "torch_dtype": "float32",
30
+ "transformers_version": "4.47.0",
31
+ "use_cache": true,
32
+ "vocab_size": 49155
33
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 0,
4
+ "eos_token_id": 0,
5
+ "pad_token_id": 0,
6
+ "transformers_version": "4.47.0"
7
+ }
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:33e4168c3726875ca08406adf659bcba6381b67a070b40ecec49e9bad5a27f2e
3
+ size 4957886080
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:edebb49e8ab624ecb55eabb64a1859c180b3450b24757bb29fb0f261c41bfd9f
3
+ size 4991424704
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3e2216e296a2eb6f6c46faefa2a02fcc7bd5aab9a7be15ccc4e7884ea7f3cdc0
3
+ size 4991424744
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:27cb10255b012b39fcfe09f1b9c73dfd8ce9a25ad5b74f9acf9cfd8ced178e26
3
+ size 4991457736
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02268addb3e860db128379dcf871f7f7f05e7812e9d79fb31ac18058715b7756
3
+ size 4949482056
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e61c3ef6e5da81681dd44a46aa46c404a8c41a54ca7eb55393db4d3c521f91cf
3
+ size 4991424744
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:072a222324c4be093afc0eada2d9de6d522dcbf66918242b6d6261a57c50a79e
3
+ size 2810334824
model.safetensors.index.json ADDED
@@ -0,0 +1,369 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 32683393024
4
+ },
5
+ "weight_map": {
6
+ "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
7
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
8
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
9
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
10
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
11
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
12
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
13
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
14
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
15
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
16
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
17
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
18
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
19
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
20
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
21
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
22
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
23
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
24
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
25
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00007.safetensors",
26
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
27
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
28
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
29
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
30
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
31
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
32
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
33
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
34
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
35
+ "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
36
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
37
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
38
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
39
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
40
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
41
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
42
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
43
+ "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
44
+ "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
45
+ "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
46
+ "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
47
+ "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
48
+ "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
49
+ "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
50
+ "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
51
+ "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
52
+ "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
53
+ "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
54
+ "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
55
+ "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
56
+ "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
57
+ "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
58
+ "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
59
+ "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
60
+ "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
61
+ "model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
62
+ "model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
63
+ "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
64
+ "model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
65
+ "model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
66
+ "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
67
+ "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
68
+ "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
69
+ "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
70
+ "model.layers.15.input_layernorm.weight": "model-00003-of-00007.safetensors",
71
+ "model.layers.15.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
72
+ "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
73
+ "model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
74
+ "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
75
+ "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
76
+ "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
77
+ "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
78
+ "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
79
+ "model.layers.16.input_layernorm.weight": "model-00003-of-00007.safetensors",
80
+ "model.layers.16.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
81
+ "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
82
+ "model.layers.16.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
83
+ "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
84
+ "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
85
+ "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
86
+ "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
87
+ "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
88
+ "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
89
+ "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
90
+ "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
91
+ "model.layers.17.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
92
+ "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
93
+ "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
94
+ "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
95
+ "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
96
+ "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
97
+ "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
98
+ "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
99
+ "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
100
+ "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
101
+ "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
102
+ "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
103
+ "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
104
+ "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
105
+ "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
106
+ "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
107
+ "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
108
+ "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
109
+ "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
110
+ "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
111
+ "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
112
+ "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
113
+ "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
114
+ "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
115
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
116
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
117
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
118
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
119
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
120
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
121
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
122
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
123
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
124
+ "model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
125
+ "model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
126
+ "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
127
+ "model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
128
+ "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
129
+ "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
130
+ "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
131
+ "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
132
+ "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
133
+ "model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
134
+ "model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
135
+ "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
136
+ "model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
137
+ "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
138
+ "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
139
+ "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
140
+ "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
141
+ "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
142
+ "model.layers.22.input_layernorm.weight": "model-00004-of-00007.safetensors",
143
+ "model.layers.22.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
144
+ "model.layers.22.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
145
+ "model.layers.22.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
146
+ "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
147
+ "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
148
+ "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
149
+ "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
150
+ "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
151
+ "model.layers.23.input_layernorm.weight": "model-00004-of-00007.safetensors",
152
+ "model.layers.23.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
153
+ "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
154
+ "model.layers.23.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
155
+ "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
156
+ "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
157
+ "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
158
+ "model.layers.23.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
159
+ "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
160
+ "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
161
+ "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
162
+ "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
163
+ "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
164
+ "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
165
+ "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
166
+ "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
167
+ "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
168
+ "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
169
+ "model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
170
+ "model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
171
+ "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
172
+ "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
173
+ "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
174
+ "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
175
+ "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
176
+ "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
177
+ "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
178
+ "model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
179
+ "model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
180
+ "model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
181
+ "model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
182
+ "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
183
+ "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
184
+ "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
185
+ "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
186
+ "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
187
+ "model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
188
+ "model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
189
+ "model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
190
+ "model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
191
+ "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
192
+ "model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
193
+ "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
194
+ "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
195
+ "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
196
+ "model.layers.28.input_layernorm.weight": "model-00005-of-00007.safetensors",
197
+ "model.layers.28.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
198
+ "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
199
+ "model.layers.28.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
200
+ "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
201
+ "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
202
+ "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
203
+ "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
204
+ "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
205
+ "model.layers.29.input_layernorm.weight": "model-00005-of-00007.safetensors",
206
+ "model.layers.29.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
207
+ "model.layers.29.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
208
+ "model.layers.29.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
209
+ "model.layers.29.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
210
+ "model.layers.29.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
211
+ "model.layers.29.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
212
+ "model.layers.29.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
213
+ "model.layers.29.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
214
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00007.safetensors",
215
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
216
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
217
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
218
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
219
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
220
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
221
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
222
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
223
+ "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
224
+ "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
225
+ "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
226
+ "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
227
+ "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
228
+ "model.layers.30.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
229
+ "model.layers.30.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
230
+ "model.layers.30.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
231
+ "model.layers.30.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
232
+ "model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
233
+ "model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
234
+ "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
235
+ "model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
236
+ "model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
237
+ "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
238
+ "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
239
+ "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
240
+ "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
241
+ "model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
242
+ "model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
243
+ "model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
244
+ "model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
245
+ "model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
246
+ "model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
247
+ "model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
248
+ "model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
249
+ "model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
250
+ "model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
251
+ "model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
252
+ "model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
253
+ "model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
254
+ "model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
255
+ "model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
256
+ "model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
257
+ "model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
258
+ "model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
259
+ "model.layers.34.input_layernorm.weight": "model-00006-of-00007.safetensors",
260
+ "model.layers.34.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
261
+ "model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
262
+ "model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
263
+ "model.layers.34.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
264
+ "model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
265
+ "model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
266
+ "model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
267
+ "model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
268
+ "model.layers.35.input_layernorm.weight": "model-00006-of-00007.safetensors",
269
+ "model.layers.35.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
270
+ "model.layers.35.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
271
+ "model.layers.35.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
272
+ "model.layers.35.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
273
+ "model.layers.35.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
274
+ "model.layers.35.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
275
+ "model.layers.35.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
276
+ "model.layers.35.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
277
+ "model.layers.36.input_layernorm.weight": "model-00007-of-00007.safetensors",
278
+ "model.layers.36.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
279
+ "model.layers.36.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
280
+ "model.layers.36.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
281
+ "model.layers.36.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
282
+ "model.layers.36.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
283
+ "model.layers.36.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
284
+ "model.layers.36.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
285
+ "model.layers.36.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
286
+ "model.layers.37.input_layernorm.weight": "model-00007-of-00007.safetensors",
287
+ "model.layers.37.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
288
+ "model.layers.37.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
289
+ "model.layers.37.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
290
+ "model.layers.37.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
291
+ "model.layers.37.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
292
+ "model.layers.37.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
293
+ "model.layers.37.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
294
+ "model.layers.37.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
295
+ "model.layers.38.input_layernorm.weight": "model-00007-of-00007.safetensors",
296
+ "model.layers.38.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
297
+ "model.layers.38.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
298
+ "model.layers.38.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
299
+ "model.layers.38.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
300
+ "model.layers.38.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
301
+ "model.layers.38.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
302
+ "model.layers.38.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
303
+ "model.layers.38.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
304
+ "model.layers.39.input_layernorm.weight": "model-00007-of-00007.safetensors",
305
+ "model.layers.39.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
306
+ "model.layers.39.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
307
+ "model.layers.39.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
308
+ "model.layers.39.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
309
+ "model.layers.39.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
310
+ "model.layers.39.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
311
+ "model.layers.39.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
312
+ "model.layers.39.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
313
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00007.safetensors",
314
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
315
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
316
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
317
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
318
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
319
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
320
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
321
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
322
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
323
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
324
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
325
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
326
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
327
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
328
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
329
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
330
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
331
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
332
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
333
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
334
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
335
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
336
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
337
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
338
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
339
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
340
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
341
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
342
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
343
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
344
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
345
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
346
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
347
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
348
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
349
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
350
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
351
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
352
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
353
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
354
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
355
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
356
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
357
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
358
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00007.safetensors",
359
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
360
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
361
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
362
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
363
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
364
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
365
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
366
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
367
+ "model.norm.weight": "model-00007-of-00007.safetensors"
368
+ }
369
+ }