kgreenewald committed on
Commit 6535b5b · verified · 1 Parent(s): 37c0383

Upload GraniteForCausalLM

README.md CHANGED
@@ -17,7 +17,7 @@ keep an eye out for feedback and questions in the [Community section](https://hu
17
  ## Model Summary
18
 
19
  **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is a LoRA adapter for [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct),
20
- providing access to the Uncertainty, Hallucination Detection, and Safety intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model.
21
 
22
  It follows the same training pipeline as [ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1](https://huggingface.co/ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1), updated for Granite 3.1.
23
 
@@ -38,9 +38,9 @@ This percentage is *calibrated* in the following sense: given a set of answers a
38
  ### Hallucination Detection (RAG) Intrinsic
39
  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination; `N` indicates no hallucination.
40
 
41
- ### Safety Intrinsic
42
- The Safety Intrinsic is designed to raise an exception when the user query is unsafe. This intrinsic responds with `Y` (safe), and `N` otherwise.
43
- The Safety intrinsic was designed as a binary classifier that analyses the user’s prompt to detect a variety of harms that include: violence, threats, sexual and explicit content and requests to obtain private identifiable information.
44
 
45
 
46
  ## Usage
@@ -74,17 +74,17 @@ You can further augment this system prompts for a given use case or task, but it
74
  3. Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role (use "hallucination" as the role in the chat template, or simply append `<|start_of_role|>hallucination<|end_of_role|>` and continue generating); see the examples below.
75
  4. The model will respond with `Y` or `N`.
76
 
77
- **Safety Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
78
  1. Prompt the model with the system prompt (required) followed by the user prompt.
79
- 2. Invoke the Safety intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating), see examples below.
80
- 3. The model will respond with `Y` (safe) or `N` (unsafe).
81
 
82
  ## Combining Intrinsics
83
  In many pipelines, it may be desirable to invoke multiple intrinsics at different points. In a multi-turn conversation possibly involving other intrinsics, it is important to use
84
  attention masking to provide only the relevant information to the intrinsic of interest. We explore two frameworks for accomplishing this - [Prompt Declaration Language](https://github.com/IBM/prompt-declaration-language) (PDL) and SGLang.
85
 
86
  In the examples below, we explore the following RAG flow. First, a user query is provided with
87
- relevant documents provided by a RAG system. We can invoke the Safety intrinsic to determine if the query is safe. If it is safe, we can proceed to generate an answer to the question as normal. Finally,
88
  we can evaluate the certainty and hallucination status of this reply by invoking the Uncertainty and Hallucination Detection intrinsics.
89
 
90
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)
@@ -323,12 +323,17 @@ red-teamed examples.
323
  ## Evaluation
324
  We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
325
 
326
- We first benchmark the performance of the intrinsics in our shared model **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1**. Here, percent error is shown for the Hallucination Detection and Safety intrinsics as they have
327
- binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty Intrinsic as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 100 sample validation set from each intrinsic's dataset.
 
328
 
329
 
330
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/gGrwuUiYiePVWnJYhmM6u.png)
331
 
332
 
333
  ## Training Details
334
  The **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
@@ -366,7 +371,7 @@ For creating the hallucination labels for responses, the technique available at
366
  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
367
  * [QuAC](https://huggingface.co/datasets/allenai/quac)
368
 
369
- ### Safety Training Data
370
  The following public datasets were used for finetuning.
371
 
372
  * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
 
17
  ## Model Summary
18
 
19
  **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is a LoRA adapter for [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct),
20
+ providing access to the Uncertainty, Hallucination Detection, and Safety Exception intrinsics in addition to retaining the full abilities of the [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct) model.
21
 
22
  It follows the same training pipeline as [ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1](https://huggingface.co/ibm-granite/granite-intrinsics-3.0-8b-lora-v0.1), updated for Granite 3.1.
23
 
 
38
  ### Hallucination Detection (RAG) Intrinsic
39
  The Hallucination Detection intrinsic is designed to detect when an assistant response to a user question with supporting documents is not supported by those documents. A response of `Y` indicates hallucination; `N` indicates no hallucination.
40
 
41
+ ### Safety Exception Intrinsic
42
+ The Safety Exception Intrinsic is designed to raise an exception when the user query is unsafe. The exception is raised by responding with `Y` when the query is unsafe and `N` otherwise.
43
+ The Safety Exception intrinsic was designed as a binary classifier that analyses the user’s prompt to detect a variety of harms, including violence, threats, sexual and explicit content, and requests to obtain personally identifiable information.
44
 
45
 
46
  ## Usage
 
74
  3. Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role (use "hallucination" as the role in the chat template, or simply append `<|start_of_role|>hallucination<|end_of_role|>` and continue generating); see the examples below.
75
  4. The model will respond with `Y` or `N`.
76
 
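To make the append-and-generate pattern above concrete, here is a minimal sketch assuming a standard `transformers` setup; the checkpoint path, system prompt text, and the question/answer strings are placeholders rather than values taken from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/this/checkpoint"  # placeholder: point at the uploaded model
SYSTEM_PROMPT = "..."                   # placeholder: the required system prompt from the Usage section

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto", torch_dtype=torch.bfloat16)

# Conversation whose last assistant turn we want to check for hallucination.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "User question plus the supporting documents."},
    {"role": "assistant", "content": "Assistant answer to be checked."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Invoke the Hallucination Detection intrinsic by generating in the `hallucination` role.
prompt += "<|start_of_role|>hallucination<|end_of_role|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1)
print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))  # "Y" or "N"
```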
77
+ **Safety Exception Intrinsic Usage Steps** Determining if a user query is safe proceeds as follows.
78
  1. Prompt the model with the system prompt (required) followed by the user prompt.
79
+ 2. Invoke the Safety Exception intrinsic by generating in the `safety` role (use "safety" as the role in the chat template, or simply append `<|start_of_role|>safety<|end_of_role|>` and continue generating); see the examples below.
80
+ 3. The model will respond with `Y` (unsafe) or `N` (safe).
81
 
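A corresponding sketch for the safety check, reusing the `model`, `tokenizer`, and placeholder `SYSTEM_PROMPT` from the snippet above.

```python
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},  # placeholder system prompt
    {"role": "user", "content": "User query to screen."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Invoke the Safety Exception intrinsic by generating in the `safety` role.
prompt += "<|start_of_role|>safety<|end_of_role|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1)
flag = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
print("unsafe" if flag == "Y" else "safe")  # Y means unsafe, N means safe
```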
82
  ## Combining Intrinsics
83
  In many pipelines, it may be desirable to invoke multiple intrinsics at different points. In a multi-turn conversation possibly involving other intrinsics, it is important to use
84
  attention masking to provide only the relevant information to the intrinsic of interest. We explore two frameworks for accomplishing this - [Prompt Declaration Language](https://github.com/IBM/prompt-declaration-language) (PDL) and SGLang.
85
 
86
  In the examples below, we explore the following RAG flow. First, a user query is provided with
87
+ relevant documents provided by a RAG system. We can invoke the Safety Exception intrinsic to determine if the query is safe. If it is safe, we can proceed to generate an answer to the question as normal. Finally,
88
  we can evaluate the certainty and hallucination status of this reply by invoking the Uncertainty and Hallucination Detection intrinsics.
89
 
90
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/HpitI-3zeutXqduC2eUES.png)
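As a rough illustration of this flow in plain `transformers` (the PDL and SGLang examples that follow handle the attention masking more carefully), the sketch below simply rebuilds the prompt for each intrinsic call. The `generate_role` helper and the `certainty` role name are assumptions made for illustration, not taken verbatim from this excerpt; `model`, `tokenizer`, and `SYSTEM_PROMPT` are as in the earlier snippets.

```python
def generate_role(messages, role, max_new_tokens=1):
    # Render the chat, append the requested role header, and generate that role's output.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += f"<|start_of_role|>{role}<|end_of_role|>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()

convo = [
    {"role": "system", "content": SYSTEM_PROMPT},  # placeholder
    {"role": "user", "content": "User query plus retrieved documents."},
]

# 1. Check whether the query is safe before doing anything else.
if generate_role(convo, "safety") == "Y":
    raise ValueError("query flagged as unsafe")

# 2. Generate the normal assistant answer.
answer = generate_role(convo, "assistant", max_new_tokens=512)
convo.append({"role": "assistant", "content": answer})

# 3. Score the answer: certainty (0-9) and hallucination (Y/N).
certainty = generate_role(convo, "certainty")          # assumed role name for the Uncertainty intrinsic
hallucination = generate_role(convo, "hallucination")
```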
 
323
  ## Evaluation
324
  We evaluate the performance of the intrinsics themselves and the RAG performance of the model.
325
 
326
+ We first find that the performance of the intrinsics in our shared model **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** is not degraded
327
+ versus the baseline procedure of maintaining 3 separate intrinsic models. Here, percent error is shown for the Hallucination Detection and Safety Exception intrinsics as they have
328
+ binary output, and Mean Absolute Error (MAE) is shown for the Uncertainty Intrinsic as it outputs numbers 0 to 9. For all, lower is better. Performance is calculated on a randomly drawn 400-sample validation set from each intrinsic's dataset.
329
 
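For clarity, the two error measures can be read as in this generic sketch (not the evaluation code used for these numbers): percent error counts Y/N disagreements, and MAE averages the absolute gap between predicted and reference certainty scores.

```python
def percent_error(preds, labels):
    # Percentage of binary (Y/N) predictions that disagree with the reference labels.
    return 100.0 * sum(p != l for p, l in zip(preds, labels)) / len(labels)

def mean_absolute_error(preds, labels):
    # Mean absolute gap between predicted and reference certainty scores (0-9).
    return sum(abs(p - l) for p, l in zip(preds, labels)) / len(labels)
```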
330
 
331
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/NsvMpweFjmjIhWFaKtI-K.png)
332
 
333
+ We then find that the RAG performance of **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** does not suffer relative to the base model [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct). Here we evaluate on the RAGBench benchmark using the RAGAS faithfulness and correctness metrics.
334
+
335
+
336
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6602ffd971410cf02bf42c06/hyOlQmXPirlCYeILLBXhc.png)
337
 
338
  ## Training Details
339
  The **Granite 3.1 8B Instruct - Intrinsics LoRA v0.1** model is a LoRA adapter finetuned to provide 3 desired intrinsic outputs - Uncertainty Quantification, Hallucination Detection, and Safety.
 
371
  * [MultiDoc2Dial](https://huggingface.co/datasets/IBM/multidoc2dial)
372
  * [QuAC](https://huggingface.co/datasets/allenai/quac)
373
 
374
+ ### Safety Exception Training Data
375
  The following public datasets were used for finetuning.
376
 
377
  * [yahma/alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned/discussions)
config.json ADDED
@@ -0,0 +1,33 @@
1
+ {
2
+ "_name_or_path": "/proj/dmfexp/statllm/users/kgreenewald/models/granite-3.1-8b-instruct-r241212a",
3
+ "architectures": [
4
+ "GraniteForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.1,
8
+ "attention_multiplier": 0.0078125,
9
+ "bos_token_id": 0,
10
+ "embedding_multiplier": 12.0,
11
+ "eos_token_id": 0,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 4096,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 12800,
16
+ "logits_scaling": 16.0,
17
+ "max_position_embeddings": 131072,
18
+ "mlp_bias": false,
19
+ "model_type": "granite",
20
+ "num_attention_heads": 32,
21
+ "num_hidden_layers": 40,
22
+ "num_key_value_heads": 8,
23
+ "pad_token_id": 0,
24
+ "residual_multiplier": 0.22,
25
+ "rms_norm_eps": 1e-05,
26
+ "rope_scaling": null,
27
+ "rope_theta": 10000000.0,
28
+ "tie_word_embeddings": true,
29
+ "torch_dtype": "float32",
30
+ "transformers_version": "4.47.0",
31
+ "use_cache": true,
32
+ "vocab_size": 49155
33
+ }
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 0,
4
+ "eos_token_id": 0,
5
+ "pad_token_id": 0,
6
+ "transformers_version": "4.47.0"
7
+ }
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:33e4168c3726875ca08406adf659bcba6381b67a070b40ecec49e9bad5a27f2e
3
+ size 4957886080
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:edebb49e8ab624ecb55eabb64a1859c180b3450b24757bb29fb0f261c41bfd9f
3
+ size 4991424704
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3e2216e296a2eb6f6c46faefa2a02fcc7bd5aab9a7be15ccc4e7884ea7f3cdc0
3
+ size 4991424744
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:27cb10255b012b39fcfe09f1b9c73dfd8ce9a25ad5b74f9acf9cfd8ced178e26
3
+ size 4991457736
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:02268addb3e860db128379dcf871f7f7f05e7812e9d79fb31ac18058715b7756
3
+ size 4949482056
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e61c3ef6e5da81681dd44a46aa46c404a8c41a54ca7eb55393db4d3c521f91cf
3
+ size 4991424744
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:072a222324c4be093afc0eada2d9de6d522dcbf66918242b6d6261a57c50a79e
3
+ size 2810334824
model.safetensors.index.json ADDED
@@ -0,0 +1,369 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 32683393024
4
+ },
5
+ "weight_map": {
6
+ "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
7
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
8
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
9
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
10
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
11
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
12
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
13
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
14
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
15
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
16
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
17
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
18
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
19
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
20
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
21
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
22
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
23
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
24
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
25
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00007.safetensors",
26
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
27
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
28
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
29
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
30
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
31
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
32
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
33
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
34
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
35
+ "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
36
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
37
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
38
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
39
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
40
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
41
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
42
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
43
+ "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
44
+ "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
45
+ "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
46
+ "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
47
+ "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
48
+ "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
49
+ "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
50
+ "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
51
+ "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
52
+ "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
53
+ "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
54
+ "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
55
+ "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
56
+ "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
57
+ "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
58
+ "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
59
+ "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
60
+ "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
61
+ "model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
62
+ "model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
63
+ "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
64
+ "model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
65
+ "model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
66
+ "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
67
+ "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
68
+ "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
69
+ "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
70
+ "model.layers.15.input_layernorm.weight": "model-00003-of-00007.safetensors",
71
+ "model.layers.15.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
72
+ "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
73
+ "model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
74
+ "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
75
+ "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
76
+ "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
77
+ "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
78
+ "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
79
+ "model.layers.16.input_layernorm.weight": "model-00003-of-00007.safetensors",
80
+ "model.layers.16.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
81
+ "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
82
+ "model.layers.16.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
83
+ "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
84
+ "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
85
+ "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
86
+ "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
87
+ "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
88
+ "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
89
+ "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
90
+ "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
91
+ "model.layers.17.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
92
+ "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
93
+ "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
94
+ "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
95
+ "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
96
+ "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
97
+ "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
98
+ "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
99
+ "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
100
+ "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
101
+ "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
102
+ "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
103
+ "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
104
+ "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
105
+ "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
106
+ "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
107
+ "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
108
+ "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
109
+ "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
110
+ "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
111
+ "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
112
+ "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
113
+ "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
114
+ "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
115
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
116
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
117
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
118
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
119
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
120
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
121
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
122
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
123
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
124
+ "model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
125
+ "model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
126
+ "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
127
+ "model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
128
+ "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
129
+ "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
130
+ "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
131
+ "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
132
+ "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
133
+ "model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
134
+ "model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
135
+ "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
136
+ "model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
137
+ "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
138
+ "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
139
+ "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
140
+ "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
141
+ "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
142
+ "model.layers.22.input_layernorm.weight": "model-00004-of-00007.safetensors",
143
+ "model.layers.22.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
144
+ "model.layers.22.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
145
+ "model.layers.22.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
146
+ "model.layers.22.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
147
+ "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
148
+ "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
149
+ "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
150
+ "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
151
+ "model.layers.23.input_layernorm.weight": "model-00004-of-00007.safetensors",
152
+ "model.layers.23.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
153
+ "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
154
+ "model.layers.23.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
155
+ "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
156
+ "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
157
+ "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
158
+ "model.layers.23.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
159
+ "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
160
+ "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
161
+ "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
162
+ "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
163
+ "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
164
+ "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
165
+ "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
166
+ "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
167
+ "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
168
+ "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
169
+ "model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
170
+ "model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
171
+ "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
172
+ "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
173
+ "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
174
+ "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
175
+ "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
176
+ "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
177
+ "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
178
+ "model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
179
+ "model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
180
+ "model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
181
+ "model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
182
+ "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
183
+ "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
184
+ "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
185
+ "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
186
+ "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
187
+ "model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
188
+ "model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
189
+ "model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
190
+ "model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
191
+ "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
192
+ "model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
193
+ "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
194
+ "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
195
+ "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
196
+ "model.layers.28.input_layernorm.weight": "model-00005-of-00007.safetensors",
197
+ "model.layers.28.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
198
+ "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
199
+ "model.layers.28.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
200
+ "model.layers.28.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
201
+ "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
202
+ "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
203
+ "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
204
+ "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
205
+ "model.layers.29.input_layernorm.weight": "model-00005-of-00007.safetensors",
206
+ "model.layers.29.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
207
+ "model.layers.29.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
208
+ "model.layers.29.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
209
+ "model.layers.29.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
210
+ "model.layers.29.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
211
+ "model.layers.29.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
212
+ "model.layers.29.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
213
+ "model.layers.29.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
214
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00007.safetensors",
215
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
216
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
217
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
218
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
219
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
220
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
221
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
222
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
223
+ "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
224
+ "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
225
+ "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
226
+ "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
227
+ "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
228
+ "model.layers.30.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
229
+ "model.layers.30.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
230
+ "model.layers.30.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
231
+ "model.layers.30.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
232
+ "model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
233
+ "model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
234
+ "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
235
+ "model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
236
+ "model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
237
+ "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
238
+ "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
239
+ "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
240
+ "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
241
+ "model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
242
+ "model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
243
+ "model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
244
+ "model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
245
+ "model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
246
+ "model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
247
+ "model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
248
+ "model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
249
+ "model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
250
+ "model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
251
+ "model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
252
+ "model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
253
+ "model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
254
+ "model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
255
+ "model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
256
+ "model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
257
+ "model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
258
+ "model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
259
+ "model.layers.34.input_layernorm.weight": "model-00006-of-00007.safetensors",
260
+ "model.layers.34.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
261
+ "model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
262
+ "model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
263
+ "model.layers.34.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
264
+ "model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
265
+ "model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
266
+ "model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
267
+ "model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
268
+ "model.layers.35.input_layernorm.weight": "model-00006-of-00007.safetensors",
269
+ "model.layers.35.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
270
+ "model.layers.35.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
271
+ "model.layers.35.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
272
+ "model.layers.35.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
273
+ "model.layers.35.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
274
+ "model.layers.35.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
275
+ "model.layers.35.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
276
+ "model.layers.35.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
277
+ "model.layers.36.input_layernorm.weight": "model-00007-of-00007.safetensors",
278
+ "model.layers.36.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
279
+ "model.layers.36.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
280
+ "model.layers.36.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
281
+ "model.layers.36.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
282
+ "model.layers.36.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
283
+ "model.layers.36.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
284
+ "model.layers.36.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
285
+ "model.layers.36.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
286
+ "model.layers.37.input_layernorm.weight": "model-00007-of-00007.safetensors",
287
+ "model.layers.37.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
288
+ "model.layers.37.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
289
+ "model.layers.37.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
290
+ "model.layers.37.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
291
+ "model.layers.37.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
292
+ "model.layers.37.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
293
+ "model.layers.37.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
294
+ "model.layers.37.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
295
+ "model.layers.38.input_layernorm.weight": "model-00007-of-00007.safetensors",
296
+ "model.layers.38.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
297
+ "model.layers.38.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
298
+ "model.layers.38.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
299
+ "model.layers.38.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
300
+ "model.layers.38.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
301
+ "model.layers.38.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
302
+ "model.layers.38.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
303
+ "model.layers.38.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
304
+ "model.layers.39.input_layernorm.weight": "model-00007-of-00007.safetensors",
305
+ "model.layers.39.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
306
+ "model.layers.39.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
307
+ "model.layers.39.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
308
+ "model.layers.39.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
309
+ "model.layers.39.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
310
+ "model.layers.39.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
311
+ "model.layers.39.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
312
+ "model.layers.39.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
313
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00007.safetensors",
314
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
315
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
316
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
317
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
318
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
319
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
320
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
321
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
322
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
323
+ "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
324
+ "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
325
+ "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
326
+ "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
327
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
328
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
329
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
330
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
331
+ "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
332
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
333
+ "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
334
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
335
+ "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
336
+ "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
337
+ "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
338
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
339
+ "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
340
+ "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
341
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
342
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
343
+ "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
344
+ "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
345
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
346
+ "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
347
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
348
+ "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
349
+ "model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
350
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
351
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
352
+ "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
353
+ "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
354
+ "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
355
+ "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
356
+ "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
357
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
358
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00007.safetensors",
359
+ "model.layers.9.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
360
+ "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
361
+ "model.layers.9.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
362
+ "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
363
+ "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
364
+ "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
365
+ "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
366
+ "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
367
+ "model.norm.weight": "model-00007-of-00007.safetensors"
368
+ }
369
+ }