Upload MoLA-LM: Mixture of LoRA Adapters Language Model
- README.md +8 -8
- modeling_mola_lm.py +20 -2
README.md
CHANGED
@@ -12,18 +12,14 @@ language:
 - en
 pipeline_tag: text-generation
 ---
-
+
+Image here
 
 # MoLA-LM: Mixture of LoRA Adapters LLM
 
 MoLA-LM combines multiple LoRA adapters with an intelligent router to automatically select the best adapter for each input prompt. This approach enables specialized performance across different tasks while maintaining efficiency.
 
-
-
-**Important Note**: *The v0.5 had issues with the lora applying part of the custom lm class and its router was a bit too small with little generalization.
-In v0.6 and future models, all of these issues are/will be resolved.*
-
-**TLDR:** *Dont use v0.5, use v0.6 and above.*
+Evals are coming...
 
 ## Model Details
 
@@ -36,6 +32,7 @@ In v0.6 and future models, all of these issues are/will be resolved.*
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
+
 # Load the model (trust_remote_code=True is required for custom architecture)
 model = AutoModelForCausalLM.from_pretrained(
     "MoLA-LLM/MoLA-v0.6-9x4b",
@@ -43,6 +40,7 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained("MoLA-LLM/MoLA-v0.6-9x4b", trust_remote_code=True)
+
 # Use like any other language model - adapter selection is automatic
 prompt = "Write a Python function to calculate fibonacci numbers"
 messages = [{"role": "user", "content": prompt}]
@@ -53,8 +51,10 @@ inputs = tokenizer.apply_chat_template(
     return_dict=True,
     return_tensors="pt",
 ).to(model.device)
+
 outputs = model.generate(**inputs, max_new_tokens=8192, temperature=.6, do_sample=True)
 response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+
 print(f"Selected LoRA: {model.get_current_lora()}")
 print(response)
 ```
@@ -65,7 +65,7 @@ print(response)
 The MoLA-LM architecture consists of:
 
 1. **Base Model**: Qwen/Qwen3-4B-Thinking-2507
-2. **Router Network**: Frozen encoder as Sentence transformer + decoder as MLP for adapter selection
+2. **Router Network**: Frozen encoder as Sentence transformer + decoder as one layer MLP for adapter selection
 3. **LoRA Adapters**: 9 task-specific fine-tuned adapters
 4. **Dynamic Switching**: Automatic adapter application based on input
 
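The router item above is the substantive README change: v0.6 swaps in a one-layer MLP decoder on top of the frozen sentence-transformer encoder. As a rough illustration of that routing mechanism, here is a minimal sketch; the encoder checkpoint, the `router_head` name, and the adapter list are hypothetical stand-ins rather than MoLA-LM's actual internals.

```python
# Hypothetical sketch of prompt-based adapter routing; not MoLA-LM's real code.
import torch
from sentence_transformers import SentenceTransformer

adapter_names = [f"adapter_{i}" for i in range(9)]  # placeholder names for the 9 LoRAs

# Frozen encoder: embeds the prompt; only the routing head would be trained
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

# "Decoder" as a one-layer MLP: one logit per adapter
router_head = torch.nn.Linear(encoder.get_sentence_embedding_dimension(), len(adapter_names))

def select_adapter(prompt: str) -> str:
    with torch.no_grad():
        emb = torch.from_numpy(encoder.encode(prompt))  # fixed-size prompt embedding
        logits = router_head(emb)                       # score every adapter
    return adapter_names[int(logits.argmax())]          # highest score wins

print(select_adapter("Write a Python function to calculate fibonacci numbers"))
```

At generation time the model would then activate the winning adapter before the forward pass, which is what `model.get_current_lora()` in the usage example surfaces.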
modeling_mola_lm.py
CHANGED
@@ -226,7 +226,16 @@ class MoLAForCausalLM(PreTrainedModel, GenerationMixin):
                 repo_id=self.model_path,
                 filename=f"loras/{first_adapter}/adapter_config.json"
             )
-
+
+            # Create a temporary directory with both files for PEFT
+            temp_dir = tempfile.mkdtemp()
+            first_lora_path = os.path.join(temp_dir, first_adapter)
+            os.makedirs(first_lora_path, exist_ok=True)
+
+            # Copy both files to the same directory
+            shutil.copy2(adapter_weights_file, os.path.join(first_lora_path, "adapter_model.safetensors"))
+            shutil.copy2(adapter_config_file, os.path.join(first_lora_path, "adapter_config.json"))
+
             print(f"Downloaded first adapter to: {first_lora_path}")
         except Exception as e:
             raise Exception(f"Failed to download first adapter {first_adapter}: {e}")
@@ -261,7 +270,16 @@ class MoLAForCausalLM(PreTrainedModel, GenerationMixin):
                 repo_id=self.model_path,
                 filename=f"loras/{task_name}/adapter_config.json"
             )
-
+
+            # Create a temporary directory with both files for PEFT
+            temp_dir = tempfile.mkdtemp()
+            lora_path = os.path.join(temp_dir, task_name)
+            os.makedirs(lora_path, exist_ok=True)
+
+            # Copy both files to the same directory
+            shutil.copy2(adapter_weights_file, os.path.join(lora_path, "adapter_model.safetensors"))
+            shutil.copy2(adapter_config_file, os.path.join(lora_path, "adapter_config.json"))
+
         except Exception as e:
             print(f"❌ Failed to download LoRA {task_name}: {e}")
             continue
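Both hunks add the same fix: `hf_hub_download` caches `adapter_model.safetensors` and `adapter_config.json` at separate paths, while PEFT loads an adapter from a single directory containing both files, so the diff assembles that layout in a temp directory before loading. Below is a standalone sketch of the pattern, assuming a hypothetical adapter name (the real names live under `loras/` in the repo).

```python
# Standalone sketch of the download-and-assemble pattern added in both hunks.
import os
import shutil
import tempfile

from huggingface_hub import hf_hub_download

repo_id = "MoLA-LLM/MoLA-v0.6-9x4b"
task_name = "coding"  # hypothetical adapter name; actual names live under loras/

# hf_hub_download caches each file separately, at unrelated paths
weights_file = hf_hub_download(repo_id=repo_id, filename=f"loras/{task_name}/adapter_model.safetensors")
config_file = hf_hub_download(repo_id=repo_id, filename=f"loras/{task_name}/adapter_config.json")

# PEFT expects adapter_config.json and adapter_model.safetensors side by side
# in one directory, so rebuild that layout in a temp dir
lora_dir = os.path.join(tempfile.mkdtemp(), task_name)
os.makedirs(lora_dir, exist_ok=True)
shutil.copy2(weights_file, os.path.join(lora_dir, "adapter_model.safetensors"))
shutil.copy2(config_file, os.path.join(lora_dir, "adapter_config.json"))

# The directory can now be handed to PEFT, e.g.:
# from peft import PeftModel
# model = PeftModel.from_pretrained(base_model, lora_dir, adapter_name=task_name)
```

One trade-off of this approach: `tempfile.mkdtemp()` directories are not cleaned up automatically, so each load leaves the copied adapter files behind until the temp location is cleared.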