Upload MoLA-LM: Mixture of LoRA Adapters Language Model
- README.md: +7 -8
- modeling_mola_lm.py: +56 -83
README.md
CHANGED
````diff
@@ -13,18 +13,13 @@ language:
 pipeline_tag: text-generation
 ---
 
-
+Image here
 
 # MoLA-LM: Mixture of LoRA Adapters LLM
 
 MoLA-LM combines multiple LoRA adapters with an intelligent router to automatically select the best adapter for each input prompt. This approach enables specialized performance across different tasks while maintaining efficiency.
 
-
-
-**Important Note**: *v0.5 had issues with the LoRA-applying part of the custom LM class, and its router was a bit too small, with little generalization.
-In v0.6 and future models, all of these issues are/will be resolved.*
-
-**TL;DR:** *Don't use v0.5; use v0.6 and above.*
+Evals are coming...
 
 ## Model Details
 
@@ -37,6 +32,7 @@ In v0.6 and future models, all of these issues are/will be resolved.*
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
+
 # Load the model (trust_remote_code=True is required for custom architecture)
 model = AutoModelForCausalLM.from_pretrained(
     "MoLA-LLM/MoLA-v0.6-9x4b",
@@ -44,6 +40,7 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto"
 )
 tokenizer = AutoTokenizer.from_pretrained("MoLA-LLM/MoLA-v0.6-9x4b", trust_remote_code=True)
+
 # Use like any other language model - adapter selection is automatic
 prompt = "Write a Python function to calculate fibonacci numbers"
 messages = [{"role": "user", "content": prompt}]
@@ -54,8 +51,10 @@ inputs = tokenizer.apply_chat_template(
     return_dict=True,
     return_tensors="pt",
 ).to(model.device)
+
 outputs = model.generate(**inputs, max_new_tokens=8192, temperature=.6, do_sample=True)
 response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
+
 print(f"Selected LoRA: {model.get_current_lora()}")
 print(response)
 ```
@@ -66,7 +65,7 @@ print(response)
 The MoLA-LM architecture consists of:
 
 1. **Base Model**: Qwen/Qwen3-4B-Thinking-2507
-2. **Router Network**: Frozen sentence-transformer encoder + MLP decoder for adapter selection
+2. **Router Network**: Frozen sentence-transformer encoder + one-layer MLP decoder for adapter selection
 3. **LoRA Adapters**: 9 task-specific fine-tuned adapters
 4. **Dynamic Switching**: Automatic adapter application based on input
 
````
modeling_mola_lm.py
CHANGED
@@ -194,64 +194,29 @@ class MoLAForCausalLM(PreTrainedModel, GenerationMixin):
@@ -260,45 +225,53 @@ class MoLAForCausalLM(PreTrainedModel, GenerationMixin):

The removed `_load_lora_adapters` ("Loading LoRA adapters (single wrapper)...") worked in two stages. It first resolved `first_adapter = str(self.config.task_labels[0])`, reading it from a local `loras/<adapter>` directory when `self.model_path` exists on disk (raising `FileNotFoundError` if that directory is missing) and otherwise downloading the adapter's `adapter_model.safetensors` and `adapter_config.json` with `hf_hub_download` into a `tempfile.mkdtemp()` directory, raising "Failed to download first adapter ..." on any error. It then wrapped the base model with that single adapter as the default ("✅ Loaded first LoRA: ... (as default)") and, in a second loop, attached the remaining adapters with `peft_model.load_adapter(lora_path, adapter_name=task_name)`, printing "✅ Loaded LoRA: ..." or "❌ Failed to load LoRA ..." per adapter, and falling back to `self.lora_models = {}` with "❌ Failed to initialize LoRA loading: ..." on any outer failure.
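Both the removed and the rewritten loader assume the same adapter layout: one sub-directory per entry of `config.task_labels` under `loras/`, each holding the two PEFT files, whether read from a local path or fetched with `hf_hub_download`. Expected layout (task names are placeholders):

```
MoLA-LLM/MoLA-v0.6-9x4b/
└── loras/
    ├── <task_label_0>/
    │   ├── adapter_config.json
    │   └── adapter_model.safetensors
    ├── <task_label_1>/
    │   ├── adapter_config.json
    │   └── adapter_model.safetensors
    └── ...
```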
In v0.6 the method is a single loop over `self.config.task_labels`: each adapter is resolved locally or downloaded into one shared working directory, the first adapter creates the PEFT wrapper, and the remaining adapters are attached to that wrapper with `load_adapter`:

```python
    def _load_lora_adapters(self):
        """Load LoRA adapters using PEFT - simplified approach."""
        print("Loading LoRA adapters...")

        if not self.model_path:
            print("No model path specified, skipping LoRA loading")
            return

        # Simple approach: try to load each LoRA directly from Hub using PEFT's built-in capabilities
        try:
            from huggingface_hub import hf_hub_download
            import tempfile
            import shutil

            # Create a working directory for all LoRAs
            work_dir = tempfile.mkdtemp(prefix="mola_loras_")
            print(f"Working directory: {work_dir}")

            peft_model = None
            loaded_adapters = []

            for i, task_name in enumerate(self.config.task_labels):
                try:
                    print(f"Loading LoRA {task_name}...")

                    if os.path.exists(self.model_path):
                        # Local path
                        lora_path = os.path.join(self.model_path, "loras", task_name)
                        if not os.path.exists(lora_path):
                            print(f"⚠️ LoRA directory not found: {lora_path}")
                            continue
                    else:
                        # Hub path - create proper structure
                        lora_path = os.path.join(work_dir, task_name)
                        os.makedirs(lora_path, exist_ok=True)

                        # Download files
                        weights_file = hf_hub_download(
                            repo_id=self.model_path,
                            filename=f"loras/{task_name}/adapter_model.safetensors"
                        )
                        config_file = hf_hub_download(
                            repo_id=self.model_path,
                            filename=f"loras/{task_name}/adapter_config.json"
                        )

                        # Copy to working directory
                        shutil.copy2(weights_file, os.path.join(lora_path, "adapter_model.safetensors"))
                        shutil.copy2(config_file, os.path.join(lora_path, "adapter_config.json"))

                    # Load the first adapter as base, others as additional
                    if i == 0:
                        peft_model = PeftModel.from_pretrained(
                            self.mola_model,
                            lora_path,
                            torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
                        )
                        print(f"✅ Loaded base LoRA: {task_name}")
                    else:
                        peft_model.load_adapter(lora_path, adapter_name=task_name)
                        print(f"✅ Loaded additional LoRA: {task_name}")

                    loaded_adapters.append(task_name)

                except Exception as e:
                    print(f"❌ Failed to load LoRA {task_name}: {e}")
                    continue

            if peft_model and loaded_adapters:
                # Store the PEFT model
                self.lora_models = {name: peft_model for name in loaded_adapters}
                self._current_adapted_model = peft_model
                self._current_lora = loaded_adapters[0]

                print(f"✅ Successfully loaded {len(loaded_adapters)} LoRA adapters")
                print(f"Available adapters: {loaded_adapters}")
            else:
                raise Exception("No LoRA adapters could be loaded")

        except Exception as e:
            print(f"❌ Failed to initialize LoRA loading: {e}")
            self.lora_models = {}
```
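The loop above registers every adapter on a single `PeftModel`, which is what keeps the "Dynamic Switching" step from the README cheap: activating the adapter chosen for a routed prompt is one PEFT call rather than a reload. A minimal sketch, assuming the `peft_model` built by `_load_lora_adapters`; the helper name is hypothetical, while `peft_config` and `set_adapter` are standard PEFT APIs:

```python
from peft import PeftModel

def apply_routed_adapter(peft_model: PeftModel, adapter_name: str) -> PeftModel:
    """Hypothetical helper: activate the router's chosen adapter before generation."""
    if adapter_name in peft_model.peft_config:    # adapter names registered via load_adapter()
        peft_model.set_adapter(adapter_name)      # switch the active LoRA weights in place
    return peft_model
```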