Manal0809/Mistral_calibrative_few

@@ -1,202 +1,105 @@
----
-base_model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
-library_name: peft
----
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]
-### Framework versions
-- PEFT 0.15.1

+# Mistral_calibrative_few
+## Model Description
+This model is the few-shot variant of Multi-CONFE (Confidence-Aware Medical Feature Extraction) with calibrative fine-tuning, built on [unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit](https://huggingface.co/unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit). Trained on only 12.5% of the available data, it reaches near state-of-the-art performance (F1 score 0.973 vs. 0.981 for the full-data model), with particular emphasis on confidence calibration and hallucination reduction.
+## Intended Use
+This model is designed to extract clinically relevant features from medical patient notes with high accuracy and well-calibrated confidence scores in low-resource settings. It is particularly suited to automated assessment of medical documentation, such as USMLE Step-2 Clinical Skills patient notes, when little annotated training data is available.
+## Training Data
+The model was trained on only 100 annotated patient notes (12.5% of the full dataset) from the [NBME - Score Clinical Patient Notes](https://www.kaggle.com/competitions/nbme-score-clinical-patient-notes) Kaggle competition dataset, roughly 10 examples per clinical case. The dataset contains USMLE Step-2 Clinical Skills patient notes covering 10 clinical cases, and each note carries expert annotations for the medical features to be extracted.
+## Training Procedure
+Training involved a two-phase approach:
+1. **Instructive Few-Shot Fine-Tuning**: Initial alignment of the model with the medical feature extraction task using Mistral Nemo Instruct as the base model.
+2. **Calibrative Fine-Tuning**: Integration of confidence calibration mechanisms, including bidirectional feature mapping, complexity-aware confidence adjustment, and dynamic thresholding.
+
+Training hyperparameters:
+- Base model: unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
+- LoRA rank: 32
+- Training epochs: 14 (instructive phase) + 5 (calibrative phase)
+- Learning rate: 2e-4 (instructive phase), 1e-4 (calibrative phase)
+- Optimizer: AdamW (8-bit)
+- Hallucination weight: 0.2
+- Missing feature weight: 0.5
+- Confidence threshold: 0.7
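+
+The card does not spell out how the 0.7 confidence threshold is applied at inference time. As a hypothetical sketch only, the snippet below assumes the model output has already been parsed into a mapping from feature name to (extracted spans, confidence) and applies a fixed cutoff: features below the threshold are reported as not found instead of risking a hallucinated match. The function name `filter_by_confidence`, the input format, and the example values are illustrative assumptions; the dynamic thresholding and complexity-aware adjustment described above are not reproduced here.
+
+```python
+from typing import Dict, List, Tuple
+
+# Threshold value from the hyperparameter list; the filtering rule itself is an assumption.
+CONFIDENCE_THRESHOLD = 0.7
+
+
+def filter_by_confidence(
+    predictions: Dict[str, Tuple[List[str], float]],
+    threshold: float = CONFIDENCE_THRESHOLD,
+) -> Dict[str, List[str]]:
+    """Keep spans only for features whose confidence reaches the threshold.
+
+    Features below the threshold get an empty span list, i.e. they are
+    treated as "not found" rather than reported with low confidence.
+    """
+    return {
+        feature: (spans if confidence >= threshold else [])
+        for feature, (spans, confidence) in predictions.items()
+    }
+
+
+# Hypothetical parsed output for three features
+predictions = {
+    "35-year": (["35 yo"], 0.95),
+    "Female": (["F"], 0.91),
+    "Fatigue": (["tired"], 0.42),
+}
+print(filter_by_confidence(predictions))
+# {'35-year': ['35 yo'], 'Female': ['F'], 'Fatigue': []}
+```
+
+A fixed cutoff keeps the example simple; an adaptive scheme would replace the single `threshold` argument with a per-feature or per-note value.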
+## Performance
+On the USMLE Step-2 Clinical Skills notes dataset:
+- Precision: 0.982
+- Recall: 0.964
+- F1 Score: 0.973
+
+These scores are obtained with only 12.5% of the training data used for the full model. Compared with vanilla models, this model reduces hallucinations by 84.9% and missing features by 85.0%, which makes it particularly valuable in domains where annotated data is scarce or expensive to obtain.
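+
+As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 follows from the other two scores. `f1_score` is a small illustrative helper, not part of the model's code:
+
+```python
+def f1_score(precision: float, recall: float) -> float:
+    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
+    if precision + recall == 0:
+        return 0.0
+    return 2 * precision * recall / (precision + recall)
+
+
+# Precision and recall reported above for this model
+f1 = f1_score(0.982, 0.964)
+print(round(f1, 3))  # 0.973
+```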
+## Limitations
+- The model was evaluated on standardized USMLE Step-2 Clinical Skills notes and may require adaptation for other clinical domains.
+- Some errors stem from knowledge gaps in specific medical terminology or inconsistencies in annotation.
+- Performance on multilingual or non-standardized clinical notes remains untested.
+- It still performs slightly below the full-data model (F1 score 0.973 vs. 0.981).
+## Ethical Considerations
+Automated assessment systems must ensure fairness across different student populations. While the calibration mechanism enhances interpretability, systematic bias testing is recommended before deployment in high-stakes assessment scenarios. When using this model for educational assessment, we recommend:
+1. Implementing a human-in-the-loop validation process
+2. Regular auditing for demographic parity
+3. Clear communication to students about the use of AI in assessment
+## How to Use
+```python
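+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+# Load model and tokenizer. The repo holds a PEFT (LoRA) adapter over a bitsandbytes
+# 4-bit base model, so `peft`, `bitsandbytes`, and a GPU are assumed to be available.
+model_name = "Manal0809/Mistral_calibrative_few"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,
+    device_map="auto",  # place the 4-bit base model on the available GPU(s)
+)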
+# Example input
+patient_note = """HPI: 35 yo F with heavy uterine bleeding. Last normal period was 6 month ago.
+LMP was 2 months ago. No clots.
+Changes tampon every few hours, previously 4/day. Menarche at 12.
+Attempted using OCPs for menstrual regulation previously but unsuccessful.
+Two adolescent children (ages unknown) at home.
+Last PAP 6 months ago was normal, never abnormal.
+Gained 10-15 lbs over the past few months, eating out more though.
+Hyperpigmented spots on hands and LT neck that she noticed 1-2 years ago.
+SH: state social worker; no smoking or drug use; beer or two on weekends;
+sexually active with boyfriend of 14 months, uses condoms at first but no longer uses them."""
+features_to_extract = ["35-year", "Female", "heavy-periods", "symptoms-for-6-months",
+                       "Weight-Gain", "Last-menstrual-period-2-months-ago",
+                       "Fatigue", "Unprotected-Sex", "Infertility"]
+# Format input as shown in the paper
+input_text = f"""###instruction: Extract medical features from the patient note.
+###patient_history: {patient_note}
+###features: {features_to_extract}
+### Annotation:"""
+# Generate output
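+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,  # passes both input_ids and attention_mask
+    max_new_tokens=512,
+    do_sample=True,  # temperature is ignored unless sampling is enabled
+    temperature=0.2,
+    num_return_sequences=1,
+)
+# Decode only the newly generated tokens so the prompt is not echoed back
+prompt_length = inputs["input_ids"].shape[1]
+result = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
+print(result)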
+```
+## Model Card Author
+Manal Abumelha - mabumelha@kku.edu.sa
+## Citation