# **LegalAI LLM: A Domain-Specific Legal Model**

Welcome to **LegalAI LLM**, a lightweight and efficient legal-specific large language model (LLM) designed to transform the legal industry with advanced natural language processing capabilities. Built with 497M parameters, the model aims to deliver strong accuracy, transparency, and reliability for legal professionals, educators, and the general public.

---

## **Overview**
LegalAI LLM is pre-trained on carefully curated, licensed datasets and fine-tuned to perform a wide range of tasks in the legal domain. With its manageable size, it is optimized to run efficiently on consumer hardware while delivering robust performance for complex legal use cases.

---

### How to use

#### Transformers
```bash
pip install transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Muhammad2003/Llama3-LegalLM"

device = "cuda"  # use "cpu" for CPU-only inference
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# For multiple GPUs, install accelerate and load with `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
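If you prefer the high-level API, the same checkpoint can also be run through the `text-generation` pipeline. This is a minimal sketch, assuming a recent `transformers` release; the prompt is purely illustrative:

```python
from transformers import pipeline

# Minimal sketch: the same checkpoint served through the text-generation pipeline.
# Use device=0 for the first GPU or device=-1 to stay on CPU.
generator = pipeline("text-generation", model="Muhammad2003/Llama3-LegalLM", device=0)

# Illustrative prompt; any legal query or drafting instruction works the same way.
prompt = "Draft a one-paragraph confidentiality clause for a consulting agreement."
output = generator(prompt, max_new_tokens=150, temperature=0.2, top_p=0.9, do_sample=True)
print(output[0]["generated_text"])
```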

## **Features**
- **Legal Document Analysis**: Analyze legal documents for accuracy, completeness, and compliance with regulations.
- **Legal Document Generation**: Create contracts, agreements, and notices based on user inputs (see the prompt sketch after this list).
- **Case Law Retrieval**: Search and retrieve case law with relevant summaries and insights.
- **Evidence Chain Analysis**: Map relationships between facts and generate evidence chains from case documents.
- **Legal Query Handling**: Provide accurate, context-aware answers to legal questions.
- **Bias Mitigation**: Designed to minimize racial, gender, and other biases for fair and equitable results.
- **Hallucination Reduction**: Enhanced training processes to minimize the generation of fabricated or inaccurate legal content.

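As an illustration of the document-generation feature, here is a minimal prompt sketch that reuses the `tokenizer` and `model` loaded in the usage example above; the `request` fields are hypothetical placeholders, not a schema the model requires.

```python
# Hypothetical document-generation request; the field names are illustrative only.
request = {
    "document_type": "non-disclosure agreement",
    "parties": ("Acme Corp.", "Jane Doe"),
    "governing_law": "the State of New York",
    "term": "two years",
}

prompt = (
    f"Draft a {request['document_type']} between {request['parties'][0]} and "
    f"{request['parties'][1]}, governed by the laws of {request['governing_law']}, "
    f"with a term of {request['term']}."
)

# Reuses `tokenizer` and `model` from the Transformers example above.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```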
---

## **Datasets**
LegalAI LLM is trained on publicly available, licensed legal datasets:
1. **HFforLegal/case-law**: Comprehensive corpus of legal documents, licensed under CC BY 4.0.
2. **Case Law Access Project (CAP)**: U.S. state and federal case law from 1658–2020.
3. **Court Listener**: Federal and state court opinions from the Free Law Project.
4. **Open Australian Legal Corpus**: Australian legislative and judicial documents.
5. **German Court Decisions (Gesp)**: German court decisions collected for legal research.
6. **The Pakistan Codes**: Official laws and constitution of Pakistan.

All datasets were collected in accordance with their ownership, intellectual-property, and licensing terms, ensuring transparency and minimizing licensing risk for users.

---

## **Technical Specifications**
- **Model Size**: 497M parameters
- **Training Framework**: PyTorch with LangChain integration
- **Supported Hardware**: Consumer-grade hardware (e.g., NVIDIA T4 GPUs, Apple M1 MacBooks) or cloud platforms (see the device-selection sketch after this list)
- **Input Format**: Text queries or legal document inputs
- **Output Format**: Structured responses, summaries, or generated documents

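Below is a minimal sketch of selecting a device on the hardware listed above; it assumes a recent PyTorch build (CUDA for NVIDIA GPUs, MPS for Apple-silicon Macs) and falls back to CPU otherwise.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Muhammad2003/Llama3-LegalLM"

# Pick the best available backend: CUDA (e.g., NVIDIA T4), MPS (Apple M1), or CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# float16 halves GPU memory use; keep float32 on CPU for numerical stability.
dtype = torch.float16 if device == "cuda" else torch.float32
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype).to(device)
```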
---

## **Use Cases**
- **Legal Professionals**: Streamline workflows by generating and analyzing legal documents.
- **Educators**: Assist law students with case-law studies and research tools.
- **Public Users**: Enable non-technical users to generate basic legal documents and understand judicial processes.
- **Law Firms**: Integrate with case management systems for enhanced productivity.

---

## **Ethical AI Commitment**
LegalAI LLM adheres to strict ethical guidelines:
- **Transparency**: Training data sources are fully disclosed.
- **Compliance**: No copyrighted or unlawfully sourced data is used.
- **Bias Mitigation**: Designed to reduce discriminatory outputs in legal contexts.

---

## **Limitations**
- The model may require further fine-tuning for jurisdiction-specific tasks.
- Certain nuanced legal interpretations may require human oversight.

---

## **Contributors**
- Muhammad Bin Usman
- Zain Ul Abideen
- Syed Hasan Abbas

---

## **License**
This project is licensed under the **MIT License**. See the [LICENSE](./LICENSE) file for more details.

---

## **Contact**
For support or inquiries, please reach out at [[email protected]].

Explore the future of legal AI with **LegalAI LLM**!