# **LegalAI LLM: A Domain-Specific Legal Model**

Welcome to **LegalAI LLM**, a lightweight and efficient legal-specific large language model (LLM) designed to transform the legal industry with advanced natural language processing capabilities. Built with 497M parameters, the model aims to deliver strong accuracy, transparency, and reliability for legal professionals, educators, and the general public.

---

## **Overview**
LegalAI LLM is pre-trained on carefully curated, licensed datasets and fine-tuned to perform a wide range of tasks in the legal domain. With its manageable size, it is optimized to run efficiently on consumer hardware while delivering robust performance for complex legal use cases.

---

### How to use

#### Transformers
```bash
pip install transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Muhammad2003/Llama3-LegalLM"

device = "cuda"  # use "cpu" for CPU-only inference
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# For multiple GPUs, install accelerate and load with `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
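If you prefer the high-level API, the same checkpoint can also be run through the `text-generation` pipeline. This is a minimal sketch, assuming a recent `transformers` release; the prompt is purely illustrative:

```python
from transformers import pipeline

# Minimal sketch: the same checkpoint served through the text-generation pipeline.
# Use device=0 for the first GPU or device=-1 to stay on CPU.
generator = pipeline("text-generation", model="Muhammad2003/Llama3-LegalLM", device=0)

# Illustrative prompt; any legal query or drafting instruction works the same way.
prompt = "Draft a one-paragraph confidentiality clause for a consulting agreement."
output = generator(prompt, max_new_tokens=150, temperature=0.2, top_p=0.9, do_sample=True)
print(output[0]["generated_text"])
```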

## **Features**
- **Legal Document Analysis**: Analyze legal documents for accuracy, completeness, and compliance with regulations.
- **Legal Document Generation**: Create contracts, agreements, and notices based on user inputs (see the prompt sketch after this list).
- **Case Law Retrieval**: Search and retrieve case law with relevant summaries and insights.
- **Evidence Chain Analysis**: Map relationships between facts and generate evidence chains from case documents.
- **Legal Query Handling**: Provide accurate, context-aware answers to legal questions.
- **Bias Mitigation**: Designed to minimize racial, gender, and other biases for fair and equitable results.
- **Hallucination Reduction**: Enhanced training processes to minimize the generation of fabricated or inaccurate legal content.

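As an illustration of the document-generation feature, here is a minimal prompt sketch that reuses the `tokenizer` and `model` loaded in the usage example above; the `request` fields are hypothetical placeholders, not a schema the model requires.

```python
# Hypothetical document-generation request; the field names are illustrative only.
request = {
    "document_type": "non-disclosure agreement",
    "parties": ("Acme Corp.", "Jane Doe"),
    "governing_law": "the State of New York",
    "term": "two years",
}

prompt = (
    f"Draft a {request['document_type']} between {request['parties'][0]} and "
    f"{request['parties'][1]}, governed by the laws of {request['governing_law']}, "
    f"with a term of {request['term']}."
)

# Reuses `tokenizer` and `model` from the Transformers example above.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```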
---

## **Datasets**
LegalAI LLM is trained on publicly available, licensed legal datasets:
1. **HFforLegal/case-law**: Comprehensive corpus of legal documents, licensed under CC BY 4.0.
2. **Case Law Access Project (CAP)**: U.S. state and federal case law from 1658–2020.
3. **Court Listener**: Federal and state court opinions from the Free Law Project.
4. **Open Australian Legal Corpus**: Australian legislative and judicial documents.
5. **German Court Decisions (Gesp)**: German court decisions collected for legal research.
6. **The Pakistan Codes**: Official laws and constitution of Pakistan.

All datasets were collected in accordance with their ownership, intellectual-property, and licensing terms, ensuring transparency and minimizing licensing risk for users.

---

## **Technical Specifications**
- **Model Size**: 497M parameters
- **Training Framework**: PyTorch with LangChain integration
- **Supported Hardware**: Consumer-grade hardware (e.g., NVIDIA T4 GPUs, Apple M1 MacBooks) or cloud platforms (see the device-selection sketch after this list)
- **Input Format**: Text queries or legal document inputs
- **Output Format**: Structured responses, summaries, or generated documents

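Below is a minimal sketch of selecting a device on the hardware listed above; it assumes a recent PyTorch build (CUDA for NVIDIA GPUs, MPS for Apple-silicon Macs) and falls back to CPU otherwise.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Muhammad2003/Llama3-LegalLM"

# Pick the best available backend: CUDA (e.g., NVIDIA T4), MPS (Apple M1), or CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# float16 halves GPU memory use; keep float32 on CPU for numerical stability.
dtype = torch.float16 if device == "cuda" else torch.float32
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=dtype).to(device)
```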
---

## **Use Cases**
- **Legal Professionals**: Streamline workflows by generating and analyzing legal documents.
- **Educators**: Assist law students with case-law studies and research tools.
- **Public Users**: Enable non-technical users to generate basic legal documents and understand judicial processes.
- **Law Firms**: Integrate with case management systems for enhanced productivity.

---

## **Ethical AI Commitment**
LegalAI LLM adheres to strict ethical guidelines:
- **Transparency**: Training data sources are fully disclosed.
- **Compliance**: No copyrighted or unlawfully sourced data is used.
- **Bias Mitigation**: Designed to reduce discriminatory outputs in legal contexts.

---

## **Limitations**
- The model may require further fine-tuning for jurisdiction-specific tasks.
- Certain nuanced legal interpretations may require human oversight.

---

## **Contributors**
- Muhammad Bin Usman
- Zain Ul Abideen
- Syed Hasan Abbas

---

## **License**
This project is licensed under the **MIT License**. See the [LICENSE](./LICENSE) file for more details.

---

## **Contact**
For support or inquiries, please reach out at [[email protected]].

Explore the future of legal AI with **LegalAI LLM**!