Model Card for dataeaze/RegLLM-v2-ChecklistPointsExtractor-Llama-3.2-3B

This model extracts audit checklist points from regulatory documents.

Model Details

Model Description

Audit and compliance teams in most enterprises need to read and comprehend a large number of regulatory documents. To ensure compliance, the usual procedure is to create audit checklists from the applicable regulations. This model has been fine-tuned to automate the extraction of audit checklist points. It is the small language model that powers complieaze.ai, an agentic Gen AI application for regulatory compliance.

  • Developed by: dataeaze systems pvt ltd
  • Model type: LlamaForCausalLM
  • Language(s) (NLP): English
  • License: llama3.2
  • Finetuned from model: meta-llama/Llama-3.2-3B-Instruct

Uses

Direct Use

The model can be used to extract audit checklist points from regulatory documents such as acts, rules, regulations, circulars, master circulars, and master directions.

Downstream Use

The model is a part of the regulatory compliance application complieaze.ai developed by dataeaze systems.

Out-of-Scope Use

The model is not intended to be used for information security compliance related tasks.

Bias, Risks, and Limitations

  • The model might not perform well on tasks other than extracting audit checklists.
  • The model has been fine-tuned and tested on documents from Indian regulators such as the RBI and SEBI, so it may be biased towards the structure of Indian regulatory documents.

Recommendations

For most users we recommend accessing the model through the complieaze.ai application. Only AI practitioners should attempt to use the model directly.

How to Get Started with the Model

Use the code below to get started with the model.


import json

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained(
    "dataeaze/RegLLM-v2-ChecklistPointsExtractor-Llama-3.2-3B",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dataeaze/RegLLM-v2-ChecklistPointsExtractor-Llama-3.2-3B"
)

system_prompt = (
    "You are an expert at creating an auditor's checklist from a given text "
    "of a regulatory compliance document"
)
user_prompt = """
### TEXT:
{document_text}"""

# Sample section from an RBI master direction.
document_text = (
    "(1)The CICs shall share with the CIs, the logic and "
    "validation processes involved in data acceptance, so that instances of data "
    "rejection are minimised. The reasons for rejection shall be parameterised by "
    "the CICs and circulated among the concerned CIs."
    "\n"
    "(2)Rejection reports issued by CICs shall be simple and understandable so that "
    "they can be used for fixing reporting and data level issues by the CIs."
    "\n"
    "(3)CIs shall rectify the rejected data and upload the same with the CICs within "
    "seven days of receipt of such rejection report."
    "\n"
    "(4)CICs shall undertake periodic exercises/checks, at least once in a quarter, "
    "to identify identifier inconsistencies in its database and share the findings "
    "of such identifier inconsistencies with the respective CIs for confirming the "
    "accuracy of the same. The list of CIs who do not respond in timely manner (say within a month) "
    "to such data cleansing exercises shall be sent to Department of Supervision, "
    "Central Office at half yearly intervals (as on March 31 and September 30) for "
    "information."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt.format(document_text=document_text)},
]

# Near-zero temperature keeps the structured output deterministic.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    temperature=0.01,
    max_new_tokens=4096,
)

# The last message in `generated_text` is the assistant's reply:
# a JSON object with a "checklist" key.
out = pipe(messages)[0]["generated_text"][-1]
checklist = json.loads(out["content"])["checklist"]
for item in checklist:
    print(item["action_item"])
    print("------------")

Evaluation

Testing Data, Factors & Metrics

Testing Data

The evaluation dataset consists of 45 sections, of which 38 contain checklist points and 7 do not. There are a total of 212 audit checklist points (and 5 empty audit checklists). The sections were extracted and randomly sampled from documents of the regulators RBI, SEBI, IRDAI, NPCI, NFRA, IBBI, and FSDC.

Factors

Our gold answers come from the DeepSeek-V3 model. We use an LLM-as-a-Judge, based on GPT-4o, to compare predicted responses with the DeepSeek-V3 gold answers. The LLM-as-a-Judge evaluations closely matched human judgements from domain experts on a dataset of 74 sections and 610 checklist points.
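The card does not publish the judge prompt. A purely hypothetical sketch of how a GPT-4o-based judge comparison might be framed (the function name, prompt wording, and output schema are illustrative assumptions, not the actual evaluation harness):

```python
import json

def build_judge_prompt(gold: list[str], generated: list[str]) -> str:
    """Hypothetical LLM-as-a-Judge prompt: ask the judge model (e.g. GPT-4o)
    which gold checklist points are covered by the generated checklist."""
    return (
        "You are comparing two audit checklists for the same regulatory section.\n"
        "For each GOLD point, state whether it is covered by any GENERATED point.\n"
        f"GOLD: {json.dumps(gold)}\n"
        f"GENERATED: {json.dumps(generated)}\n"
        'Answer as JSON: {"covered": [true or false, one per gold point]}'
    )

prompt = build_judge_prompt(
    ["CICs shall share data-acceptance logic with CIs"],
    ["CICs must share the logic and validation process for data acceptance"],
)
```

The judge's per-point coverage verdicts would then be aggregated into the TP/FP/FN/TN counts described under Metrics.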

Metrics

We calculate precision, recall, and F1 score.

To calculate these, we compare the answers from two LLMs, one of which is treated as the gold standard (DeepSeek-V3 in our case):

  • true_positives: a checklist point in the gold answer that is also captured in the generated answer
  • false_positives: the gold checklist for a section is empty, but checklist items were generated
  • false_negatives: checklist items in the gold answer that are not captured in the generated answer
  • true_negatives: the gold checklist is empty and the generated checklist is also empty

These comparisons are based on human judgements and are imitated by the LLM-as-a-Judge.
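Given those counts, the metrics reduce to the standard formulas. A minimal check using the RegLLM counts from the Results table (TP=187, FP=0, FN=25) reproduces the reported figures:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision/recall/F1 from checklist-point match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# RegLLM counts from the Results table
p, r, f1 = precision_recall_f1(tp=187, fp=0, fn=25)
print(round(p, 3), round(r, 3), round(f1, 3))  # 1.0 0.882 0.937
```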

Results

Metric                          | RegLLM    | GPT-4o 2024-05-13
--------------------------------|-----------|------------------
True Positives (TP)             | 187       | 191
False Positives (FP)            | 0         | 0
False Negatives (FN)            | 25        | 21
True Negatives (TN)             | 5         | 5
Precision                       | 1.0       | 1.0
Recall                          | 0.882     | 0.9
F1 Score                        | 0.937     | 0.947
Throughput (tokens/sec)         | 223 [1]   | 90.2 [2]
Cost - Input ($ per 1M tokens)  | 0.534 [3] | 5 [4]
Cost - Output ($ per 1M tokens) | 0.534 [3] | 15 [4]

References

[1] Measured on an Nvidia L40S GPU on vast.ai.

[2] https://artificialanalysis.ai/models/gpt-4o-2024-05-13

[3] Estimated from vast.ai pricing of $0.86 per hour for an L40S GPU.

[4] https://platform.openai.com/docs/pricing#other-models

Summary

RegLLM offers a cost-effective model for extracting audit checklist points. It is 18.73x cheaper and 2.47x faster than GPT-4o, with roughly a 1 per cent impact on the quality of results (F1 0.937 vs 0.947).
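The headline ratios can be checked directly against the Results table. One assumption here: the 18.73x cost figure corresponds to a simple average of GPT-4o's input and output prices (the card does not state the weighting), which is the only averaging that reproduces it:

```python
# Figures from the Results table
regllm_cost = 0.534             # $/1M tokens, same for input and output
gpt4o_cost = (5 + 15) / 2       # simple average of input/output prices (assumption)
regllm_tps, gpt4o_tps = 223, 90.2
regllm_f1, gpt4o_f1 = 0.937, 0.947

print(round(gpt4o_cost / regllm_cost, 2))               # 18.73  (x cheaper)
print(round(regllm_tps / gpt4o_tps, 2))                 # 2.47   (x faster)
print(round((gpt4o_f1 - regllm_f1) / gpt4o_f1 * 100, 1))  # 1.1  (% relative F1 drop)
```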

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Nvidia L40S
  • Hours used: 10
  • Cloud Provider: vast.ai
  • Compute Region: Quebec, Canada
  • Carbon Emitted: 0.21 kg CO2 eq.

Model Card Contact

Saurabh Daptardar [email protected]
