---
license: mit
language:
- en
pipeline_tag: object-detection
tags:
- irs
- '1040'
- '2023'
- tax
- form
---

# Finetuned RT-DETR model to extract tables from IRS 1040 2023 forms

For IRS Form 1040 document data parsing, I have previously uploaded a trained Donut model that is based on vision transformers. The Donut model can perform single-shot parsing of 1040 forms and return parsed form values in JSON format. Although vision transformers are cutting-edge AI models, they still have some limitations when performing OCR-related tasks, where they sometimes hallucinate. Secondly, they do not provide a confidence level for extracted field data, which makes it extremely challenging to decide downstream when to accept a particular field value and when to drop it. Especially when dealing with financial data, like Form 1040, accuracy and confidence values are of utmost importance.

This article provides a working example of using multiple AI models to perform OCR of Form 1040 and extract text values in JSON format with confidence levels for each field.

```
 ------------------------
 | Classification Model |   (Model is used to classify IRS Form 1040 by page)
 ------------------------
            |
 ------------------------
 |       RT-DETR        |
 |   Object Detection   |   (Model trained to extract the header and tables
 |        Model         |    from Form 1040)
 ------------------------
            |
 ------------------------
 |  Table Transformer   |   (Table Transformer model along with OCR models,
 |      Text OCR        |    i.e. PaddleOCR or Tesseract, to parse field data)
 ------------------------
```

## Classes for form 1040

The RT-DETR model is finetuned with 6 classes related to the 1040 2023 form.
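For downstream routing of detections, the six class names can be kept in a small lookup. This is an illustrative sketch, not part of the model card's code: the integer ids are assumptions and depend on the actual training configuration, and `page_for_class` is a hypothetical helper based on the `_pg1_`/`_pg2_` naming convention.

```python
# Class names the detector was finetuned with.
# NOTE: the integer ids below are illustrative assumptions; the real ids
# come from the training configuration used to finetune the model.
CLASS_NAMES = {
    0: "1040_pg1_header",
    1: "1040_pg1_tax_tbl",
    2: "1040_pg1_sch_b",
    3: "1040_pg2_tax_tbl",
    4: "1040_pg2_pay_tbl",
    5: "1040_pg2_signature_frm",
}

def page_for_class(class_name: str) -> int:
    """Hypothetical helper: infer the form page from the class name."""
    return 1 if "_pg1_" in class_name else 2
```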
### Page 1 classes

- `1040_pg1_header` - represents the header of page 1
- `1040_pg1_tax_tbl` - represents a table with financial values
- `1040_pg1_sch_b` - represents a table with Schedule B financial values

### Page 2 classes

- `1040_pg2_tax_tbl`
- `1040_pg2_pay_tbl`
- `1040_pg2_signature_frm`

# Fake Synthetic Data for IRS 1040 2023 Form Page 1

![image](https://github.com/user-attachments/assets/b56cca04-1db9-497d-bb34-46b423207984)

## Cropped - Class: 1040_pg1_header

![bboxes_pg1_header](https://github.com/user-attachments/assets/c6bf7b76-fc8c-4572-ab31-790d1391adf3)

## Cropped - Class: 1040_pg1_tax_tbl

![bboxes_pg1_tax_table](https://github.com/user-attachments/assets/037b7bf8-0add-410e-b85e-3bbe6fa2f29a)

## Cropped - Class: 1040_pg1_sch_b

![bboxes_pg1_tbl2](https://github.com/user-attachments/assets/47e49711-1b90-46a1-8f0a-4770c01e6d2c)

# Fake Synthetic Data for IRS 1040 2023 Form Page 2

![redlined_pg2](https://github.com/user-attachments/assets/320d422b-4c8f-4134-9d3a-c3d94c72df51)

## Cropped - Class: 1040_pg2_tax_tbl

![bboxes_pg2_tax](https://github.com/user-attachments/assets/07a2200d-5546-4539-82a3-35c1bc6b7658)

## Cropped - Class: 1040_pg2_pay_tbl

![bboxes_pg2_small_tbl](https://github.com/user-attachments/assets/21b914b7-3666-4478-ae67-1b78fac55de3)

## Cropped - Class: 1040_pg2_signature_frm

![bboxes_pg2_signature](https://github.com/user-attachments/assets/ae5df46a-c878-406b-9472-208d49be49c4)

```python
from ultralytics import RTDETR
import cv2
import supervision as sv

# --------------------------
model_file = 'replace with path to model file /1040_2023_v1.pt'

# Load the trained model from a local path
model = RTDETR(model_file)

# Display model information (optional)
model.info()

image_path = 'path to source image'

# Read the source image
img = cv2.imread(image_path)

# Perform inference; imgsz is set to 1024 as the model is finetuned with this size
results = model.predict(img, imgsz=1024)

# Use the supervision library to parse results and generate redline boxes
detections = sv.Detections.from_ultralytics(results[0])

# Create the bounding box and label annotators
bounding_box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Generate a "<class> <confidence>" label for each detection
labels = [
    f"{class_name} {confidence:.2f}"
    for class_name, confidence
    in zip(detections['class_name'], detections.confidence)
]

# Annotate the image with labeled bounding boxes
annotated_image = bounding_box_annotator.annotate(
    scene=img.copy(), detections=detections
)
annotated_image = label_annotator.annotate(
    annotated_image, detections=detections, labels=labels
)

# Counter used to name the generated images
count = 0

# Write the annotated image
cv2.imwrite('redlined_' + str(count) + '.png', annotated_image)

# Crop each detected bounding box and save it
for xyxy in detections.xyxy:
    cropped_image = sv.crop_image(image=img, xyxy=xyxy)
    count = count + 1
    cv2.imwrite('bboxes_' + str(count) + '.png', cropped_image)
```
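Since a key motivation above is using confidence values to decide when to accept or drop a parsed field, here is a minimal sketch of such a gate. It operates on plain `(class_name, confidence, xyxy)` tuples mirroring the fields read from the detections above, so it runs without model weights; the function name and the 0.70 threshold are illustrative assumptions, not part of the model.

```python
def filter_by_confidence(detections, threshold=0.70):
    """Split detections into accepted and dropped by a confidence threshold.

    `detections` is a list of (class_name, confidence, xyxy) tuples;
    the 0.70 default threshold is an arbitrary example value.
    """
    accepted, dropped = [], []
    for class_name, confidence, xyxy in detections:
        bucket = accepted if confidence >= threshold else dropped
        bucket.append((class_name, confidence, xyxy))
    return accepted, dropped

# Example with dummy detections
dets = [
    ("1040_pg1_header", 0.93, (10, 10, 500, 120)),
    ("1040_pg1_tax_tbl", 0.55, (10, 140, 500, 600)),
]
accepted, dropped = filter_by_confidence(dets)
```

Only the high-confidence crops would then be passed on to the Table Transformer / OCR stage, while low-confidence ones can be flagged for manual review.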