---
base_model:
- Ultralytics/YOLOv8
pipeline_tag: object-detection
---

# YOLOv8 Patent Text Region Detection Model

### Model Description

**patent_text_regions** is a [YOLOv8](https://docs.ultralytics.com/models/yolov8) model fine-tuned on a custom dataset of page-level images drawn from historical patent specifications published by the British Patent Office. It has been trained to recognize all text regions within pages of patent specifications as a single class.

We initialize from the weights of the official small YOLOv8s release ([yolov8s.pt](https://huggingface.co/Ultralytics/YOLOv8/blob/main/yolov8s.pt)) and fine-tune on our custom dataset.

### Usage

This model can be used in the same way as any pre-trained [YOLOv8 model](https://huggingface.co/Ultralytics/YOLOv8) by setting the model path to [best.pt](https://huggingface.co/gbpatentdata/yolov8_patent_layouts/blob/main/weights/best.pt).

### Training Data

The dataset was created by randomly sampling 420 page images from British patent specifications published between 1850 and 1899. The data was randomly split 80-10-10 (train-val-test). Standard preprocessing (images were auto-oriented and stretched to 640 x 640 pixels) and the following data augmentations were then applied using [Roboflow](https://roboflow.com/):

- Crop: 0% Minimum Zoom, 20% Maximum Zoom
- Grayscale: Apply to 15% of images
- Saturation: Between -25% and +25%
- Blur: Up to 2.5px
- Noise: Up to 0.1% of pixels

The custom dataset consists of 1,092 labelled images in total, which are made available in this repository.

### Hyperparameters

We train the model using [default hyperparameters](https://docs.ultralytics.com/modes/train/#introduction), except for the batch size (128) and the number of epochs (300).

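
As a rough sketch of the Usage, Training Data, and Hyperparameters notes above, the following Python snippet shows one way to load the weights, run detection on a page image, and set up a comparable fine-tuning run. It assumes the `ultralytics` and `huggingface_hub` packages are installed. The helper names (`split_counts`, `augmented_total`, `load_model`, `detect_text_regions`, `fine_tune`) and the `data_yaml` argument are hypothetical and not part of this repository. The factor of three augmented copies per training image is also an assumption: it is not stated in this card, but it reproduces the reported total of 1,092 images (336 × 3 + 42 + 42).

```python
# Values taken from the model card.
REPO_ID = "gbpatentdata/yolov8_patent_layouts"
WEIGHTS_FILE = "weights/best.pt"
# Only the two non-default training settings named in the card.
TRAIN_OVERRIDES = {"epochs": 300, "batch": 128}


def split_counts(n_images, fractions=(0.8, 0.1, 0.1)):
    """Return (train, val, test) sizes for a split of n_images."""
    n_val = round(n_images * fractions[1])
    n_test = round(n_images * fractions[2])
    return n_images - n_val - n_test, n_val, n_test


def augmented_total(n_images, copies_per_train_image=3):
    """Total image count if only the training split is augmented.

    copies_per_train_image=3 is an ASSUMPTION, not stated in the card.
    """
    n_train, n_val, n_test = split_counts(n_images)
    return n_train * copies_per_train_image + n_val + n_test


def load_model():
    """Download best.pt from the Hub and wrap it in an Ultralytics YOLO object."""
    # Imported lazily so the arithmetic helpers work without these packages.
    from huggingface_hub import hf_hub_download
    from ultralytics import YOLO

    weights_path = hf_hub_download(repo_id=REPO_ID, filename=WEIGHTS_FILE)
    return YOLO(weights_path)


def detect_text_regions(image_path, conf=0.25):
    """Return (x1, y1, x2, y2, confidence) tuples for one page image."""
    model = load_model()
    result = model.predict(source=str(image_path), conf=conf)[0]
    boxes = result.boxes
    return [
        (*xyxy, score)
        for xyxy, score in zip(boxes.xyxy.tolist(), boxes.conf.tolist())
    ]


def fine_tune(data_yaml):
    """Fine-tune from yolov8s.pt; data_yaml is a hypothetical dataset config path."""
    from ultralytics import YOLO

    model = YOLO("yolov8s.pt")
    return model.train(data=str(data_yaml), **TRAIN_OVERRIDES)
```

For example, `detect_text_regions("page_0001.png")` would return one tuple per detected text region (the file name is illustrative), and `split_counts(420)` gives the 336/42/42 split implied by the 80-10-10 ratio.
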
### Evaluation

Evals on the test set are reported below:

- mAP50: **0.987**
- mAP50-95: **0.892**

## Citation

If you use our model or custom training/evaluation data in your research, please cite our accompanying paper as follows:

```bibtex
@article{bct2025,
  title = {300 Years of British Patents},
  author = {Enrico Berkes and Matthew Lee Chen and Matteo Tranchero},
  journal = {arXiv preprint arXiv:2401.12345},
  year = {2025},
  url = {https://arxiv.org/abs/2401.12345}
}
```