|
|
|
# Turkish QNLI Model |
|
|
|
I fine-tuned the Turkish BERT model below for the question-answering (QNLI) problem with TQuAD, the Turkish version of SQuAD:
|
https://huggingface.co/dbmdz/bert-base-turkish-uncased |
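
As a starting point, the pretrained checkpoint can be loaded as a binary sequence-pair classifier with the standard `transformers` Auto classes. This is only a minimal sketch, assuming a recent `transformers` version:

```
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# QNLI is a binary sentence-pair task (entailment / not_entailment),
# so the pretrained encoder gets a fresh 2-label classification head.
model_name = "dbmdz/bert-base-turkish-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
```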
|
|
|
# Data: TQuAD |
|
I used the following TQuAD dataset:
|
|
|
https://github.com/TQuad/turkish-nlp-qa-dataset |
|
|
|
I converted the dataset into the `transformers` GLUE data format for QNLI (SQuAD -> QNLI) with the following script:
|
|
|
``` |
|
import json

# Choose the TQuAD split to convert ("dev-v0.1.json" or "train-v0.1.json")
ff = "train-v0.1.json"
dataset = json.load(open(ff))

# Emit GLUE/QNLI-style tab-separated rows: index, question, candidate sentence, label.
# For each question, its gold answer is labeled "entailment" and every other
# answer from the same paragraph is labeled "not_entailment".
i = 0
for article in dataset['data']:
    title = article['title']
    for p in article['paragraphs']:
        context = p['context']
        for qa in p['qas']:
            answer = qa['answers'][0]['text']
            all_other_answers = list(set(e['answers'][0]['text'] for e in p['qas']))
            all_other_answers.remove(answer)
            i = i + 1
            print(i, qa['question'].replace(";", ":"), answer.replace(";", ":"),
                  "entailment", sep="\t")
            for other in all_other_answers:
                i = i + 1
                print(i, qa['question'].replace(";", ":"), other.replace(";", ":"),
                      "not_entailment", sep="\t")
|
|
|
``` |
|
|
|
|
|
Under the QNLI folder there are the dev and test sets.

The training data looks like this:
|
> 613 II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? filozof, kimyacı, astrolog ve çevirmen not_entailment |
|
> 614 II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? kişisel eğilimi ve özel temaslar nedeniyle not_entailment |
|
> 615 Michael Scotus’un mesleği nedir? filozof, kimyacı, astrolog ve çevirmen entailment |
|
> 616 Michael Scotus’un mesleği nedir? Palermo’ya not_entailment |
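
To turn the script's output into the files `run_glue.py` reads, it can be written to `train.tsv` and `dev.tsv` under the QNLI folder. Below is a minimal sketch of such a wrapper; the `convert_split` helper and the header row `index / question / sentence / label` are assumptions for illustration, not part of the original script:

```
import json

def convert_split(in_json, out_tsv):
    # Hypothetical wrapper around the conversion logic above: writes one
    # TQuAD JSON split as a tab-separated QNLI-style file with a header row.
    dataset = json.load(open(in_json))
    i = 0
    with open(out_tsv, "w") as out:
        out.write("index\tquestion\tsentence\tlabel\n")
        for article in dataset['data']:
            for p in article['paragraphs']:
                for qa in p['qas']:
                    answer = qa['answers'][0]['text']
                    others = list(set(e['answers'][0]['text'] for e in p['qas']))
                    others.remove(answer)
                    pairs = [(answer, "entailment")] + [(o, "not_entailment") for o in others]
                    for sentence, label in pairs:
                        i += 1
                        out.write("\t".join([str(i),
                                             qa['question'].replace(";", ":"),
                                             sentence.replace(";", ":"),
                                             label]) + "\n")

# Paths follow the GLUE_DIR used in the training section below
convert_split("train-v0.1.json", "./glue/glue_dataTR/QNLI/train.tsv")
convert_split("dev-v0.1.json", "./glue/glue_dataTR/QNLI/dev.tsv")
```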
|
|
|
|
|
|
|
|
|
|
|
# Training |
|
|
|
I trained the model with the following environment variables:
|
``` |
|
export GLUE_DIR=./glue/glue_dataTR/QNLI
export TASK_NAME=QNLI
|
``` |
|
|
|
``` |
|
python3 run_glue.py \
  --model_type bert \
  --model_name_or_path dbmdz/bert-base-turkish-uncased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
|
|
|
``` |
|
|
|
|
|
# Evaluation Results |
|
|
|
| Metric | Value |
| --- | --- |
| acc | 0.9124060613527165 |
| loss | 0.21582801340189717 |
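
After training, the checkpoint written to `/tmp/QNLI/` can score a question/candidate-answer pair directly. This is only a minimal inference sketch, assuming a recent `transformers` version; the example pair is taken from the training data above, and the label names are read from the model config rather than assumed:

```
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned checkpoint produced by run_glue.py (--output_dir /tmp/QNLI/)
model_dir = "/tmp/QNLI/"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

question = "Michael Scotus'un mesleği nedir?"
candidate = "filozof, kimyacı, astrolog ve çevirmen"

# Encode the pair the same way as during training: (question, sentence)
inputs = tokenizer(question, candidate, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]

# Map the prediction back to a label name via the model config
pred = model.config.id2label[int(probs.argmax())]
print(pred, probs.tolist())
```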
|
|
|
> See all my models at
|
> https://huggingface.co/savasy |
|
|
|
|
|
|
|
|
|
|
|
|
|
|