Dacon-Encrypted-Kobart-Base-v2

Overview

This model restores obfuscated Korean reviews to their original, human-readable form. It is designed to reverse the obfuscation or distortion applied to Korean text, producing a clear, interpretable version of the input.

Model Purpose

The model is trained to handle text obfuscation, primarily Korean reviews that have been deliberately made difficult to read. It restores these reviews to a readable, understandable form so they can be used in downstream natural language processing (NLP) tasks.

Description

  • Task: De-obfuscation of Korean text
  • Objective: Convert difficult-to-read, obfuscated Korean reviews back into their original forms.
  • Data: A custom dataset of obfuscated Korean reviews from the Dacon competition.

Intended Use

This model can be applied in scenarios where Korean text has been distorted or encrypted for privacy, security, or other purposes and needs to be restored to its original state for analysis or review. Example use cases include:

  • Restoring reviews from encrypted forms
  • Processing distorted data for sentiment analysis, opinion mining, and other NLP tasks

Model Details

  • Base Model: KoBART-base-v2
  • Architecture: Encoder-Decoder (BART)
  • Training Data: A dataset from the Dacon Encrypted Korean Reviews competition.
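Since the model is fine-tuned on paired (obfuscated, original) reviews, preparing the training data amounts to reading such pairs from the competition CSV. A minimal sketch using an inline sample; the column names (`input` for the obfuscated review, `output` for the original) are assumptions, not confirmed by the Dacon data description:

```python
import csv
import io

# Hypothetical two-row sample standing in for the competition file;
# the column names ("input" = obfuscated review, "output" = original)
# are assumptions.
raw = io.StringIO(
    "input,output\n"
    "obfuscated review 1,original review 1\n"
    "obfuscated review 2,original review 2\n"
)

sources, targets = [], []
for row in csv.DictReader(raw):
    sources.append(row["input"])   # model input (obfuscated text)
    targets.append(row["output"])  # training target (readable text)

print(sources)
print(targets)
```

Each `sources[i]`/`targets[i]` pair can then be tokenized as the encoder input and decoder label for sequence-to-sequence fine-tuning.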

How to Use

Example:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("otter35/Dacon-Encrypted-kobart-base-v2")
tokenizer = AutoTokenizer.from_pretrained("otter35/Dacon-Encrypted-kobart-base-v2")

# Input obfuscated review text
text = "์•ผ... ์นต์ปฅ ์ข‹๊พœ ๋ถ€๋ด"

# Tokenize the input and generate the restored review
# (max_length=128 bounds both the input and the generated output)
inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True)
outputs = model.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_length=128)
decoded_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Decoded Review:", decoded_text)
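For restoring many reviews at once, the single-input snippet above can be wrapped in a batched helper. This is a sketch, assuming the `model` and `tokenizer` loaded above; the batch size and beam-search settings are illustrative defaults, not values recommended by the model authors:

```python
def restore_reviews(texts, model, tokenizer, batch_size=16, num_beams=4, max_length=128):
    """De-obfuscate a list of Korean reviews in batches."""
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Pad within the batch so all sequences share one tensor shape
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        outputs = model.generate(input_ids=inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"],
                                 num_beams=num_beams, max_length=max_length)
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

Usage: `restore_reviews(["์•ผ... ์นต์ปฅ ์ข‹๊พœ ๋ถ€๋ด"], model, tokenizer)` returns a list of decoded reviews in the same order as the inputs.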