This is Qwen2.5-0.5B-Instruct fine-tuned to compress chunks of text.
The goal is to make each chunk stored in a RAG system more compact and easier to read.
The prompt template is strict: use it exactly as shown in the sample below.
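Because the template must match exactly, a small helper that fills in only the chunk may reduce copy-paste mistakes when compressing many chunks. The sketch below is an assumption, not part of the original card. `SYSTEM_PROMPT` and `build_prompt` are hypothetical names, and the string layout is copied from the sample prompt: a system turn, a user turn wrapping the chunk in `<Input>` tags, and an assistant turn left open after `<Output>`.

```python
# Hypothetical helper (not from the model card): fills the strict prompt
# template with one text chunk. The Portuguese system prompt is copied
# verbatim from the sample inference code; do not translate or edit it.
SYSTEM_PROMPT = (
    "Você deve compactar textos sem necessidade de legibilidade para humanos, "
    "mas mantendo informações essenciais compreensíveis para outro modelo de linguagem.\n"
    "Regras para compressão:\n"
    "Remova palavras desnecessárias como artigos, preposições e pronomes quando possível, "
    "desde que a compreensão seja preservada.\n"
    "Preserve informações essenciais (nomes, locais, ações, conceitos-chave).\n"
    "Reduza expressões complexas mantendo o significado.\n"
    "Use listas e separadores para organizar as informações de forma eficiente.\n"
    "Remova redundâncias e detalhes secundários que não impactam a compreensão geral."
)


def build_prompt(chunk: str) -> str:
    """Return the full prompt for one chunk, ending right after the open <Output> tag."""
    return (
        "<|im_start|>system\n"
        f"{SYSTEM_PROMPT}<|im_end|>\n"
        "<|im_start|>user\n"
        "Texto para compressão:\n"
        "<Input>\n"
        f"{chunk.strip()}\n"
        "</Input><|im_end|>\n"
        "<|im_start|>assistant\n"
        "Texto comprimido:\n"
        "<Output>\n"
    )
```

With such a helper, `prompt = build_prompt(text)` could stand in for the hand-written prompt string, leaving the tokenization and generation steps unchanged.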
Sample inference:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "cnmoro/Qwen2.5-0.5B-Chunk-Compressor"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
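# The system prompt below is in Portuguese and must be kept verbatim, since the
# template is strict. English translation, for reference only:
#   You must compress texts without needing them to be human-readable, while keeping
#   essential information understandable to another language model.
#   Compression rules:
#   Remove unnecessary words such as articles, prepositions and pronouns when possible,
#   as long as comprehension is preserved.
#   Preserve essential information (names, places, actions, key concepts).
#   Reduce complex expressions while keeping the meaning.
#   Use lists and separators to organize the information efficiently.
#   Remove redundancies and secondary details that do not affect overall comprehension.
# "Texto para compressão" = "Text to compress"; "Texto comprimido" = "Compressed text".
prompt = """<|im_start|>system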
Você deve compactar textos sem necessidade de legibilidade para humanos, mas mantendo informações essenciais compreensíveis para outro modelo de linguagem.
Regras para compressão:
Remova palavras desnecessárias como artigos, preposições e pronomes quando possível, desde que a compreensão seja preservada.
Preserve informações essenciais (nomes, locais, ações, conceitos-chave).
Reduza expressões complexas mantendo o significado.
Use listas e separadores para organizar as informações de forma eficiente.
Remova redundâncias e detalhes secundários que não impactam a compreensão geral.<|im_end|>
<|im_start|>user
Texto para compressão:
<Input>
Cleaning the toilet is a task that doesn't interest people. Many, however, pray
for technology that can save them from the unpleasant mission. Apparently, those
prayers were answered: a group of Chinese scientists developed the concept of a
self-cleaning toilet and managed to make it a reality. Thanks to 3D printing,
researchers at Huazhong University of Science and Technology have managed to
revolutionize the unpleasant household chore. The self-cleaning toilet, known
as “ARSFT”, an acronym for “abrasion-resistant super slippery toilet flush” — the
technology that allows automatic cleaning — emerged from a complex combination
of plastic and grains of sand that repel water. In plain English, the technology
ensures that no substance sticks to the surface. Therefore, in addition to being
a salvation for many, this can be a more sustainable alternative to conventional
toilets. The website New Scientist interviewed one of the project's scientists,
Yike Li, who created the self-cleaning toilet. According to Li, the Chinese used,
in addition to the combination of plastic and grains of sand, a laser to bring the
particles together, thus creating the 3D printed self-cleaning toilet. After printing,
the researchers used silicon oil to lubricate the surface of the toilet, managing
to penetrate it due to the structure of the model. This generated the toilet's
self-cleaning capacity, with the following materials leaving no marks after
flushing: Milk; Yogurt; Honey; Muddy water; Starch gel mixed with porridge.
Chinese scientists also tested the self-cleaning toilet with synthetic feces,
using a mixture of miso, yeast, peanut oil and water, managing to imitate human
excrement. Although it may be strange that scientists work to create toilet technologies,
several seemingly “unnecessary” innovations can have a major global impact.
The self-cleaning toilet created by Chinese researchers can considerably reduce water waste.
According to Chinese scientists, the self-cleaning toilet can withstand a thousand scraping
cycles thanks to its super slippery capacity. Therefore, the self-cleaning toilet has
a new flushing method that minimizes water consumption – and waste. The Daily Mail
points out that, since its invention in the 18th century, although the toilet has
increased hygiene, a significant amount of water is required due to the adhesion
between the surface of the toilet and human feces and urine. Worldwide, toilet
flushes correspond to 141 billion liters of water daily. Therefore, in addition
to saving a valuable resource for humanity, the self-cleaning toilet also has another
environmental benefit. In places such as public and chemical bathrooms, especially
where there is no connection to the sanitation system, the self-cleaning toilet
appears as an ideal solution.
</Input><|im_end|>
<|im_start|>assistant
Texto comprimido:
<Output>
"""
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=384, temperature=0.5, do_sample=True)
input_length = inputs.input_ids.shape[1]
generated_tokens = outputs[0, input_length:]
generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
# Keep only the text before the closing </Output> tag, which marks the end of the answer
generated_text = generated_text.split("</Output>")[0].strip()
print(generated_text)
# Example output (sampling is enabled with do_sample=True, so results vary between runs):
# - Toilet cleaner - China developed self-cleaning toilet technology.
# - 3D printing - recycled material repels water, prevents sticking.
# - Self-cleaning toilet - reduces water use, waste, improves hygiene.
# - Environmental benefits: reduced water usage globally (141 billion liters/day), reduces resource waste.
# - Public/private bathroom solutions - ideal solution for areas lacking sanitation systems.
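The compressed answer comes back as a short bulleted list. Before indexing it in a RAG store, one might split it into items and check how much shorter it is than the source chunk. This is a minimal sketch under assumptions: `parse_compressed` and `compression_ratio` are hypothetical helpers, the bullet format is inferred from the single example output above, and word counts stand in for token counts.

```python
import re


def parse_compressed(generated_text: str) -> list[str]:
    """Split the model's bullet-style answer into a list of items.

    Assumes one item per line, as in the example output; anything after a
    closing </Output> tag is dropped and blank lines are skipped.
    """
    answer = generated_text.split("</Output>")[0]
    items = []
    for line in answer.splitlines():
        line = line.strip()
        if not line:
            continue
        # Remove a single leading bullet marker ("-", "*" or "•") if present.
        items.append(re.sub(r"^[-*•]\s*", "", line))
    return items


def compression_ratio(original: str, compressed: str) -> float:
    """Return compressed length / original length, in whitespace-separated words."""
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return len(compressed.split()) / original_words
```

For a token-level ratio instead, `len(tokenizer(text).input_ids)` could replace the word counts, using the same tokenizer loaded for inference.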