Cordia a6 model card
Resources and Technical Documentation
Terms of Use: Terms
Authors: Peter Till
Model Information
Summary description and brief definition of inputs and outputs.
Description
Cordia a6 is AICord's first model built in-house. It was trained on 13,118 human-like, multi-turn dialogues to provide a natural, human-like chat experience for AI characters.
Inputs and outputs
Input:
- Text string, such as a question, a prompt, or a document to be summarized
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
- Total input context of 32K tokens
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output context of 8192 tokens
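Since the Usage examples below are text-only, here is a minimal sketch of the image input path. It assumes the checkpoint registers an image-text-to-text pipeline whose processor performs the 896 x 896 resize described above; the image URL is a placeholder, not a real asset.

from transformers import pipeline
import torch

# Sketch only: assumes aicord/cordia-a6 supports the image-text-to-text
# pipeline and its processor handles the 896 x 896 image normalization.
pipe = pipeline(
    "image-text-to-text",
    model="aicord/cordia-a6",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

output = pipe(text=messages, max_new_tokens=50)
print(output[0]["generated_text"][-1]["content"])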
Usage
Below are some code snippets to help you get started running the model quickly.
Run locally using Transformers library
First, install the Transformers library. Cordia a6 is supported starting from transformers 4.50.0.
$ pip install -U transformers
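You can verify the installed version before loading the model:

import transformers

# Cordia a6 requires transformers 4.50.0 or newer.
print(transformers.__version__)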
Then you can run the model locally using the pipeline API:
from transformers import pipeline
import torch

# Load the model on the GPU with bfloat16 weights.
pipe = pipeline(
    "text-generation",
    model="aicord/cordia-a6",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

# The pipeline accepts a batch of chat conversations.
messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem about AICord"}],
        },
    ],
]

output = pipe(messages, max_new_tokens=50)
# The assistant's reply is the last message of the returned conversation.
print(output[0][0]["generated_text"][-1]["content"])
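Alternatively, you can load the model directly instead of going through the pipeline. The following is a sketch under the assumption that the checkpoint loads as a standard Transformers causal language model with a chat template; it is an illustration, not official usage.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Sketch only: assumes aicord/cordia-a6 loads as a standard causal LM.
tokenizer = AutoTokenizer.from_pretrained("aicord/cordia-a6")
model = AutoModelForCausalLM.from_pretrained(
    "aicord/cordia-a6",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a poem about AICord"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=50)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))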
Use AICord's developer API
Coming soon.
Model data
Data used for model training and how the data was processed.
Training Dataset
Cordia a6 was trained on a dataset drawn from a wide variety of sources. The 1B model was trained on 2 trillion tokens. Here are the key components:
- Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
- Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
- Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
- Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
- Dialogues: The model was trained on 13,118 human-like, multi-turn dialogues from the DailyDialog dataset (see the loading sketch below).
The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of tasks and data formats.
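For reference, the 13,118 figure matches the total number of multi-turn dialogues in DailyDialog. Below is a minimal loading sketch; the daily_dialog dataset id is an assumption about which Hugging Face Hub copy was used.

from datasets import load_dataset

# Sketch only: assumes the standard Hub copy of DailyDialog; its
# train/validation/test splits together hold the 13,118 dialogues cited above.
ds = load_dataset("daily_dialog", split="train")
print(ds[0]["dialog"][:2])  # first two turns of the first dialogue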
Evaluation
No evaluation results have been published yet.