Cordia a6 model card
Resources and Technical Documentation
Terms of Use: Terms
Authors: Peter Till
Model Information
Summary description and brief definition of inputs and outputs.
Description
Cordia a6 is AICord's first model built in-house. It was trained on 13,118 human-like, multi-turn dialogues to provide a natural, human-like chat experience for AI characters.
Inputs and outputs
Input:
- Text string, such as a question, a prompt, or a document to be summarized
- Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
- Total input context of 32K tokens
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
- Total output context of 8192 tokens
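Since the Usage examples below are text-only, here is a minimal sketch of the image input path. It assumes the checkpoint registers an image-text-to-text pipeline whose processor performs the 896 x 896 resize described above; the image URL is a placeholder, not a real asset.

from transformers import pipeline
import torch

# Sketch only: assumes aicord/cordia-a6 supports the image-text-to-text
# pipeline and its processor handles the 896 x 896 image normalization.
pipe = pipeline(
    "image-text-to-text",
    model="aicord/cordia-a6",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

output = pipe(text=messages, max_new_tokens=50)
print(output[0]["generated_text"][-1]["content"])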
Usage
Below are some code snippets to help you get started running the model quickly.
Run locally using Transformers library
First, install the Transformers library. Cordia a6 is supported starting from transformers 4.50.0.
$ pip install -U transformers
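You can verify the installed version before loading the model:

import transformers

# Cordia a6 requires transformers 4.50.0 or newer.
print(transformers.__version__)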
Then you can run the model locally using the pipeline API:
from transformers import pipeline
import torch

# Load the model on the GPU with bfloat16 weights.
pipe = pipeline(
    "text-generation",
    model="aicord/cordia-a6",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

# The pipeline accepts a batch of chat conversations.
messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem about AICord"}],
        },
    ],
]

output = pipe(messages, max_new_tokens=50)
# The assistant's reply is the last message of the returned conversation.
print(output[0][0]["generated_text"][-1]["content"])
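Alternatively, you can load the model directly instead of going through the pipeline. The following is a sketch under the assumption that the checkpoint loads as a standard Transformers causal language model with a chat template; it is an illustration, not official usage.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Sketch only: assumes aicord/cordia-a6 loads as a standard causal LM.
tokenizer = AutoTokenizer.from_pretrained("aicord/cordia-a6")
model = AutoModelForCausalLM.from_pretrained(
    "aicord/cordia-a6",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a poem about AICord"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=50)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))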
Use AICord's developer API
Coming soon.
Model data
Data used for model training and how the data was processed.
Training Dataset
Cordia a6 was trained on a dataset drawn from a wide variety of sources. The 1B model was trained on 2 trillion tokens. Here are the key components:
- Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
- Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
- Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
- Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
- Dialogues: The model was trained on 13,118 human-like, multi-turn dialogues from the DailyDialog dataset (see the loading sketch below).
The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of tasks and data formats.
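For reference, the 13,118 figure matches the total number of multi-turn dialogues in DailyDialog. Below is a minimal loading sketch; the daily_dialog dataset id is an assumption about which Hugging Face Hub copy was used.

from datasets import load_dataset

# Sketch only: assumes the standard Hub copy of DailyDialog; its
# train/validation/test splits together hold the 13,118 dialogues cited above.
ds = load_dataset("daily_dialog", split="train")
print(ds[0]["dialog"][:2])  # first two turns of the first dialogue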
Evaluation
No evaluation results have been published yet.