
Cordia a6 model card

Resources and Technical Documentation

Terms of Use: Terms

Authors: Peter Till

Model Information

Summary description and brief definition of inputs and outputs.

Description

Cordia a6 is AICord's first model developed in-house. It was trained on 13,118 human-like dialogues to provide a natural chatting experience for AI characters.

Inputs and outputs

  • Input:

    • Text string, such as a question, a prompt, or a document to be summarized
    • Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
    • Total input context of 32K tokens
  • Output:

    • Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document
    • Total output context of 8192 tokens (a token-budget check against these limits is sketched after this list)
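
The limits above can be enforced client-side before sending a prompt. Here is a minimal sketch, assuming the checkpoint ships a tokenizer loadable with AutoTokenizer; the constant name is illustrative, not part of the model's API.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aicord/cordia-a6")

MAX_INPUT_TOKENS = 32_000  # 32K-token input context from the list above

prompt = "Summarize the following document: ..."
n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > MAX_INPUT_TOKENS:
    raise ValueError(f"Prompt is {n_tokens} tokens, over the 32K input context")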

Usage

Below are some code snippets to help you get started quickly with running the model.

Run locally using Transformers library

First, install the Transformers library. Cordia a6 is supported in transformers 4.50.0 and later.

$ pip install -U transformers
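
If you are unsure which version is installed, you can check it before proceeding:

$ python -c "import transformers; print(transformers.__version__)"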

Then you can run the model locally using the pipeline API:

from transformers import pipeline
import torch

# Load the model on the GPU in bfloat16 to reduce memory usage
pipe = pipeline(
    "text-generation",
    model="aicord/cordia-a6",
    device="cuda",
    torch_dtype=torch.bfloat16,
)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem about AICord"}],
        },
    ],
]

output = pipe(messages, max_new_tokens=50)

# The pipeline appends the assistant's reply to each input chat;
# print the content of that final message
print(output[0][0]["generated_text"][-1]["content"])
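
If you want lower-level control over generation, a rough equivalent using the tokenizer and model APIs directly is sketched below. It assumes the checkpoint loads with AutoModelForCausalLM and that its chat template accepts plain-string message content.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("aicord/cordia-a6")
model = AutoModelForCausalLM.from_pretrained(
    "aicord/cordia-a6", torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a poem about AICord"},
]

# Render the chat with the model's own template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=50)

# Decode only the newly generated tokens, skipping the rendered prompt
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))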

Use AICord's developer API

SOON

Model data

Data used for model training and how the data was processed.

Training Dataset

Cordia a6 was trained on a dataset that includes a wide variety of sources; the 1B model was trained on 2 trillion tokens. Here are the key components:

  • Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary. The training dataset includes content in over 140 languages.
  • Code: Exposing the model to code helps it to learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
  • Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and to address mathematical queries.
  • Images: A wide range of images enables the model to perform image analysis and visual data extraction tasks.
  • Dialogues: The model was trained on 13,118 human-like, multi-turn dialogues from the DailyDialog dataset (see the loading sketch below).

The combination of these diverse data sources is crucial for training a powerful multimodal model that can handle a wide variety of different tasks and data formats.
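
For reference, DailyDialog is available on the Hugging Face Hub. Below is a minimal sketch of loading it and reshaping one dialogue into chat messages; the user/assistant role assignment is an assumption, since DailyDialog does not label which speaker is which, and depending on your datasets version the script-based dataset may require trust_remote_code=True.

from datasets import load_dataset

# DailyDialog is a script-based dataset; recent `datasets` releases may
# require trust_remote_code=True to run its loading script.
ds = load_dataset("daily_dialog", split="train", trust_remote_code=True)

def to_messages(example):
    # Alternate user/assistant roles across turns; this mapping is an
    # assumption, as DailyDialog does not mark speaker identities.
    roles = ["user", "assistant"]
    return {
        "messages": [
            {"role": roles[i % 2], "content": turn.strip()}
            for i, turn in enumerate(example["dialog"])
        ]
    }

chats = ds.map(to_messages)
print(chats[0]["messages"][:2])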

Evaluation

No evaluation results are available yet.
