---
license: apache-2.0
pipeline_tag: tabular-regression
tags:
- biology
datasets:
- Allanatrix/ProtienBank
metrics:
- accuracy
---

# NexaBio: Advanced Protein Structure Prediction Models

**NexaBio** is a two-stage model suite for predicting protein structure from amino acid sequences. It comprises two complementary models:

- **NexaBio_1**: A Convolutional Neural Network (CNN) and Bidirectional LSTM (BiLSTM) model for secondary structure prediction.
- **NexaBio_2**: A Variational Autoencoder (VAE) and Diffusion-based model for tertiary (3D) structure prediction.

NexaBio is a core component of the [Nexa Scientific Model Suite](https://huggingface.co/spaces/Allanatrix/NexaHub), a collection of machine learning models advancing scientific discovery.

## Model Overview

### NexaBio_1: Secondary Structure Prediction
- **Architecture**: CNN combined with BiLSTM for robust sequence modeling.
- **Input**: Amino acid sequence (one-hot encoded or embedded).
- **Output**: Secondary structure classifications (e.g., Helix, Sheet, Coil).
- **Use Case**: Identification of local structural motifs and protein folding patterns.
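
As a rough illustration of the input and output formats listed above, the sketch below one-hot encodes a sequence and turns per-residue class scores into Helix/Sheet/Coil labels. The alphabet order, the `H`/`E`/`C` label order, and the function names `one_hot_encode` and `decode_labels` are assumptions made for this example; they are not taken from NexaBio_1's own preprocessing, which may differ.

```python
# Illustrative sketch only: the alphabet order, label order, and function
# names are assumptions, not NexaBio_1's documented preprocessing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
SS_LABELS = ("H", "E", "C")  # helix, sheet (strand), coil

AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional one-hot row per residue."""
    rows = []
    for position, residue in enumerate(sequence.upper()):
        if residue not in AA_INDEX:
            raise ValueError(f"unsupported residue {residue!r} at position {position}")
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1.0
        rows.append(row)
    return rows


def decode_labels(scores):
    """Map per-residue class scores to a label string by taking the argmax."""
    return "".join(
        SS_LABELS[max(range(len(SS_LABELS)), key=lambda k: row[k])]
        for row in scores
    )


encoded = one_hot_encode("MKV")
# Placeholder scores standing in for model output (not real predictions).
labels = decode_labels([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]])
```

Here `encoded` holds three rows of length 20, and `labels` is a three-character string with one label per residue. An embedding-based input would replace `one_hot_encode` with an index lookup.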

### NexaBio_2: Tertiary Structure Prediction
- **Architecture**: VAE integrated with a Diffusion Model for generative 3D modeling.
- **Input**: Amino acid sequence (optionally augmented with secondary structure predictions).
- **Output**: 3D coordinates of protein backbone atoms.
- **Use Case**: Full tertiary structure prediction for structural analysis and design.
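
One plausible way to augment the sequence with secondary structure predictions is to concatenate the two one-hot vectors per residue. The sketch below shows that, plus a simple sanity check on predicted backbone coordinates: consecutive C-alpha atoms in a protein chain are typically about 3.8 Å apart. The 23-dimensional feature layout and the helper names `augment_features` and `consecutive_distances` are assumptions for illustration, not NexaBio_2's documented interface.

```python
import math

# Illustrative sketch only: the feature layout and helper names are
# assumptions, not NexaBio_2's documented API.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_LABELS = "HEC"  # helix, sheet (strand), coil


def augment_features(sequence, secondary):
    """Concatenate residue and secondary-structure one-hot vectors (20 + 3 = 23)."""
    if len(sequence) != len(secondary):
        raise ValueError("sequence and secondary structure must have equal length")
    features = []
    for residue, label in zip(sequence, secondary):
        aa = [1.0 if residue == a else 0.0 for a in AMINO_ACIDS]
        ss = [1.0 if label == s else 0.0 for s in SS_LABELS]
        features.append(aa + ss)
    return features


def consecutive_distances(coords):
    """Distances between neighbouring points in a list of (x, y, z) tuples."""
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


features = augment_features("MKV", "HHC")
# Placeholder C-alpha coordinates in angstroms, spaced 3.8 apart along x.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
distances = consecutive_distances(ca_coords)
```

Distances far from 3.8 Å between neighbouring C-alpha atoms would suggest a broken or physically implausible chain and are worth flagging before downstream analysis.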

## Applications
- **Structural Bioinformatics**: Enabling precise protein structure analysis for research.
- **Drug Discovery**: Supporting protein-ligand interaction studies and therapeutic design.
- **Protein Engineering**: Facilitating the design of novel proteins for industrial and medical applications.
- **Synthetic Biology**: Generating protein structures for biotechnological innovation.
- **Academic Research**: Serving as a tool for educational and exploratory studies.

## Getting Started

### Example Usage
```python
from transformers import AutoModel

# Initialize the secondary structure prediction model
model_sec = AutoModel.from_pretrained("Allanatrix/NexaBio_1")

# Initialize the tertiary structure prediction model
model_ter = AutoModel.from_pretrained("Allanatrix/NexaBio_2")

# Process an amino acid sequence (refer to model documentation for input formatting)
```

For comprehensive instructions, including inference APIs and preprocessing details, consult the individual model cards on Hugging Face.
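
The two-stage design can be sketched as a small pipeline function that runs the secondary structure stage first and feeds its output to the tertiary stage. Everything in this sketch is hypothetical: `predict_structure` and the two stub predictors stand in for whatever inference calls the individual model cards document, and the stubs return placeholder values, not real predictions.

```python
# Illustrative sketch only: `predict_structure` and both stub predictors are
# hypothetical stand-ins, not functions provided by NexaBio or transformers.
def predict_structure(sequence, secondary_predictor, tertiary_predictor):
    """Run stage 1, then pass its output to stage 2 alongside the sequence."""
    secondary = secondary_predictor(sequence)
    if len(secondary) != len(sequence):
        raise ValueError("expected one secondary-structure label per residue")
    coords = tertiary_predictor(sequence, secondary)
    return {"sequence": sequence, "secondary": secondary, "coords": coords}


def stub_secondary(sequence):
    """Placeholder for NexaBio_1: labels every residue as coil ('C')."""
    return "C" * len(sequence)


def stub_tertiary(sequence, secondary):
    """Placeholder for NexaBio_2: lays residues out on a straight line."""
    return [(3.8 * i, 0.0, 0.0) for i in range(len(sequence))]


result = predict_structure("MKVL", stub_secondary, stub_tertiary)
```

Passing the predictors in as arguments keeps the chaining logic independent of how each model is loaded, so the stubs can be swapped for real inference wrappers without changing `predict_structure`.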

## Citation and License
If you utilize NexaBio in your research or applications, please cite this repository and include a link to the [Nexa R&D Space](https://huggingface.co/spaces/Allanatrix/NexaR&D).  
The models and associated code are licensed under the **Apache License 2.0** (`apache-2.0`), as declared in this model card's metadata.

## Part of the Nexa Scientific Ecosystem
Discover other components of the Nexa Scientific Stack:
- [Nexa Data Studio](https://huggingface.co/spaces/Allanatrix/NexaDataStudio): Data processing and visualization tools.
- [Nexa R&D](https://huggingface.co/spaces/Allanatrix/NexaR&D): Research-focused model development environment.
- [Nexa Infrastructure](https://huggingface.co/spaces/Allanatrix/NexaInfrastructure): Scalable ML deployment solutions.
- [Nexa Hub](https://huggingface.co/spaces/Allanatrix/NexaHub): Central portal for Nexa resources.

---

*Developed and maintained by [Allan](https://huggingface.co/Allanatrix), an independent machine learning researcher specializing in scientific AI and infrastructure.*