Improve model card: Add metadata, tags, and sample usage
This PR significantly improves the model card for InstructBioMol by:
- Adding `pipeline_tag: any-to-any` to accurately reflect the model's multimodal capabilities and the paper's description.
- Specifying `library_name: transformers`, as confirmed by the model's configuration files.
- Including relevant tags such as `biomolecules`, `proteins`, `molecules`, `multimodal`, `language-model`, `instruction-tuned`, and `llama` to enhance discoverability on the Hugging Face Hub.
- Adding a comprehensive "Quickstart" section with Python code snippets demonstrating how to load the model and use it for protein-to-text description and text-to-molecule (SMILES) generation, showcasing its "any-to-any" functionality.
These updates will make the InstructBioMol model more accessible and user-friendly for the community.
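
As a quick sanity check after merging, the updated metadata can be read back with the `huggingface_hub` library. A minimal sketch — the repo id is taken from the Quickstart below, and the expected values assume this PR's changes are live:

```python
from huggingface_hub import ModelCard

# Load the model card (README.md front matter + body) from the Hub
card = ModelCard.load("hicai-zju/InstructBioMol-instruct-stage1")

# The metadata added by this PR should appear in the card's YAML front matter
print(card.data.pipeline_tag)  # expected: any-to-any
print(card.data.library_name)  # expected: transformers
print(card.data.tags)          # expected to include: biomolecules, proteins, molecules, ...
```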
@@ -1,5 +1,15 @@
 ---
 license: mit
+pipeline_tag: any-to-any
+library_name: transformers
+tags:
+- biomolecules
+- proteins
+- molecules
+- multimodal
+- language-model
+- instruction-tuned
+- llama
 ---
 
 <div align="center">
@@ -45,6 +55,40 @@ InstructBioMol is a multimodal large language model that bridges natural languag
 
 **Training Objective**: Instruction tuning
 
+### Quickstart
+
+You can use InstructBioMol with the `transformers` library by setting `trust_remote_code=True`. The model handles multimodal inputs, specifically proteins and molecules, as demonstrated below.
+
+```python
+from transformers import AutoModel, AutoTokenizer
+import torch
+
+# Load the model and tokenizer
+model_name = "hicai-zju/InstructBioMol-instruct-stage1"  # or "hicai-zju/InstructBioMol-instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
+
+# Example: generate a text description for a protein (FASTA header and sequence joined by a newline)
+protein_sequence = (
+    ">sp|P0A7G8|FMT_ECOLI Formyltetrahydrofolate synthetase OS=Escherichia coli (strain K12) PE=1 SV=1\n"
+    "MSKKLVSGTDVAEYLLSVQKEELGDLTLEIDELKTVTLTRIAQLKDFGSGSIPVEAVKLINQENILFLLGTLGIGKTTTTLLKRIISDKDFGFYSSADKLYDYKGYVVFGESVAGAEADWTSKIDVVVAPFTSIDETAKLLAKLTPDVSVLGQAVAVKGALRILGMDDAAQRVADIVGLAVTGQIVKLAANAGADLLEALKLPEVVVVGNGVAYALDGRLKAEFSLDTAVADGASEVAGKLIARNGADGSLKGVLLEELGAAKLKVIAPLTGLAKELKAFESLLAEKKD"
+)
+prompt = f"Please describe this protein:\n<PROT>{protein_sequence}</PROT>"
+
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(generated_text)
+
+# Example: generate a SMILES string from a molecule description
+mol_description = "A molecule with anti-cancer activity and a molecular weight around 300."
+prompt = f"Generate a SMILES string for a molecule with the following properties:\n<MOL>{mol_description}</MOL>"
+
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(generated_text)
+```
 
 ### Citation
 
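
The Quickstart uses greedy decoding (`do_sample=False`), which always returns the same output for a given prompt. If diverse candidates are desired (e.g., several SMILES proposals for one description), the standard `transformers` sampling flags apply; a minimal variation on the Quickstart call, assuming the model's custom `generate` accepts the standard sampling arguments, with illustrative parameter values:

```python
# Sampling variant of the Quickstart generation call (illustrative values)
output_ids = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,   # stochastic sampling instead of greedy decoding
    temperature=0.7,  # lower values make outputs more deterministic
    top_p=0.9,        # nucleus sampling cutoff
)
```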