PyTorch · llama

Commit e2311ce (verified) · 1 Parent(s): 54d4efb
nielsr (HF Staff) committed

Improve model card: Add metadata, tags, and sample usage


This PR significantly improves the model card for InstructBioMol by:
- Adding `pipeline_tag: any-to-any` to accurately reflect the model's multimodal capabilities and the paper's description.
- Specifying `library_name: transformers`, as confirmed by the model's configuration files.
- Including relevant tags such as `biomolecules`, `proteins`, `molecules`, `multimodal`, `language-model`, `instruction-tuned`, and `llama` to enhance discoverability on the Hugging Face Hub.
- Adding a comprehensive "Quickstart" section with Python code snippets, demonstrating how to load and use the model with both protein and molecule inputs, showcasing its "any-to-any" functionality.

These updates will make the InstructBioMol model more accessible and user-friendly for the community.
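For reference, the model card's complete YAML front matter after this change (as applied in the diff) is:

```yaml
license: mit
pipeline_tag: any-to-any
library_name: transformers
tags:
- biomolecules
- proteins
- molecules
- multimodal
- language-model
- instruction-tuned
- llama
```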

Files changed (1): README.md (+44, −0)
README.md CHANGED

@@ -1,5 +1,15 @@
 ---
 license: mit
+pipeline_tag: any-to-any
+library_name: transformers
+tags:
+- biomolecules
+- proteins
+- molecules
+- multimodal
+- language-model
+- instruction-tuned
+- llama
 ---
 
 <div align="center">
@@ -45,6 +55,40 @@ InstructBioMol is a multimodal large language model that bridges natural languag
 
 **Training Objective**: Instruction tuning
 
+### Quickstart
+
+You can use InstructBioMol with the `transformers` library by setting `trust_remote_code=True`. The model handles multimodal inputs, specifically proteins and molecules, as demonstrated below.
+
+```python
+from transformers import AutoModel, AutoTokenizer
+import torch
+
+# Load the model and tokenizer (remote code is required for the custom architecture)
+model_name = "hicai-zju/InstructBioMol-instruct-stage1"  # or "hicai-zju/InstructBioMol-instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
+
+# Example: generate a description for a protein sequence (FASTA record)
+protein_sequence = (
+    ">sp|P0A7G8|FMT_ECOLI Formyltetrahydrofolate synthetase OS=Escherichia coli (strain K12) PE=1 SV=1\n"
+    "MSKKLVSGTDVAEYLLSVQKEELGDLTLEIDELKTVTLTRIAQLKDFGSGSIPVEAVKLINQENILFLLGTLGIGKTTTTLLKRIISDKDFGFYSSADKLYDYKGYVVFGESVAGAEADWTSKIDVVVAPFTSIDETAKLLAKLTPDVSVLGQAVAVKGALRILGMDDAAQRVADIVGLAVTGQIVKLAANAGADLLEALKLPEVVVVGNGVAYALDGRLKAEFSLDTAVADGASEVAGKLIARNGADGSLKGVLLEELGAAKLKVIAPLTGLAKELKAFESLLAEKKD"
+)
+prompt = f"Please describe this protein:\n<PROT>{protein_sequence}</PROT>"
+
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(generated_text)
+
+# Example: generate a SMILES string from a molecule description
+mol_description = "A molecule with anti-cancer activity and a molecular weight around 300."
+prompt = f"Generate a SMILES string for a molecule with the following properties:\n<MOL>{mol_description}</MOL>"
+
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(generated_text)
+```
 
 ### Citation
 