PyTorch · llama

Commit e2311ce (verified) · 1 Parent(s): 54d4efb
nielsr (HF Staff) committed

Improve model card: Add metadata, tags, and sample usage


This PR significantly improves the model card for InstructBioMol by:
- Adding `pipeline_tag: any-to-any` to accurately reflect the model's multimodal capabilities and the paper's description.
- Specifying `library_name: transformers`, as confirmed by the model's configuration files.
- Including relevant tags such as `biomolecules`, `proteins`, `molecules`, `multimodal`, `language-model`, `instruction-tuned`, and `llama` to enhance discoverability on the Hugging Face Hub.
- Adding a comprehensive "Quickstart" section with Python code snippets, demonstrating how to load and use the model with both protein and molecule inputs, showcasing its "any-to-any" functionality.

These updates will make the InstructBioMol model more accessible and user-friendly for the community.
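For reference, the model card's complete YAML front matter after this change (as applied in the diff) is:

```yaml
license: mit
pipeline_tag: any-to-any
library_name: transformers
tags:
- biomolecules
- proteins
- molecules
- multimodal
- language-model
- instruction-tuned
- llama
```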

Files changed (1): README.md (+44, −0)
README.md CHANGED

@@ -1,5 +1,15 @@
 ---
 license: mit
+pipeline_tag: any-to-any
+library_name: transformers
+tags:
+- biomolecules
+- proteins
+- molecules
+- multimodal
+- language-model
+- instruction-tuned
+- llama
 ---
 
 <div align="center">
@@ -45,6 +55,40 @@ InstructBioMol is a multimodal large language model that bridges natural languag
 
 **Training Objective**: Instruction tuning
 
+### Quickstart
+
+You can use InstructBioMol with the `transformers` library by setting `trust_remote_code=True`. The model handles multimodal inputs, specifically proteins and molecules, as demonstrated below.
+
+```python
+from transformers import AutoModel, AutoTokenizer
+import torch
+
+# Load the model and tokenizer (remote code is required for the custom architecture)
+model_name = "hicai-zju/InstructBioMol-instruct-stage1"  # or "hicai-zju/InstructBioMol-instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
+
+# Example: generate a description for a protein sequence (FASTA record)
+protein_sequence = (
+    ">sp|P0A7G8|FMT_ECOLI Formyltetrahydrofolate synthetase OS=Escherichia coli (strain K12) PE=1 SV=1\n"
+    "MSKKLVSGTDVAEYLLSVQKEELGDLTLEIDELKTVTLTRIAQLKDFGSGSIPVEAVKLINQENILFLLGTLGIGKTTTTLLKRIISDKDFGFYSSADKLYDYKGYVVFGESVAGAEADWTSKIDVVVAPFTSIDETAKLLAKLTPDVSVLGQAVAVKGALRILGMDDAAQRVADIVGLAVTGQIVKLAANAGADLLEALKLPEVVVVGNGVAYALDGRLKAEFSLDTAVADGASEVAGKLIARNGADGSLKGVLLEELGAAKLKVIAPLTGLAKELKAFESLLAEKKD"
+)
+prompt = f"Please describe this protein:\n<PROT>{protein_sequence}</PROT>"
+
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(generated_text)
+
+# Example: generate a SMILES string from a molecule description
+mol_description = "A molecule with anti-cancer activity and a molecular weight around 300."
+prompt = f"Generate a SMILES string for a molecule with the following properties:\n<MOL>{mol_description}</MOL>"
+
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+output_ids = model.generate(input_ids, max_new_tokens=100, do_sample=False)
+generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
+print(generated_text)
+```
 
 ### Citation
 