---
license: apache-2.0
datasets:
- ICTNLP/StreamUni
base_model:
- microsoft/Phi-4-multimodal-instruct
pipeline_tag: audio-text-to-text
library_name: adapter-transformers
---

# The model for the paper '[StreamUni: Achieving Streaming Speech Translation with a Unified Large Speech-Language Model](https://arxiv.org/abs/2507.07803v1)'

## Usage

Please refer to the [GitHub page](https://github.com/ictnlp/StreamUni) for usage instructions.

### Requirements

The Phi-4 family has been integrated into version `4.48.2` of `transformers`. The installed `transformers` version can be verified with `pip list | grep transformers`.

We suggest running with Python 3.10. Examples of required packages:

```
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
```

## Training Datasets

- https://huggingface.co/datasets/ICTNLP/StreamUni

## GitHub Page

- https://github.com/ictnlp/StreamUni
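
## Checking Installed Versions

As a complement to the `pip list` check in the Requirements section, the pinned versions listed there can be compared against the current environment from Python. The following is a minimal sketch and is not part of the StreamUni repository: the helper names `strip_local` and `check_pins` are hypothetical, and the `PINNED` table simply copies the example package list. It assumes the distribution names match the names in that list and treats a build suffix such as `+cu124` as irrelevant to the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the example package list in the Requirements section.
PINNED = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "soundfile": "0.13.1",
    "pillow": "11.1.0",
    "scipy": "1.15.2",
    "torchvision": "0.21.0",
    "backoff": "2.2.1",
    "peft": "0.13.2",
}


def strip_local(found):
    """Drop a local build suffix, e.g. '2.6.0+cu124' becomes '2.6.0'."""
    return found.split("+", 1)[0]


def check_pins(pins):
    """Return {name: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        matches = found is not None and strip_local(found) == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, matches) in check_pins(PINNED).items():
        status = "ok" if matches else "MISMATCH"
        print(f"{name:<14} wanted={wanted:<12} found={found!s:<16} {status}")
```

A mismatch is only a hint: the list above is described as examples of required packages, so other versions may also work.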