Hands-On with NatureLM-audio: A No-Code Demo

Last year, we introduced NatureLM-audio — the first large audio-language model tailored specifically for bioacoustics — and earlier this year, we open-sourced it on Hugging Face so anyone could try it out and build with it. Trained on a diverse mix of bioacoustics data, human speech, and music, NatureLM-audio was designed to support researchers, conservationists, and the broader ethology community in understanding and analyzing animal vocalizations.
Already, we’ve seen individual researchers and conservation projects – like FrogID – plug into NatureLM-audio to evaluate it for real-world use.
NatureLM-audio UI: Experimental Beta
Today, we’re excited to bring NatureLM-audio to a wider audience with an early preview of our interactive UI, hosted on Hugging Face Spaces. This no-code interface lets anyone upload audio containing animal vocalizations and ask the model questions in plain English.
Decoding animal communication is too big a challenge for any one organization, and progress depends on collaboration across the scientific community. That’s why we open-source our models and are designing tools to make them easier to use. Our aim is to help researchers manage vast amounts of bioacoustics data, streamline and automate their analyses, and discover new patterns and insights.
In the spirit of openness, we invite you to explore the demo, share use cases, and suggest new ideas on our Discourse forum. Your feedback helps us understand what’s working, what could be improved, and guides the next steps in refining the model into a tool that better supports bioacoustics and ethology research.
And if you'd like to test upcoming features and get the latest version of NatureLM-audio, sign up for our closed beta waitlist here.
How to use our NatureLM-audio UI
Go to our Hugging Face Space.
Upload a short audio file or click on a pre-loaded example.
Upload a short audio file containing an animal sound (e.g., bird song, frog call). Or, click on an example at the bottom of the landing page. The audio file will be pre-loaded and ready to process. You can also open the “Sample Library” tab to explore and download animal sounds to try.
Trim your audio to 10 seconds or less.
To trim audio, click on the scissor icon at the bottom right of the audio panel. The model works best right now with shorter audio clips, and trimming will also help with faster processing time. We’re actively working on support for uploading longer recordings and batch processing.
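If you prefer to trim clips before uploading, the same 10-second cut can be done locally. Below is a minimal sketch using Python's standard-library wave module (it only handles .wav files; the function name and the 10-second default are our own, not part of NatureLM-audio):

```python
import wave

def trim_wav(in_path: str, out_path: str, max_seconds: float = 10.0) -> float:
    """Copy at most max_seconds of audio from in_path to out_path.

    Returns the duration (in seconds) of the trimmed file.
    """
    with wave.open(in_path, "rb") as src:
        rate = src.getframerate()
        # Keep every frame up to the 10-second mark (or the whole file if shorter)
        keep = min(src.getnframes(), int(max_seconds * rate))
        params = src.getparams()
        frames = src.readframes(keep)
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # nframes is corrected automatically on close
        dst.writeframes(frames)
    return keep / rate
```

For other formats (.mp3, .flac, etc.) you would need a third-party tool such as ffmpeg, since the wave module reads WAV only.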
Choose a task or write your own prompt.
The pre-loaded tasks are examples of bioacoustics tasks that our model supports. Selecting a task from the dropdown menu will auto-fill the prompt into the chat, which you can edit before pressing Send. You can also ask custom questions about the audio or species.
Send your message.
When you’re ready, press Send to process the audio. You can ask follow-up questions about the audio, or even about the species itself. To explore a new file, press the Clear button to restart. You can also start over by refreshing the Space.
Prompt & Task Examples
Here are a few examples of bioacoustics tasks the model supports (all available to select from the pre-loaded Task list):
Focal Species Identification
- “What is the common name for the focal species in the audio?”
- “What are the scientific names for the focal species in the audio, if any?”
Species/Taxa Classification
- “Which of these, if any, are present in the audio recording? Cetaceans, Aves, None”
- “The objective is to classify the sound into one of the following categories: frogs, birds, insects.”
Individual Speaker Count
- “How many birds are in the audio? Choose between 1, 2, 3 or 4.”
- “How many individuals are vocalizing?”
Life Stage Classification
- “What is the life stage of the focal species in the audio?”
- “Classify the life stage of the focal species in the audio: nestling, juvenile, adult.”
Audio Captioning
- “Caption the audio.”
- “Caption the audio, using the common name for any animal species.”
To see even more examples to try, you can visit our Demo page.
Tips
- Ask one question at a time
- When possible, use scientific or taxonomic names in your prompt
- Keep prompts open-ended and avoid Yes/No or very narrowly targeted questions
  - Instead of asking: "Is there a bottlenose dolphin vocalizing in the audio? Yes or No."
  - Try asking: "What focal species, if any, are heard in the audio?"
- Giving the model options to choose from works well for broader categories (less so for specific species)
  - Instead of asking: "Classify the audio into one of the following species: Bottlenose Dolphin, Orca, Great Gray Owl"
  - Try asking: "Classify the audio into one of the following categories: Cetaceans, Aves, or None."
- Keep audio short (10 seconds or less) - you can trim after uploading
- Accepted audio formats: .wav, .mp3, .aac, .flac, .ogg, .webm, .midi, .aiff, .wma, .opus, .amr
- If you are uploading an .mp4 file, please check that it is not an MPEG-4 movie file
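If you are scripting your uploads, it can help to pre-check file extensions against this list first. A small hypothetical helper (the function and set names are our own, not part of NatureLM-audio):

```python
from pathlib import Path

# Extensions accepted by the NatureLM-audio UI, per the list above
ACCEPTED_FORMATS = {".wav", ".mp3", ".aac", ".flac", ".ogg", ".webm",
                    ".midi", ".aiff", ".wma", ".opus", ".amr"}

def is_supported(path: str) -> bool:
    """Return True if the file extension is in the accepted list."""
    return Path(path).suffix.lower() in ACCEPTED_FORMATS
```

Note that this only checks the extension, not the actual container contents, so it would not catch an MPEG-4 movie renamed with an audio extension.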
Loading the Model Yourself
If you’d like to dive deeper and run NatureLM-audio directly, here’s how you can get started.
Instantiating the model:
from NatureLM.models import NatureLM
# Download the model from HuggingFace
model = NatureLM.from_pretrained("EarthSpeciesProject/NatureLM-audio")
model = model.eval().to("cuda")
Using the model:
from NatureLM.infer import Pipeline
audio_paths = ["assets/nri-GreenTreeFrogEvergladesNP.mp3"]
queries = ["What is the common name for the focal species in the audio? Answer:"]
pipeline = Pipeline(model=model)
# Run the model over the audio in sliding windows of 10 seconds with a hop length of 10 seconds
results = pipeline(audio_paths, queries, window_length_seconds=10.0, hop_length_seconds=10.0)
print(results)
# ['#0.00s - 10.00s#: Green Treefrog\n']
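To build intuition for the window_length_seconds and hop_length_seconds parameters above, here is an illustrative sketch of the windowing arithmetic. This is our own sketch of how such segments could be computed, not necessarily how the Pipeline segments audio internally:

```python
def sliding_windows(duration_s: float, window_s: float = 10.0, hop_s: float = 10.0):
    """Yield (start, end) second offsets covering a clip of duration_s seconds."""
    start = 0.0
    while start < duration_s:
        # The last window is clipped to the end of the audio
        yield start, min(start + window_s, duration_s)
        start += hop_s

# A 25-second clip with 10 s windows and a 10 s hop yields three segments
print(list(sliding_windows(25.0)))  # [(0.0, 10.0), (10.0, 20.0), (20.0, 25.0)]
```

With hop_length_seconds equal to window_length_seconds, as in the example above, the windows tile the audio without overlap; a smaller hop would produce overlapping windows.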
You can find more on our training setup, datasets, and benchmarks on our Model Card here. For even more details, you can refer to our NatureLM-audio GitHub repository.
Help Us Improve the Model!
Keep in mind that NatureLM-audio is designed specifically for bioacoustic tasks – it is not an all-purpose chatbot. Since this release is an experimental beta, you may encounter mistakes or unexpected responses. If you do, please share these with us on Discourse. Your feedback is an essential way to help us understand NatureLM-audio’s strengths, limitations, and the contexts where it can be most valuable. Your input will help guide how we prioritize improvements.
We’d love for you to:
- Try out the demo and tell us what worked (or didn’t)
- Explore the open-sourced model on Hugging Face and see how you might integrate it directly (for inspiration, check out how FrogID plans to use NatureLM-audio in their workflow)
- Join us on Discourse to be part of the conversation where we’re gathering ideas, feedback, and questions
- Sign up for our closed beta waitlist here, if you’re interested in testing upcoming features like longer audio files and batch processing.