AniMUL: NatureLM dataset + Qwen3-Omni

jebba · October 2, 2025, 5:58pm

I am working on a project to create a new model using the NatureLM dataset and the Qwen3-Omni model.

When it was released, I tested the ESP NatureLM-audio model, which is based on Llama 3.1. Although Llama 3.1 is only a bit over a year old, it is “ancient” compared to what is available today: there is Llama 4, plus many new multimodal models that have been trained on audio data.

I decided to try to create a new model, based on the latest/greatest models available today, and selected Qwen3-Omni. It is only a few days old, has very high benchmark scores, is trained on audio media, and has a better license than the Meta models. It is also much larger (30B vs 8B) than the Llama 3.1 model that was used for the “original” ESP model.

I created and tested a LoRA, which worked, and then trained a full model using 1% of the 17 TB compressed NatureLM dataset. I am now tweaking things based on what I have learned so far, such as batch sizes and other parameters. I estimate that creating a full model using the full dataset will take 2-3 weeks.
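
For anyone who wants to try a similar 1% run, here is a minimal sketch of one way to pick a repeatable subset of a large sharded dataset. It is an illustration under assumptions, not necessarily how the AniMUL repo does it: the shard names, the shard count, and the `in_subset` helper are all hypothetical. Each shard name is hashed, and a shard is kept when its hash falls below the chosen fraction, so the same shards are selected on every run.

```python
import hashlib


def in_subset(shard_name: str, fraction: float = 0.01) -> bool:
    """Return True if this shard falls inside the sampled fraction.

    Hypothetical helper: the decision depends only on the shard name,
    so the selection is stable across runs and machines.
    """
    digest = hashlib.sha256(shard_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the shard if its bucket is below the fraction of the 64-bit range.
    return bucket < int(fraction * 2**64)


# Hypothetical shard names; the real dataset layout may differ.
shards = [f"shard-{i:06d}.tar" for i in range(100_000)]
subset = [name for name in shards if in_subset(name)]
print(len(subset), "of", len(shards), "shards selected")
```

Because the choice is tied to the shard name rather than to a random seed, raising `fraction` later only adds shards to the earlier subset, which makes it easier to compare a 1% run against a larger one.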

I have a “source” code repo available, but warning, it is kind of messy at the moment…

Any and all suggestions welcome, especially if you see I’m going down the wrong path!

I should also note that I have been a Linux/Unix sysadmin for 30 years and know a bit about AI, but little about actual interspecies communication…

Happy hacking,

-Jeff

diane · October 10, 2025, 9:22pm

Hi @jebba, thanks so much for sharing. It’s really cool to see more people using NatureLM (and the dataset) and tinkering with it. Have you found any interesting use cases for AniMUL so far? Keep us posted on the progress, and let us know if there are any specific questions you might have.

- Diane

Research Advocate @ Earth Species Project
