AniMUL: NatureLM dataset + Qwen3-Omni

I am working on a project to create a new model using the NatureLM dataset and the Qwen3-Omni model.

When it was released, I tested the ESP NatureLM-audio model, which is based on Llama 3.1. While Llama 3.1 is only a bit over a year old, it is “ancient” compared to what is available today: there is Llama 4, plus many new multimodal models that have been trained on audio data.

I decided to try to create a new model, based on the latest/greatest models available today, and selected Qwen3-Omni. It is only a few days old, has very high benchmark scores, is trained on audio media, and has a better license than the Meta models. It is also much larger (30B vs 8B) than the Llama 3.1 model that was used for the “original” ESP model.

I created and tested a LoRA, which worked, and then did a full model using 1% of the 17 TB compressed NatureLM dataset. I am now making tweaks based on what I have learned so far, such as the batch sizes and other parameters. I estimate that creating a full model using the full dataset will take 2-3 weeks.
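For anyone wanting to reproduce a small-subset run like the 1% one, here is a minimal sketch of one way to pick a repeatable subset, assuming the dataset is stored as named shards. The shard names and the `keep_shard` and `subset_size_tb` helpers are hypothetical illustrations, not code from the repo. Hashing each shard name gives the same selection on every run without storing a list.

```python
import hashlib


def keep_shard(name: str, fraction: float = 0.01) -> bool:
    """Deterministically keep roughly `fraction` of shards, keyed on the shard name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Treat the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the shard if its bucket falls below the requested fraction of the range.
    return bucket < int(fraction * 2**64)


def subset_size_tb(total_tb: float, fraction: float) -> float:
    """Rough size of the subset, assuming shards are of similar size."""
    return total_tb * fraction


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    shards = [f"shard-{i:06d}.tar" for i in range(100_000)]
    selected = [s for s in shards if keep_shard(s)]
    print(f"kept {len(selected)} of {len(shards)} shards")
    # 1% of 17 TB is about 0.17 TB, i.e. roughly 170 GB of compressed data.
    print(f"approx. subset size: {subset_size_tb(17, 0.01):.2f} TB")
```

Because the choice depends only on the shard name, raising `fraction` later keeps every shard from the smaller run and adds new ones, which makes successive runs easier to compare.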

I have a “source” code repo available, but be warned, it is kind of messy at the moment…

Any and all suggestions welcome, especially if you see I’m going down the wrong path!

I should also note that I have been a Linux/Unix sysadmin for 30 years and know a bit about AI, but little about actual interspecies communication…

Happy hacking,

-Jeff


Hi @jebba, thanks so much for sharing! It’s really cool to see more people using NatureLM (and the dataset) and tinkering with it :smiley: Have you found any interesting use cases for AniMUL so far? Keep us posted on the progress, and let us know if there are any specific questions you might have.

- Diane

Research Advocate @ Earth Species Project