ACE-Step

ACE-Step: Next Generation Music Foundation Model

Experience the perfect balance of speed, coherence, and controllability in AI music generation.

Experience ACE-Step Live Demo

Why Choose ACE-Step?

⚡

Lightning-Fast Generation

Synthesize up to 4 minutes of music in just 20 seconds on an A100 GPU — 15x faster than LLM-based models.

🎼

Superior Musical Coherence

Achieve long-range structural consistency in melody, harmony, and rhythm, surpassing traditional diffusion and LLM models.

🎚️

Advanced Controllability

Easily edit lyrics, redraw segments, generate variations, and control musical parameters.

🔗

Multimodal Alignment

Seamlessly align lyrics, vocals, and accompaniment to create richer, more expressive music.

🌐

Open Source & Extensible

Built for the community. Easily fine-tune, extend, or integrate ACE-Step into your creative workflow.

🔒

Privacy & Security

Your creations belong to you. We prioritize privacy and data protection for all users.

How ACE-Step Works

ACE-Step integrates diffusion-based generation with a deep compression autoencoder and a lightweight linear transformer to achieve unparalleled speed and quality. Semantic alignment against MERT and mHuBERT features ensures fast convergence and multimodal control.

Bridging the Gap in Music AI

Current methods face inherent trade-offs between generation speed, musical coherence, and controllability. LLM-based models (such as Yue and SongGen) excel at lyric alignment but suffer from slow inference and structural artifacts, while diffusion models (such as DiffRhythm) offer faster synthesis but lack long-range structural coherence. ACE-Step bridges this gap.

Innovative Architecture

Our model combines diffusion-based generation with Sana's deep compression autoencoder (DCAE) and lightweight linear transformers. It further leverages MERT and mHuBERT to align semantic representations during training (REPA), enabling rapid convergence.
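The REPA-style alignment mentioned above can be sketched as a simple auxiliary loss: project the diffusion model's hidden states into the feature space of a pretrained semantic encoder (such as MERT) and penalize cosine dissimilarity. Everything below — the shapes, the single linear projection, and the function name — is an illustrative assumption, not ACE-Step's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: T time steps, D_h diffusion hidden size, D_t teacher feature size.
# These dimensions are made up for illustration only.
T, D_h, D_t = 8, 16, 12

hidden = rng.normal(size=(T, D_h))      # diffusion transformer hidden states
teacher = rng.normal(size=(T, D_t))     # pretrained semantic features (MERT-like)
W = rng.normal(size=(D_h, D_t)) * 0.1   # learnable projection head

def repa_alignment_loss(hidden, teacher, W):
    """Mean (1 - cosine similarity) between projected hidden states
    and teacher features, computed per time step."""
    proj = hidden @ W
    proj_n = proj / np.linalg.norm(proj, axis=-1, keepdims=True)
    teach_n = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    cos = np.sum(proj_n * teach_n, axis=-1)  # cosine per time step, in [-1, 1]
    return float(np.mean(1.0 - cos))

loss = repa_alignment_loss(hidden, teacher, W)
print(loss)  # lower means the hidden states align better with the teacher
```

During training this loss would be added to the diffusion objective, nudging the model's internal representations toward semantically meaningful features and speeding up convergence.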

Foundation Model for Music AI

Rather than building another end-to-end text-to-music pipeline, our vision is to establish a foundation model for music AI: a fast, versatile, efficient, and flexible architecture that enables easy training of subtasks on top of it. This paves the way for powerful tools that can integrate seamlessly into the creative workflows of music artists, producers, and content creators.

Applications

Lyric2Vocal

Transform lyrics into expressive vocals through LoRA fine-tuning.

Text2Sample

Generate music samples and loops from text prompts.

Singing2Accompaniment

Convert singing to accompaniment (coming soon).

RapMachine

AI-powered rap generation (coming soon).

StemGen

Automatic track separation and generation (coming soon).
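Lyric2Vocal above relies on LoRA fine-tuning. As a minimal sketch of that idea, assuming a single toy linear layer (none of the shapes, ranks, or names below come from ACE-Step): the frozen base weight is augmented with a trainable low-rank update, so only a small fraction of parameters needs training.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy LoRA adapter on one linear layer: y = x @ (W + B @ A * scale).
# Shapes and rank are illustrative; a real fine-tuning config differs.
d_in, d_out, rank = 16, 16, 4

W = rng.normal(size=(d_in, d_out))          # frozen base weight
A = rng.normal(size=(rank, d_out)) * 0.01   # trainable low-rank factor
B = np.zeros((d_in, rank))                  # zero-initialized: no drift at start
scale = 1.0

def lora_forward(x, W, A, B, scale=1.0):
    """Frozen base projection plus a low-rank update; only A and B train."""
    return x @ W + (x @ B) @ A * scale

x = rng.normal(size=(2, d_in))
# With B = 0 the adapter is a no-op, so outputs match the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W)
```

Because only A and B are updated, the base model stays intact and a single checkpoint can host many small task adapters (vocals, samples, rap, and so on) without retraining the whole network.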

Ready to Create with ACE-Step?

Join the new wave of AI music creation. Try ACE-Step now and shape the future of music.

Get Started