Libraries: cIntSpeech - page 2

 
Gan Zhi Zhong #:

Really amazing!

Due to OS and MT5 updates, you just need to change the "file path" in the code, and then you can implement TTS on Windows 11 with the latest version of MT5.
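For readers outside MQL5, the same idea can be illustrated with the Windows speech stack that SAPI-style libraries like this one drive. Below is a minimal Python sketch (my own illustration, not part of cIntSpeech) that builds a PowerShell call to the built-in System.Speech synthesizer; it only works on Windows with PowerShell available:

```python
import subprocess

def build_speak_command(text: str) -> list[str]:
    """Build a PowerShell command that speaks `text` via the built-in
    System.Speech synthesizer (the Windows speech stack that SAPI-based
    libraries wrap)."""
    # Escape single quotes for the PowerShell single-quoted string.
    safe = text.replace("'", "''")
    script = (
        "Add-Type -AssemblyName System.Speech; "
        "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
        f".Speak('{safe}')"
    )
    return ["powershell", "-NoProfile", "-Command", script]

def speak(text: str) -> None:
    # Windows-only: actually runs the synthesizer.
    subprocess.run(build_speak_command(text), check=True)
```

An EA could shell out to a command like this on trade events, e.g. `speak("EUR/USD order filled")`.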

In the five years since, this topic has become completely outdated. Nowadays you shouldn't use the dumb Microsoft TTS; use modern AI models with natural pronunciation instead. True, they require RAM or a GPU. Is that necessary for you?
 
Edgar Akhmadeev #:
In the five years since, this topic has become completely outdated. Nowadays you shouldn't use the dumb Microsoft TTS; use modern AI models with natural pronunciation instead. True, they require RAM or a GPU. Is that necessary for you?

Integrating TTS into my MT5 EA is just a basic requirement.

I'm self-taught in programming,

so I'm also very interested in the advantages of AI models you mentioned, but I don't know where to start. Could you provide some practical cases?

Greatly appreciate it.

 
Gan Zhi Zhong #:

Integrating TTS into my MT5 EA is just a basic requirement.

I'm self-taught in programming,

so I'm also very interested in the advantages of AI models you mentioned, but I don't know where to start. Could you provide some practical cases?

Greatly appreciate it.

I haven't installed any TTS models myself, only LLMs; I know about their quality from articles and reviews. Russian-language sites have very useful information, but that won't help you, and I don't know the English-language ones. But you can find a lot on YouTube.

Besides, I don't know your hardware context: whether you would run models on CPU+RAM or on an NVIDIA or AMD GPU, and how much VRAM you have. A lot depends on that.

Also, if the project is commercial, you can pay for access to online voice models (Text2Speech, Speech2Text). There are a lot of them.

Look for voice models on Hugging Face, choosing sizes to match your hardware. For LLM text generation, the most popular quantisation is GGUF Q4_K_M, a balance between quality and size.
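To get a feel for whether a given quantisation fits your hardware, a back-of-the-envelope estimate helps. The ~4.85 bits-per-weight figure for Q4_K_M below is an approximation (K-quants mix 4- and 6-bit blocks, and actual GGUF files vary by architecture):

```python
def gguf_size_gib(n_params: float, bits_per_weight: float = 4.85) -> float:
    """Rough file/memory footprint of a quantised model in GiB.

    Q4_K_M averages roughly 4.5-5 bits per weight; the exact figure
    depends on the model, so treat the result as an estimate only.
    """
    return n_params * bits_per_weight / 8 / 2**30

# A 7B model at Q4_K_M lands around 4 GiB, vs ~13 GiB at FP16,
# which is why quantisation decides what fits in your VRAM.
print(f"{gguf_size_gib(7e9):.1f} GiB")      # roughly 4 GiB
print(f"{gguf_size_gib(7e9, 16):.1f} GiB")  # FP16 baseline, ~13 GiB
```

On top of the weights you also need room for the KV cache and runtime overhead, so leave a margin of a gigabyte or more.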

I can't tell you which of the local platforms support voice models. I use llama.cpp and ollama for text only; they support models in GGUF format (with weight quantisation), which saves a lot of memory.
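For text generation, ollama exposes a simple local REST API on its default port 11434. A minimal sketch using only the standard library (the model name "llama3" is a placeholder for whatever model you have pulled):

```python
import json
import urllib.request

def build_ollama_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build (but do not send) a POST request for ollama's local
    /api/generate endpoint. Assumes an ollama server on the default port."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,   # return one complete JSON response
    }).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires ollama running locally):
# with urllib.request.urlopen(build_ollama_request("Say hello")) as r:
#     print(json.loads(r.read())["response"])
```

An MT5 EA could reach the same endpoint through a WebRequest-style HTTP call, keeping the model entirely on your own machine.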

You could also choose the ONNX format; it is directly supported in MT5, but only on the CPU, so it is slow and needs a lot of memory.

GitHub - ggml-org/llama.cpp: LLM inference in C/C++
 

I've now found out that the three best local AI speech synthesisers are in Python: Coqui TTS, Chatterbox TTS and Piper TTS.

I haven't tried them. I'm not on friendly terms with Python at all, so I have always failed to resolve dependency conflicts when installing libraries with "pip install ...".
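Of the three, Piper is probably the easiest to try, since it runs as a simple command-line tool that reads text from stdin and writes a WAV file. A hedged sketch of driving it from Python (the `--model` and `--output_file` flags follow my reading of the Piper README; verify them against your installed version):

```python
import subprocess

def build_piper_command(voice_model: str, wav_out: str) -> list[str]:
    """Command line for Piper TTS. `voice_model` is a downloaded
    .onnx voice file, e.g. an en_US voice; flag names may differ
    between Piper versions, so check `piper --help` locally."""
    return [
        "piper",
        "--model", voice_model,
        "--output_file", wav_out,
    ]

def synthesize(text: str, voice_model: str, wav_out: str) -> None:
    # Piper reads the text to speak from stdin (requires piper installed).
    subprocess.run(
        build_piper_command(voice_model, wav_out),
        input=text.encode(),
        check=True,
    )
```

The resulting WAV file could then be played from MT5 with PlaySound, sidestepping Python entirely at runtime.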

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production