Voice Print AI

Voice Print AI is an open source project (GitHub link) that uses Whisperfile to generate transcripts from audio files entirely locally on your system. You can read more about Whisperfile here.

Introduction

I created this project to have a simple way to generate transcripts from audio files without uploading them to any cloud service. The focus is on local processing, simplicity, and privacy. Plenty of existing solutions can do the job, but they require you to upload the audio file to their servers, and you have to pay for the service. Having worked in the finance and healthcare industries, I understand the importance of data privacy and security. This project is an attempt to solve that problem.

How it works

  1. You download a model of your choice from Hugging Face. The models range from tiny (315MB) to large (3.33GB). The larger the model, the better the transcription quality, but the longer it takes to process an audio file and the more resources it requires. I recommend starting with the tiny model and moving to larger models if required.
  2. Once downloaded, you run the model. It also includes an HTTP server that serves the model. We will use this server to send the audio file and get the transcript back.
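
As a rough illustration of step 2, here is a minimal Python sketch of sending an audio file to the local server and reading the transcript back. It is not the project's actual code. It assumes the server listens on 127.0.0.1:8080 and exposes an /inference endpoint that accepts a multipart form with a file field and a response_format field and answers with JSON containing a "text" key, as the whisper.cpp server example does. Check these against the Whisperfile version you run. The names build_multipart and transcribe are hypothetical helpers.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Assumed address and endpoint of the local Whisperfile server; adjust to your setup.
SERVER_URL = "http://127.0.0.1:8080/inference"


def build_multipart(fields: dict, filename: str, audio: bytes) -> tuple:
    """Encode form fields plus one audio file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # Plain text form fields, one part each.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file itself, sent under the (assumed) field name "file".
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = SERVER_URL) -> str:
    """POST an audio file to the local server and return the transcript text."""
    audio_path = Path(path)
    body, content_type = build_multipart(
        {"response_format": "json"}, audio_path.name, audio_path.read_bytes()
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request) as response:
        # Assumes a JSON reply shaped like {"text": "..."}.
        return json.loads(response.read())["text"]
```

With the server running, transcribe("meeting.wav") would return the transcript as a string. The request only goes to localhost, so the audio never leaves your machine.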

How to use

  1. Upload the file through the web interface.
  2. The file will be sent to the whisperfile server and the transcript will be generated.
  3. The transcript will be displayed on the screen.
  4. Transcripts are stored locally on your system along with the audio file.
  5. You can go to the history tab to see all the transcripts generated. It also includes an audio player, so you can play the audio file alongside its transcript.
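
To make steps 4 and 5 concrete, here is one possible way to keep transcripts next to their audio files on disk and list them for a history view. This is only a sketch under assumptions: the folder layout, the transcript.json file name, and the save_transcript and list_history helpers are all hypothetical and may differ from how the project actually stores its data.

```python
import json
import shutil
import time
from pathlib import Path


def save_transcript(audio_path: str, text: str, store: Path) -> Path:
    """Copy the audio into its own folder under `store` and write the transcript beside it."""
    source = Path(audio_path)
    # Hypothetical layout: one folder per upload, named by timestamp and file stem.
    entry = store / f"{int(time.time() * 1000)}-{source.stem}"
    entry.mkdir(parents=True)
    shutil.copy2(source, entry / source.name)
    record = {"audio": source.name, "text": text, "created": time.time()}
    (entry / "transcript.json").write_text(
        json.dumps(record, indent=2), encoding="utf-8"
    )
    return entry


def list_history(store: Path) -> list:
    """Return all saved records, newest first, as a history tab might show them."""
    records = [
        json.loads(p.read_text(encoding="utf-8"))
        for p in store.glob("*/transcript.json")
    ]
    return sorted(records, key=lambda r: r["created"], reverse=True)
```

Keeping each audio file and its transcript in the same folder means a history entry can be played back and read without any database, and deleting the folder removes both.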

You can find more detailed usage instructions on the project's GitHub page.

Interface

[Screenshot: Voice Print AI interface]

History

[Screenshot: Voice Print AI history tab]

Future

I plan to add more features to the project, such as:

  • Desktop app for non-technical users.
  • Running multiple models at the same time.
  • Complete CRUD operations on the transcripts.

I hope you like the project and find it useful. If you have any suggestions or feedback, feel free to reach out to me on Twitter or LinkedIn.