In this article, we discuss easy speech-to-text with Python in 2024. Python is a powerful language with many libraries for building speech recognition systems. With tools such as SpeechRecognition, PyAudio, and the Google Cloud Speech-to-Text API, developers can integrate speech recognition into their projects with minimal effort.
Easy Speech-to-Text with Python in 2024
The SpeechRecognition library is the cornerstone for implementing speech-to-text functionality in Python. It has a simple interface and supports multiple speech recognition engines and APIs, which makes it one of the most popular Python libraries for speech recognition.
How to install SpeechRecognition
You can install SpeechRecognition via pip. Open your terminal or command prompt and run the following command:
    pip install SpeechRecognition
This is the code for Python Speech Recognition
    import speech_recognition as sr

    # Initialize the recognizer
    recognizer = sr.Recognizer()

    # Record audio from the microphone
    with sr.Microphone() as source:
        print("Speak something...")
        audio = recognizer.listen(source)

    # Recognize speech
    try:
        text = recognizer.recognize_google(audio)
        print("You said:", text)
    except sr.UnknownValueError:
        print("Sorry, could not understand audio.")
    except sr.RequestError as e:
        print("Could not request results; {0}".format(e))
In this example, we capture audio from the microphone, pass it to the Google Web Speech API using recognize_google, and print the recognized text. Note that sr.Microphone depends on PyAudio, which is covered in the next section.
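The same recognizer can also transcribe a saved recording instead of live microphone input. The sketch below is one way to do that, assuming a WAV file on disk. `transcribe_wav` is a hypothetical helper name, while `sr.AudioFile`, `Recognizer.record`, and `recognize_google` come from the SpeechRecognition library. The import sits inside the function only so the snippet can be loaded on a machine where the package is not installed yet.

```python
from typing import Optional


def transcribe_wav(path: str, language: str = "en-US") -> Optional[str]:
    """Return the transcript of a WAV file, or None if no speech was recognized.

    Hypothetical helper for illustration. It sends the audio to the
    Google Web Speech API, so it needs an internet connection.
    """
    # Imported lazily so that defining the function does not require the package.
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # AudioFile exposes a WAV, AIFF, or FLAC file as an audio source.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not make out any speech in the audio.
        return None
```

Calling `transcribe_wav("output.wav")` would return the recognized text or `None`. A `sr.RequestError` (for example, no network) is deliberately left to propagate so the caller can decide how to handle it.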
Run the code and speak into the microphone; the recognized text is printed to the console.
PyAudio: Capturing Audio Input with Python
PyAudio provides an easy way to capture audio from microphones and play it back through speakers, which makes it useful for building speech recognition systems that interact with the real world.
How to Install PyAudio
PyAudio can also be installed using pip. However, it is built on the PortAudio library, so PortAudio must be available on your system; on some platforms you need to install it separately first.
    pip install pyaudio
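The pip package names do not always match the names used in `import` statements: `SpeechRecognition` is imported as `speech_recognition`, and `PyAudio` as `pyaudio`. A quick way to confirm that both installs succeeded is to ask Python whether it can locate the modules. The helper name `check_modules` below is made up for this illustration.

```python
import importlib.util


def check_modules(names):
    """Map each module name to True if Python can locate it, else False."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is missing or the module has no valid spec.
            found[name] = False
    return found


# Import names for the two packages installed so far.
for module, ok in check_modules(["speech_recognition", "pyaudio"]).items():
    print(f"{module}: {'installed' if ok else 'missing'}")
```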
This is the code for PyAudio
    import pyaudio
    import wave

    # Constants
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    CHUNK = 1024
    RECORD_SECONDS = 5
    WAVE_OUTPUT_FILENAME = "output.wav"

    # Initialize PyAudio
    audio = pyaudio.PyAudio()

    # Open audio stream
    stream = audio.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        frames_per_buffer=CHUNK)

    print("Recording...")

    frames = []

    # Capture audio data
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)

    print("Recording finished.")

    # Stop and close the stream
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Save the recorded audio to a file
    with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(audio.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
This code records audio from the microphone for a specified duration and saves it to a WAV file.
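The loop bound `int(RATE / CHUNK * RECORD_SECONDS)` decides how much audio is actually captured. Working through the arithmetic with the constants from the script shows that the recording comes out slightly shorter than five seconds, because only whole chunks are read. The 2-byte sample width is the size of one `paInt16` (16-bit) sample.

```python
# Constants copied from the recording script.
RATE = 44100          # samples per second
CHUNK = 1024          # frames read per stream.read() call
RECORD_SECONDS = 5
SAMPLE_WIDTH = 2      # bytes per sample for 16-bit audio
CHANNELS = 1

num_chunks = int(RATE / CHUNK * RECORD_SECONDS)   # the partial last chunk is dropped
total_frames = num_chunks * CHUNK
duration = total_frames / RATE
data_bytes = total_frames * SAMPLE_WIDTH * CHANNELS

print(num_chunks)            # 215
print(total_frames)          # 220160
print(round(duration, 3))    # 4.992
print(data_bytes)            # 440320
```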
When you run the script, speak into the microphone during the five-second recording window; the audio is saved to output.wav.
Google Cloud Speech-to-Text with Python
If you choose to use the Google Cloud Speech-to-Text API, you need to install the Google Cloud client library for Python. Run the following command:
    pip install google-cloud-speech
You also need to set up Google Cloud credentials.
How to Set up Google Cloud credentials:
First, go to the Google Cloud Console and create a new service account. Assign the Project Editor or Owner role to this service account. Then download the service account key file in JSON format and place it in your working directory.
Make sure the JSON key file is in your working directory, and enable the Speech-to-Text API for your project in the Google Cloud Console.
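Before creating a client, it can help to confirm that the key file is where the code expects it. The sketch below is a rough sanity check and not part of the Google Cloud library: `check_service_account_key` is a hypothetical helper, and the set of fields it looks for (`type`, `project_id`, `private_key`, `client_email`) is an assumption based on the usual layout of a downloaded service account key.

```python
import json
from pathlib import Path

# Fields normally present in a service account key file (assumed layout).
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(path):
    """Return a list of problems found with the key file; empty means it looks fine."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]

    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]

    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]

    problems = []
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems
```

For example, `check_service_account_key("key.json")` returns an empty list when the file looks like a service account key, and a list of readable messages otherwise.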
This is our code
    from google.cloud import speech_v1p1beta1 as speech
    import os

    # Set the environment variable to point to your service account key file
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'key.json'

    WAVE_OUTPUT_FILENAME = "output.wav"

    # Define the sample rate for the audio file
    RATE = 44100

    # Initialize Google Cloud client
    client = speech.SpeechClient()

    # Read audio file
    with open(WAVE_OUTPUT_FILENAME, "rb") as audio_file:
        content = audio_file.read()

    # Configure audio settings
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code="en-US",
    )

    # Perform speech recognition
    response = client.recognize(config=config, audio=audio)

    # Print recognized text
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))
This code demonstrates a simple use of the Google Cloud Speech-to-Text API to transcribe an audio file (here, the output.wav recorded earlier) into text. It can be extended and customized for different use cases by modifying the audio settings, handling multiple audio files, or using additional features provided by the Speech-to-Text API.
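One customization worth considering is reading the audio settings from the file itself instead of hard-coding `RATE`, since the value passed as `sample_rate_hertz` should agree with the sample rate stored in the WAV header. The sketch below uses only the standard `wave` module. `wav_recognition_settings` is a hypothetical helper, and its first two dictionary keys are chosen to match the `RecognitionConfig` field names `sample_rate_hertz` and `audio_channel_count`.

```python
import os
import tempfile
import wave


def wav_recognition_settings(path):
    """Read sample rate, channel count, sample width, and duration from a WAV header."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return {
            "sample_rate_hertz": rate,
            "audio_channel_count": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),   # 2 means 16-bit samples
            "duration_seconds": wf.getnframes() / rate,
        }


# Demo: write one second of 16-bit mono silence to a temporary file,
# then read the settings back from its header.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_silence.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 44100)

print(wav_recognition_settings(demo_path))
```

Run against output.wav, the `sample_rate_hertz` value could then be passed to `speech.RecognitionConfig(...)` in place of the hard-coded `RATE`.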
FAQs:
Q: How to do voice to text in Python?
A: Voice-to-text conversion in Python can be done with libraries such as SpeechRecognition. With SpeechRecognition, you can capture audio from a microphone, pass it through a speech recognition engine, and convert it into text.
Q: What is the best speech-to-text library for Python?
A: The best speech-to-text library for Python depends on your specific requirements and preferences. Popular choices include SpeechRecognition, the Google Cloud Speech-to-Text API, and Mozilla DeepSpeech. These libraries offer different features and capabilities.
Q: How to do text to speech using Python?
A: Text-to-speech conversion in Python is done using libraries like pyttsx3 or gTTS (Google Text-to-Speech). These libraries allow you to convert text into spoken audio.
Q: How do I transcribe audio to text in Python for free?
A: You can transcribe audio to text in Python for free using libraries such as SpeechRecognition. SpeechRecognition supports multiple free engines and APIs, such as the Google Web Speech API, CMU Sphinx, and Wit.ai.