Easy Speech-to-Text with Python in 2024

In this article we look at easy speech-to-text with Python in 2024. Python is a powerful language with many libraries for building speech recognition systems. With SpeechRecognition, PyAudio and the Google Cloud Speech-to-Text API, developers can integrate speech recognition into their projects with minimal effort.

The SpeechRecognition library is the cornerstone for implementing speech-to-text functionality in Python. It has a simple interface and supports multiple recognition engines and APIs, which makes it one of the most popular Python libraries for speech recognition.

How to install SpeechRecognition

You can install SpeechRecognition via pip. Open your terminal or command prompt and run the following command:

pip install SpeechRecognition

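The following code records from the microphone and transcribes the audio. Note that sr.Microphone requires PyAudio, whose installation is covered later in this article.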

import speech_recognition as sr

# Initialize the recognizer
recognizer = sr.Recognizer()

# Record audio from the microphone
with sr.Microphone() as source:
    print("Speak something...")
    audio = recognizer.listen(source)

# Recognize speech
try:
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Sorry, could not understand audio.")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))

In this example, we capture audio from the microphone, pass it to the Google Web Speech API using recognize_google, and print the recognized text.

Run the code and speak into the microphone; the recognized text is printed in the terminal.
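The same recognizer can also transcribe a recorded file instead of live microphone input. The sketch below is a minimal example, assuming a WAV file; the helper name transcribe_wav and the file name in the example call are illustrative and not part of the original code. It still sends the audio to the Google Web Speech API, so it needs an internet connection.

```python
def transcribe_wav(path, language="en-US"):
    """Return the text recognized in an audio file, or None if nothing was understood.

    transcribe_wav is a hypothetical helper name used for illustration.
    """
    # Imported here so the module loads even if the library is not installed yet
    import speech_recognition as sr

    recognizer = sr.Recognizer()

    # AudioFile reads WAV, AIFF, or FLAC; record() loads the whole file
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not make out any speech
        return None
    # sr.RequestError (network or API failure) is left to propagate to the caller


# Example call (assumes output.wav exists in the working directory):
# print(transcribe_wav("output.wav"))
```

Returning None for unintelligible audio while letting RequestError propagate is one possible design; it lets the caller tell "nothing was said" apart from "the service could not be reached".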

PyAudio: Capturing Audio Input with Python

PyAudio provides an easy way to capture audio from microphones and play it back through speakers, which makes it useful for building speech recognition systems that interact with the real world.

How to Install PyAudio

PyAudio can also be installed using pip. However, PyAudio depends on PortAudio, which may need to be installed on your system first.

pip install pyaudio

This is the code for recording audio with PyAudio:

import pyaudio
import wave

# Constants
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

# Initialize PyAudio
audio = pyaudio.PyAudio()

# Open audio stream
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

print("Recording...")

frames = []

# Capture audio data
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("Recording finished.")

# Stop and close the stream
stream.stop_stream()
stream.close()
audio.terminate()

# Save the recorded audio to a file
with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as wf:
    wf.setnchannels(CHANNELS)
    # Use the module-level helper, since the PyAudio instance was already terminated
    wf.setsampwidth(pyaudio.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))

This code records audio from the microphone for a specified duration and saves it to a WAV file.

When you run it, speak into the microphone while it is recording; the audio is saved as output.wav.
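To check what was actually recorded, the WAV header can be read back with the standard library. The sketch below is a minimal example and not part of the original code: captured_seconds and wav_seconds are hypothetical helper names. It also shows that the recording loop rounds down to a whole number of chunks, so with the constants above slightly less than 5 seconds is captured.

```python
import wave


def captured_seconds(rate, chunk, record_seconds):
    """Seconds captured by a loop of int(rate / chunk * record_seconds) reads."""
    reads = int(rate / chunk * record_seconds)  # partial chunks are dropped
    return reads * chunk / rate


def wav_seconds(path):
    """Duration of a WAV file, computed from its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# With RATE = 44100, CHUNK = 1024, RECORD_SECONDS = 5 the loop runs 215 times
print(round(captured_seconds(44100, 1024, 5), 3))  # 4.992

# Example call (assumes output.wav was written by the recording script):
# print(wav_seconds("output.wav"))
```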

Google Cloud Speech-to-Text with Python

To use the Google Cloud Speech-to-Text API, you need to install the Google Cloud client library for Python. Run the following command:

pip install google-cloud-speech

You also need to set up Google Cloud credentials.

How to Set up Google Cloud credentials:

First, go to the Google Cloud Console and create a new service account. Assign the Project Editor or Owner role to this service account. Then download the service account key file in JSON format and add it to your working directory.

Also make sure the Speech-to-Text API is enabled for your project in the Google Cloud Console.
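Before creating a client, it can help to confirm that the key file is where you expect it and looks like a service account key. The sketch below is a minimal check using only the standard library; check_key_file is a hypothetical helper name, and the field list is a small subset of what a service account key file normally contains, chosen here for illustration.

```python
import json
import os

# A few fields normally present in a service account key file
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path):
    """Return a list of problems found with the key file (empty if none)."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


# Example call (assumes the key was saved as key.json in the working directory):
# print(check_key_file("key.json") or "key file looks fine")
```

This only checks the shape of the file; it does not verify that the key is valid or that the account has the right permissions.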

This is the code for transcribing the recorded file:

from google.cloud import speech_v1p1beta1 as speech

import os

# Set the environment variable to point to your service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'key.json'

WAVE_OUTPUT_FILENAME = "output.wav"
# Define the sample rate for the audio file
RATE = 44100

# Initialize Google Cloud client
client = speech.SpeechClient()

# Read audio file
with open(WAVE_OUTPUT_FILENAME, "rb") as audio_file:
    content = audio_file.read()

# Configure audio settings
audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=RATE,
    language_code="en-US",
)

# Perform speech recognition
response = client.recognize(config=config, audio=audio)

# Print recognized text
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))

This code demonstrates a simple implementation of Google Cloud Speech-to-Text API for transcribing audio files into text. It can be extended and customized for different use cases by modifying the audio settings, handling multiple audio files or incorporating additional features provided by the Speech-to-Text API.
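One way to handle different audio files is to read the sample rate from the WAV header instead of hard-coding RATE, so the config always matches the file. The sketch below is a minimal example using only the standard library: recognition_settings and fits_sync_request are hypothetical helper names, and the 60-second limit for synchronous requests is an assumption that should be checked against the current Speech-to-Text documentation.

```python
import wave

# Assumed upper bound for audio sent to the synchronous recognize() call
SYNC_LIMIT_SECONDS = 60


def recognition_settings(path):
    """Read a WAV header and return (config_kwargs, duration_seconds)."""
    with wave.open(path, "rb") as wf:
        # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample
        if wf.getsampwidth() != 2:
            raise ValueError("LINEAR16 expects 16-bit samples")
        rate = wf.getframerate()
        kwargs = {
            "sample_rate_hertz": rate,
            "audio_channel_count": wf.getnchannels(),
        }
        return kwargs, wf.getnframes() / rate


def fits_sync_request(duration_seconds, limit=SYNC_LIMIT_SECONDS):
    """True if the audio is short enough for a single synchronous request."""
    return duration_seconds <= limit


# Example use with the client code above (assumes output.wav exists):
# kwargs, seconds = recognition_settings("output.wav")
# config = speech.RecognitionConfig(
#     encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
#     language_code="en-US",
#     **kwargs,
# )
```

If fits_sync_request returns False, the file is likely too long for client.recognize and a different request type would be needed.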

FAQs:

Q: How to do voice to text in Python?

A: Voice-to-text conversion in Python can be achieved using different libraries such as SpeechRecognition. With SpeechRecognition, you can capture audio from a microphone, pass it through a speech recognition engine and convert it into text.

Q: What is the best Speech-to-Text library for Python?

A: The best Speech-to-Text library in Python depends on your specific requirements and preferences. However, some popular choices are SpeechRecognition, the Google Cloud Speech-to-Text API and Mozilla DeepSpeech. These libraries offer different features and capabilities.

Q: How to do text to speech using Python?

A: Text-to-speech conversion in Python is done using libraries like pyttsx3 or gTTS (Google Text-to-Speech). These libraries allow you to convert text into spoken audio.

Q: How do I transcribe audio to text in Python for free?

A: You can transcribe audio to text in Python for free using libraries such as SpeechRecognition. SpeechRecognition supports multiple free engines and APIs, like Google Web Speech API, CMU Sphinx, and Wit.ai.
