Module MLLPStreamingClient.MLLPStreamingClient
The MLLP-TTP gRPC Streaming API Python3 client library (v1.1.0)
The MLLP-TTP gRPC Streaming API Python3 client library can be used to develop your own streaming speech recognition application or backend, or as a standalone command-line tool to transcribe audio files. It is based on the gRPC framework.
Installation (via a provided .whl file):
pip install MLLPStreamingClient_mllp-${VERSION}-py3-none-any.whl
Below are some examples of how to interact with the gRPC Streaming API using this library.
First, we have to import the MLLPStreamingClient library and create a MLLPStreamingClient class instance:
from MLLPStreamingClient import MLLPStreamingClient
cli = MLLPStreamingClient(server_hostname, server_port, api_user,
api_secret, server_ssl_cert_file)
server_hostname, server_port, api_user, api_secret and server_ssl_cert_file are given in the API section of TTP.
Next, and optionally, we can perform an explicit call to the rpc GetAuthToken method, to get a valid auth token for the subsequent rpc calls:
cli.getAuthToken()
Please note that if we do not perform this call explicitly, the library will perform it implicitly, if necessary.
Then, we check out the available transcription (ASR) systems offered by the service via the GetTranscribeSystemsInfo rpc call:
systems = cli.getTranscribeSystemsInfo()
import json
print(json.dumps(systems, indent=4))
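The exact fields of each system entry depend on the server configuration. Assuming, purely for illustration, that each entry carries "id", "name" and "lang" fields (an assumption; inspect the printed JSON of your own server to see the real field names), selecting a system might look like:

```python
# Hypothetical response contents: "id", "name" and "lang" are
# illustration-only field names, not guaranteed by the API.
systems = [
    {"id": 12, "name": "English (europarl)", "lang": "en"},
    {"id": 17, "name": "Spanish (europarl)", "lang": "es"},
]

def pick_system(systems, lang):
    # Return the id of the first system matching the requested language.
    for s in systems:
        if s["lang"] == lang:
            return s["id"]
    raise ValueError("no system for language %r" % lang)

system_id = pick_system(systems, "es")
```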
Finally, we pick one system (system_id) and start transcribing our live audio stream, supplied as an iterator method called e.g. myStreamIterator(), while printing only consolidated transcription chunks:
import sys

for resp in cli.transcribe(system_id, myStreamIterator):
    if resp["hyp_novar"] != "":
        sys.stdout.write("%s " % resp["hyp_novar"].strip())
        if resp["eos"]:
            sys.stdout.write("\n")
        sys.stdout.flush()
Please note that consolidated transcription chunks (resp["hyp_novar"]) are delivered with far more latency than the non-consolidated, ongoing (live) ones (given by resp["hyp_var"]). However, these latter chunks grow and change as new incoming audio data is processed, until the system decides to consolidate. Please note that resp["eos"] is set to True when the system outputs a consolidated end-of-sentence (eos) chunk.
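The interplay between the live and the consolidated hypotheses can be sketched with a small renderer that overwrites the live hypothesis in place and commits a line on each eos. The response dictionaries below are fabricated for illustration; in a real run they would come from cli.transcribe():

```python
import sys

def render_stream(responses, out=sys.stdout):
    # Overwrite the current terminal line with the changing live
    # hypothesis (hyp_var); commit the consolidated one (hyp_novar)
    # and move to a new line when an eos chunk arrives.
    for resp in responses:
        if resp["hyp_var"] != "":
            out.write("\r" + resp["hyp_var"].strip())
        if resp["eos"]:
            out.write("\r" + resp["hyp_novar"].strip() + "\n")
        out.flush()
```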
An easy way to test the service, or in case you want to use it in an off-line fashion, is to stream a raw audio file compliant with the audio specifications, that is: PCM, single channel, 16 kHz sample rate, 16-bit little endian.
def myStreamIterator():
    with open("test.wav", "rb") as fd:
        data = fd.read(250)
        while data != b"":
            yield data
            data = fd.read(250)
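If you have no compliant file at hand, one can be generated with the standard-library wave module; the 440 Hz tone, amplitude and file name below are arbitrary illustration values:

```python
import math
import struct
import wave

def write_test_wav(path, seconds=1, rate=16000, freq=440.0):
    # Generate a mono, 16 kHz, signed 16-bit little-endian sine tone,
    # matching the audio specifications listed above.
    with wave.open(path, "wb") as w:
        w.setnchannels(1)     # single channel
        w.setsampwidth(2)     # 16 bit
        w.setframerate(rate)  # 16 kHz
        n = int(seconds * rate)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.3 *
                                  math.sin(2 * math.pi * freq * t / rate)))
            for t in range(n)
        )
        w.writeframes(frames)
```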
However, if you want to perform a more realistic test, you can try streaming your own voice using a microphone and PyAudio:
import pyaudio

def myStreamIterator():
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    RECORD_SECONDS = 20
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        yield data
    stream.stop_stream()
    stream.close()
    p.terminate()
If your audio file or stream does not comply with these specs, you should consider transforming it before delivering it to the service, e.g. by using pydub.AudioSegment.
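As a rough illustration of what such a transformation involves, here is a minimal standard-library sketch that assumes a 48 kHz, 16-bit stereo WAV input, downmixes it to mono and decimates 3:1 to reach 16 kHz. For arbitrary input formats and proper anti-aliasing, prefer a real resampler such as pydub.AudioSegment:

```python
import struct
import wave

def to_mono16k(src_path, dst_path):
    # Naive conversion sketch: assumes a 48 kHz, 16-bit input, so that
    # decimating 3:1 yields exactly 16 kHz. A real resampler would
    # low-pass filter first to avoid aliasing.
    with wave.open(src_path, "rb") as src:
        assert src.getframerate() == 48000 and src.getsampwidth() == 2
        nch = src.getnchannels()
        raw = src.readframes(src.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Downmix: average the channels of each frame.
    mono = [sum(samples[i:i + nch]) // nch
            for i in range(0, len(samples), nch)]
    # Decimate: keep every 3rd sample (48000 / 3 = 16000).
    out = mono[::3]
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(16000)
        dst.writeframes(struct.pack("<%dh" % len(out), *out))
```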
In addition, two interesting features of the underlying ASR systems can be used in your myStreamIterator() function.
The first one is to send the system an end-of-sentence (eos) signal, thus forcing the consolidation of the ongoing non-consolidated hypotheses. This can easily be done with yield None, that is, sending an empty package. As soon as the system processes an empty package, it will return a resp["hyp_novar"] containing the latest consolidated chunk, along with resp["eos"] = True.
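The eos signal can be sketched as a small variation of the file-streaming iterator shown earlier; the file path is parameterized here only for convenience:

```python
def myEOSStreamIterator(path="test.wav"):
    # Same chunked file streaming as above, but followed by an empty
    # package (yield None) that forces the system to consolidate the
    # pending non-consolidated hypothesis.
    with open(path, "rb") as fd:
        data = fd.read(250)
        while data != b"":
            yield data
            data = fd.read(250)
    yield None
```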
The second one is to inject any string at some point of the audio stream. The ASR system will output that string unchanged and properly time-aligned with the surrounding transcribed speech utterances. Just do, e.g., yield "My Awesome String".
This feature can be useful in re-speaking scenarios for live TV broadcasting, to insert punctuation signs into the output text stream via keystrokes, to insert speaker changes, etc.
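String injection can be sketched as another variation of the file-streaming iterator; the caption text and byte offset below are arbitrary illustration values:

```python
def myCaptionStreamIterator(path="test.wav", caption="[SPEAKER 2]",
                            at_byte=4000):
    # Stream raw audio chunks and inject a text string once roughly
    # at_byte bytes of audio have been sent; the ASR output will carry
    # the string time-aligned with the surrounding speech.
    sent = 0
    injected = False
    with open(path, "rb") as fd:
        data = fd.read(250)
        while data != b"":
            yield data
            sent += len(data)
            if not injected and sent >= at_byte:
                yield caption  # plain string, passed through unchanged
                injected = True
            data = fd.read(250)
```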
Classes
class MLLPStreamingClient (server_name, server_port, api_user=None, api_secret_key=None, server_cert_file=None, debug=False)
-
Creates a MLLPStreamingClient instance.
Parameters:
- server_name: gRPC API server hostname or IP address.
- server_port: gRPC API server port.
- api_user: TTP API username (optional, if server does not require user auth).
- api_secret_key: TTP API user secret key (optional, if server does not require user auth).
- server_cert_file (optional): enables SSL encryption, by providing the SSL certificate file of the gRPC API server.
- debug (optional): enable/disable debug mode.
Methods
def getTranscribeSystemsInfo(self)
-
Implements the gRPC GetTranscribeSystemsInfo call, to get information about all available streaming transcription (ASR) systems.
Returns a list of TranscribeSystemsInfoResponse in JSON format (python dictionaries).
This method automatically calls getAuthToken() if the MLLPStreamingClient object does not store a valid auth token.
def removeASRNode(self, host, port)
-
Admin method: implements the gRPC RemoveASRNode call.
def listASRNodes(self)
-
Admin method: implements the gRPC ListASRNodes call.
def getAuthToken(self)
-
Implements the gRPC GetAuthToken call, to get a valid auth token for subsequent gRPC calls.
Explicitly calls GetAuthToken with the API user name and API user secret provided to the class constructor. The auth token and its lifetime are saved in this instance for subsequent gRPC calls.
Returns an AuthTokenResponse in JSON format (python dictionary).
Raises Exception if the server returns an error code, typically when authentication fails.
def addASRNode(self, host, port)
-
Admin method: implements the gRPC AddASRNode call.
def transcribe(self, system_id, audio_stream_iterator)
-
Implements the gRPC Transcribe call, to transcribe a stream of raw audio samples using a streaming transcription (ASR) system.
Parameters:
- system_id: Transcription (ASR) system identifier (obtained in a previous call to getTranscribeSystemsInfo()).
- audio_stream_iterator: an iterator or a generator method providing chunks of raw audio data (e.g. wav) in the following format: single channel (mono), 16 kHz, signed 16-bit little endian.
This method is a generator of TranscribeResponse gRPC messages in JSON format (python dictionaries), thus providing as output a continuous stream of transcriptions for the incoming audio stream fed by audio_stream_iterator.
This method automatically calls getAuthToken() if the MLLPStreamingClient object does not store a valid auth token.