Module MLLPStreamingClient.MLLPStreamingClient

The MLLP-TTP gRPC Streaming API Python3 client library (v1.1.0)

The MLLP-TTP gRPC Streaming API Python3 client library can be used to develop your own streaming speech recognition application or backend, or as a standalone command-line tool to transcribe audio files. It is based on the gRPC framework.

Installation (via a provided .whl file):

pip install MLLPStreamingClient_mllp-${VERSION}-py3-none-any.whl 

Below are some examples of how to interact with the gRPC Streaming API using this library.

First, we import the MLLPStreamingClient library and create an MLLPStreamingClient instance:

from MLLPStreamingClient import MLLPStreamingClient
cli = MLLPStreamingClient(server_hostname, server_port, api_user, 
                          api_secret, server_ssl_cert_file)

The values of server_hostname, server_port, api_user, api_secret and server_ssl_cert_file can be found in the API section of TTP.

Next, and optionally, we can make an explicit call to the GetAuthToken rpc method to get a valid auth token for subsequent rpc calls:

 cli.getAuthToken()

Please note that if we do not make this call explicitly, the library will perform it implicitly when necessary.

Then, we check the available transcription (ASR) systems offered by the service via the GetTranscribeSystemsInfo rpc call:

 systems = cli.getTranscribeSystemsInfo()
 import json
 print(json.dumps(systems, indent=4))
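For example, we could pick the first available system from the returned list. Note that the "id" and "name" field names below are assumptions made for illustration; inspect the JSON printed above for the actual keys:

```python
# Hand-made stand-in for the list returned by getTranscribeSystemsInfo();
# the "id"/"name" field names are assumptions -- check the real JSON output.
systems = [
    {"id": "en-EU", "name": "English (Europe)"},
    {"id": "es-ES", "name": "Spanish (Spain)"},
]

# Pick the first system's identifier for the transcribe() call below.
system_id = systems[0]["id"]
print(system_id)  # -> en-EU
```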

Finally, we pick a system (system_id) and start transcribing our live audio stream, supplied as an iterator method called, for instance, myStreamIterator(), while printing only consolidated transcription chunks:

import sys

for resp in cli.transcribe(system_id, myStreamIterator):
    if resp["hyp_novar"] != "":
        sys.stdout.write("%s " % resp["hyp_novar"].strip())
        if resp["eos"]:
            sys.stdout.write("\n")
        sys.stdout.flush()

Please note that consolidated transcription chunks (resp["hyp_novar"]) are delivered with far more latency than the non-consolidated, ongoing (live) ones (given by resp["hyp_var"]). However, these latter chunks grow and change as new incoming audio data is processed, until the system decides to consolidate. Also note that resp["eos"] is set to True when the system outputs a consolidated end-of-sentence (eos) chunk.
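The interplay between the two hypothesis fields can be sketched with a small response-handling loop. The responses below are hand-made dictionaries mimicking the fields described above, not real server output:

```python
# Simulated TranscribeResponse dictionaries (for illustration only; real
# responses are produced by cli.transcribe()).
responses = [
    {"hyp_var": "hello",       "hyp_novar": "",            "eos": False},
    {"hyp_var": "hello world", "hyp_novar": "",            "eos": False},
    {"hyp_var": "",            "hyp_novar": "hello world", "eos": True},
]

transcript = []
for resp in responses:
    if resp["hyp_novar"] != "":
        # Consolidated chunk: final, safe to store.
        transcript.append(resp["hyp_novar"].strip())
    elif resp["hyp_var"] != "":
        # Live chunk: may still grow or change; e.g. refresh a caption display.
        pass

print(" ".join(transcript))  # -> hello world
```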

An easy way to test the service, or to use it in an off-line fashion, is to stream a raw audio file compliant with the audio specifications, that is: PCM, single channel, 16 kHz sample rate, 16-bit little endian.

def myStreamIterator():
    with open("test.wav", "rb") as fd:
        data = fd.read(250)
        while data != b"":
            yield data
            data = fd.read(250)

However, if you want to perform a more realistic test, you can stream your own voice from a microphone using PyAudio:

import pyaudio
def myStreamIterator():
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    RECORD_SECONDS = 20
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        yield data
    stream.stop_stream()
    stream.close()
    p.terminate()

If your audio file or stream does not comply with these specs, you should consider transforming it before delivering it to the service, e.g. by using pydub.AudioSegment.

In addition, two interesting features of the underlying ASR systems can be exploited in your myStreamIterator() function.

The first one is to send the system an end-of-sentence (eos) signal, forcing the consolidation of the ongoing non-consolidated hypotheses. This is done simply with yield None, that is, by sending an empty package. As soon as the system processes an empty package, it will return a resp["hyp_novar"] containing the latest consolidated chunk, along with resp["eos"] = True.

The second one is to inject an arbitrary string at some point of the audio stream. The ASR system will output that string unchanged, properly time-aligned with the surrounding transcribed speech utterances. Just yield the string, e.g. yield "My Awesome String". This feature can be useful in re-speaking scenarios for live TV broadcasting, to insert punctuation signs into the output text stream via keystrokes, to mark speaker changes, etc.
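Both features can be combined in a single iterator. The sketch below interleaves raw audio chunks with an injected string and a forced end-of-sentence; the audio bytes here are PCM silence, used purely for illustration:

```python
def myStreamIterator():
    silence = b"\x00" * 250  # 250 bytes of 16-bit PCM silence (illustrative)
    for _ in range(4):
        yield silence             # regular audio chunks
    yield "My Awesome String"     # injected verbatim into the transcription
    for _ in range(4):
        yield silence             # more audio
    yield None                    # empty package: forces eos consolidation

items = list(myStreamIterator())
print(sum(isinstance(i, bytes) for i in items))  # -> 8 audio chunks
```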

Classes

class MLLPStreamingClient (server_name, server_port, api_user=None, api_secret_key=None, server_cert_file=None, debug=False)

Creates a MLLPStreamingClient instance.

Parameters:

  • server_name: gRPC API server hostname or IP address.
  • server_port: gRPC API server port.
  • api_user: TTP API username (optional, if server does not require user auth).
  • api_secret_key: TTP API user secret key (optional, if server does not require user auth).
  • server_cert_file (optional): use SSL encryption, by providing an SSL certificate file of the gRPC API server.
  • debug (optional): enable/disable debug mode.

Methods

def getTranscribeSystemsInfo(self)

Implements the gRPC GetTranscribeSystemsInfo call, to get information about all available streaming transcription (ASR) systems.

Returns a list of TranscribeSystemsInfoResponse messages in JSON format (Python dictionaries).

This method automatically calls getAuthToken() if the MLLPStreamingClient object does not store a valid auth token.

def removeASRNode(self, host, port)

Admin method: implements the gRPC RemoveASRNode call.

def listASRNodes(self)

Admin method: implements the gRPC ListASRNodes call.

def getAuthToken(self)

Implements the gRPC GetAuthToken call, to get a valid auth token for subsequent gRPC calls.

Explicitly calls GetAuthToken with the API user name and API user secret provided to the class constructor. The auth token and its lifetime are saved in this instance for subsequent gRPC calls.

Returns an AuthTokenResponse in JSON format (python dictionary).

Raises Exception if the server returns an error code, typically when authentication fails.

def addASRNode(self, host, port)

Admin method: implements the gRPC AddASRNode call.

def transcribe(self, system_id, audio_stream_iterator)

Implements the gRPC Transcribe call, to transcribe a stream of raw audio samples using a streaming transcription (ASR) system.

Parameters:

  • system_id: Transcription (ASR) system identifier (obtained in a previous call to getTranscribeSystemsInfo()).
  • audio_stream_iterator: an iterator or generator method providing chunks of raw audio data (raw PCM samples) in the following format: single channel (mono), 16 kHz sample rate, signed 16-bit little endian.

This method is a generator of TranscribeResponse gRPC messages in JSON format (Python dictionaries), thus providing as output a continuous stream of transcriptions for the incoming audio stream fed by audio_stream_iterator.

This method automatically calls getAuthToken() if the MLLPStreamingClient object does not store a valid auth token.