Announcing Amazon Transcribe streaming transcription support in the AWS SDK for Ruby

Amazon Transcribe streaming transcription enables you to send an audio stream, and with a single API call, receive a stream of text in real time. We’re excited to announce support for the #start_stream_transcription API with bidirectional streaming usage in the AWS SDK for Ruby.

Before calling #start_stream_transcription

To use the Amazon Transcribe #start_stream_transcription API, you need the http-2 and aws-sdk-transcribestreamingservice gems. Add them to your Gemfile as follows.


gem 'http-2', '~> 0.10'
gem 'aws-sdk-transcribestreamingservice', '~> 1.0'

Although the AWS SDK for Ruby supports all Ruby versions later than 1.9.3, this API is streamed over the HTTP2 protocol, so you need Ruby version 2.1 or later to use it.

To check your Ruby version, run the following.


ruby -v
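
The same check can also be made from inside a script before the client is created. The following is a minimal sketch; the helper name http2_capable_ruby? and the constant MIN_STREAMING_RUBY are illustrative and are not part of the SDK.

```ruby
# Minimum Ruby version stated above for the HTTP2-based streaming API.
MIN_STREAMING_RUBY = Gem::Version.new('2.1')

# Illustrative helper (not part of the AWS SDK): returns true when the
# given Ruby version string is new enough for the streaming API.
def http2_capable_ruby?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MIN_STREAMING_RUBY
end

unless http2_capable_ruby?
  warn "Ruby #{RUBY_VERSION} is too old for #start_stream_transcription"
end
```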

Currently, Amazon Transcribe supports both 16 kHz and 8 kHz audio streams (WAV, MP3, MP4, and FLAC) in 16-bit linear PCM encoding. Make sure your audio stream uses a supported sample rate and encoding before trying out the API. Otherwise, you might get back empty transcripts or bad request exceptions.
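
For WAV input, one way to verify these requirements up front is to read the file header. The sketch below is an assumption-laden illustration, not SDK functionality: wav_format and transcribable? are hypothetical helper names, and the code only handles the canonical 44-byte header layout in which the "fmt " chunk immediately follows the RIFF header.

```ruby
# Illustrative helper (not part of the AWS SDK). Reads the 44-byte canonical
# RIFF/WAVE header and returns the fields relevant to the requirements above.
# Assumes the "fmt " chunk immediately follows the RIFF header, which is
# common but not guaranteed for every WAV file.
def wav_format(io)
  header = io.read(44)
  valid = header && header.bytesize == 44 &&
          header[0, 4] == 'RIFF' && header[8, 4] == 'WAVE' &&
          header[12, 4] == 'fmt '
  raise ArgumentError, 'not a canonical WAV header' unless valid

  # Little-endian fields: format tag, channels, sample rate, byte rate,
  # block align, bits per sample.
  audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
    header[20, 16].unpack('vvVVvv')
  { pcm: audio_format == 1, channels: channels,
    sample_rate: sample_rate, bits_per_sample: bits }
end

# True for 16-bit linear PCM at 8 kHz or 16 kHz.
def transcribable?(format)
  format[:pcm] && format[:bits_per_sample] == 16 &&
    [8_000, 16_000].include?(format[:sample_rate])
end

# Usage sketch (the file name is a placeholder):
#   File.open('audio.wav', 'rb') { |f| puts transcribable?(wav_format(f)) }
```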

You can find more FAQs on Amazon Transcribe streaming transcription here.

#start_stream_transcription API usage pattern

Let’s walk through the key parts of making an async API call, namely the async client and the event stream handlers, and then look at a complete example of using the API.

Introduction to AsyncClient

Following the nature of HTTP2, the AWS SDK for Ruby introduces AsyncClient for streaming APIs, in contrast to Client (which you might be familiar with) for API calls over HTTP1.1.


require 'aws-sdk-transcribestreamingservice'

async_client = Aws::TranscribeStreamingService::AsyncClient.new(region: 'us-west-2')

# List all available HTTP2/Async operations
async_client.operation_names
# => [:start_stream_transcription]

Introduction to input and output event stream handlers

For a bidirectional streaming API, you need to provide an :input_event_stream_handler for signaling audio events, and an :output_event_stream_handler registered with callbacks to process events immediately when they arrive.


input_stream = Aws::TranscribeStreamingService::EventStreams::AudioStream.new
output_stream = Aws::TranscribeStreamingService::EventStreams::TranscriptResultStream.new

You can find all of the available event streams for those handlers, and documentation about them, at Aws::TranscribeStreamingService::EventStreams.

Before we make the request, let’s take a closer look at those handlers. For handling events in responses, although you can still call #wait or #join! for a final sync response, you get the most benefit out of streaming APIs on HTTP2 by registering callbacks on output_stream to access events with no delay.


# Print out transcripts received
output_stream.on_transcript_event_event do |event|
  unless event.transcript.results.empty?
    event.transcript.results.each do |result|
      result.alternatives.each { |alter| puts alter.transcript.inspect }
    end
  end
end

# Raise an error on bad request exception
output_stream.on_bad_request_exception_event do |exception|
  raise exception
end

# Alternatively, watch all events that arrive
# output_stream.on_event {|event| # do something}

# Callbacks for error events (unmodeled exceptions)
# output_stream.on_error_event {|error| # Aws::Errors::EventError }

You can find all of the available callback methods for output_stream in the Aws::TranscribeStreamingService::EventStreams::TranscriptResultStream documentation.

As for input_stream, you can signal audio events on it after initiating an async request.


input_stream.signal_audio_event_event(audio_chunk: ...# audio bytes ... )
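
The audio bytes usually come from a file or another IO source and are sent in pieces. As a rough sketch of that step, the hypothetical helper each_audio_chunk below (not part of the SDK) yields fixed-size binary chunks from any IO-like object; the signaling call is shown only in a comment because it needs an open async request, and the file name there is a placeholder.

```ruby
# Illustrative helper (not part of the AWS SDK): yields successive binary
# chunks of at most chunk_size bytes from any IO-like object.
def each_audio_chunk(io, chunk_size)
  raise ArgumentError, 'chunk_size must be positive' unless chunk_size > 0
  return enum_for(:each_audio_chunk, io, chunk_size) unless block_given?

  # IO#read(n) returns nil at end of input, which ends the loop.
  while (chunk = io.read(chunk_size))
    yield chunk
  end
end

# Usage sketch with the input_stream created earlier:
#
#   File.open('audio.wav', 'rb') do |file|
#     each_audio_chunk(file, 30_000) do |chunk|
#       input_stream.signal_audio_event_event(audio_chunk: chunk)
#     end
#   end
```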

Calling the API

For a complete demo, we use an AWS Podcast episode to show how the #start_stream_transcription API streams real-time transcripts back.

Let’s pick AWS Podcast #285, which talks about AWS Lambda support for the native Ruby runtime and more.

First, download the file and convert the audio to a 16 kHz sample rate with 16-bit linear PCM encoding, saving it as AwsPodCast285.wav.

Now we’re set to call the API. Let’s create a demo.rb file as follows.


...
# Omit async client initialization
# and input/output event stream handler initialization part

# Open the audio file in binary mode
audio_file = File.new('AwsPodCast285.wav', 'rb')

# Register callbacks
output_stream.on_transcript_event_event do |event|
  unless event.transcript.results.empty?
    event.transcript.results.each do |result|
      result.alternatives.each { |alter| puts alter.transcript.inspect }
    end
  end
end
# On a bad request exception, signal the end of the input stream
output_stream.on_bad_request_exception_event do |exception|
  input_stream.signal_end_stream
end

# Make an async call
async_resp = async_client.start_stream_transcription(
  language_code: "en-US",
  media_encoding: "pcm",
  media_sample_rate_hertz: 16000,
  input_event_stream_handler: input_stream,
  output_event_stream_handler: output_stream
)
# => Seahorse::Client::AsyncResponse

# Signal audio chunks, pausing between them to keep a steady pace
until audio_file.eof?
  input_stream.signal_audio_event_event(audio_chunk: audio_file.read(30000))
  sleep(1)
end
sleep(0.5)
input_stream.signal_end_stream
audio_file.close

# Wait until the stream is closed. Alternatively, you can call #join!
# after some time passes, which would end the stream immediately.
resp = async_resp.wait
# => Seahorse::Client::Response

Running the code produces the following.


"This."
"This is"
"This is a"
"This is EP."
"This is episode"
"This is Episode two"
"This is episode too, huh?"
"This is Episode two hundred"
"This is Episode two hundred and eight."
"This is Episode two hundred and eighty"
"This is Episode two hundred and eighty five."
"This is Episode two hundred and eighty five of"
"This is Episode two hundred and eighty five of the"
"This is Episode two hundred and eighty five of the eight"
"This is Episode two hundred and eighty five of the tub."
"This is Episode two hundred and eighty five of the W."
"This is Episode two hundred and eighty five of the W s."
"This is Episode two hundred and eighty five of the W s."
"This is Episode two hundred and eighty five of the Ws Po'd."
"This is Episode two hundred and eighty five of the Ws podcast."
"This is Episode two hundred and eighty five of the Ws podcast."
"This is Episode two hundred and eighty five of the Ws podcast Real"
"This is Episode two hundred and eighty five of the WS podcast released on"
"This is Episode two hundred and eighty five of the WS podcast released on DIS"
"This is Episode two hundred and eighty five of the Ws podcast released on December."
"This is Episode two hundred and eighty five of the Ws podcast released on December twenty."
"This is Episode two hundred and eighty five of the Ws podcast released on December twenty third."
"This is Episode two hundred and eighty five of the Ws podcast released on December twenty third, twenty."
...
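
The repeated, gradually improving lines above are consistent with the service returning partial results that it refines as more audio arrives. If you only want finalized text, one option is to skip results flagged as partial. The sketch below assumes each result exposes an is_partial reader, as the API reference describes for this result type; the Struct definitions are stand-ins so the example runs without the SDK, and final_transcripts is a hypothetical helper name.

```ruby
# Stand-ins for the SDK's response types so the sketch runs on its own.
Alternative = Struct.new(:transcript)
Result = Struct.new(:is_partial, :alternatives)

# Illustrative helper (not part of the AWS SDK): returns the first
# alternative's transcript for each result that is no longer partial.
def final_transcripts(results)
  results.reject(&:is_partial)
         .map { |result| result.alternatives.first }
         .compact
         .map(&:transcript)
end

# Usage sketch inside the callback registered earlier:
#
#   output_stream.on_transcript_event_event do |event|
#     final_transcripts(event.transcript.results).each { |text| puts text }
#   end
```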

For full documentation of how to use this API, see the AWS SDK for Ruby API Reference.

Additional notes

Due to the nature of the HTTP2 protocol, requests and responses happen in parallel, and multiple streams share a single connection. Although you have full control over how fast you signal audio events on the input event stream, signaling too fast with huge audio chunks can narrow the bandwidth left for response events. To get the most from bidirectional streaming, we recommend a balanced pace when signaling events on input streams.
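
One way to reason about pace is to size each chunk by the amount of audio it represents and pause for roughly that long between signals. The following sketch is only arithmetic on raw PCM sizes; pcm_chunk_bytes is a hypothetical helper, and mono audio is an assumption. Under that assumption, one second of 16 kHz, 16-bit audio is 32,000 bytes, so a 30,000-byte chunk signaled once per second is slightly slower than real time.

```ruby
# Illustrative helper (not part of the AWS SDK): number of bytes of raw PCM
# audio that correspond to duration_ms of playback.
def pcm_chunk_bytes(sample_rate_hz, duration_ms, bits_per_sample: 16, channels: 1)
  bytes_per_second = sample_rate_hz * (bits_per_sample / 8) * channels
  bytes_per_second * duration_ms / 1000
end

one_second = pcm_chunk_bytes(16_000, 1000) # bytes in one second of 16 kHz audio
tenth      = pcm_chunk_bytes(16_000, 100)  # bytes in 100 ms of 16 kHz audio
puts "1 s = #{one_second} bytes, 100 ms = #{tenth} bytes"
```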

As a good practice, we recommend calling #signal_end_stream on the input event stream handler after audio event signaling is completed. It sends a clear “end” stream signal to the server side. Some services might be waiting for this “end” stream signal to complete stream communication. If no further audio event is sent and no end stream is signaled, a :bad_request_exception event might also be returned.

As you might have noticed, unlike sync HTTP1.1 API calls, an async API call returns the AsyncResponse object immediately. There are two methods for syncing an AsyncResponse: #wait and #join!. The #wait method waits on the request until the stream is closed, which can take minutes or even hours (depending on input event signaling). In contrast, calling #join! ends the stream immediately.

We also provide #close_connection and #new_connection methods for an AsyncClient. Because a connection is shared across multiple requests (streams), we recommend calling #close_connection when you have finished syncing all async responses. By default, if no errors occur, the connection is closed after 60 seconds without receiving data. You can configure this value with :connection_timeout.

Final thoughts

We walked through async API usage in this blog post and provided some best practices. Although async API usage is new and different from sync API calls in the AWS SDK for Ruby, it brings streaming benefits to many use cases. Feel free to give it a try and let us know if you have any questions.

Feedback

Please share your questions, comments, and issues with us on GitHub. You can also catch us in our Gitter channel.

AWS Developer Tools Blog
