API Development

Real-Time Speech Recognition in iOS 10 using Hyperloop

Speech Recognition Overview

Speech recognition in products is already big, and it is growing fast. Siri, Alexa, Google Now, Cortana, and others give apps and devices the ability to assist users in a multitude of creative ways. While hands-free voice control has gathered most of the attention, the introduction of new APIs has also opened the door for developers to use these mature speech recognition features within their own apps.

Speech Recognition in iOS

With iOS 5, Apple introduced Siri and Keyboard Dictation, which gave users a dictation button on their keyboard and allowed them to dictate speech into text for text input controls. While this was a huge step forward, it had significant limitations and offered no API to developers.

With iOS 10, Apple has released a new API that supports continuous speech recognition and makes it possible to recognize and transcribe real-time speech as well as audio from media files (both audio and video). With Appcelerator Titanium and Hyperloop, developers can use these features in apps that run on iOS 10 devices.

The speech recognition API can provide dictation progress updates as it is listening to speech. It also provides developers with multiple transcriptions of the speech and an associated confidence level for each one. On most iOS 10 devices, speech recognition requires an internet connection, as it uses Apple’s servers to process the speech.
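
As a preview of what that looks like in practice, here is a minimal sketch of reading the alternative transcriptions and their per-segment confidence values. The property names follow Apple’s SFSpeechRecognitionResult, SFTranscription, and SFTranscriptionSegment classes; result is the object delivered to a recognition task callback, which is covered later in this article.


// Sketch: inspecting alternative transcriptions and confidence values.
// "result" is an SFSpeechRecognitionResult from a recognition task callback.
var transcriptions = result.transcriptions; // NSArray of SFTranscription
for (var i = 0; i < transcriptions.count; i++) {
    var transcription = transcriptions.objectAtIndex(i);
    var segments = transcription.segments; // NSArray of SFTranscriptionSegment
    for (var j = 0; j < segments.count; j++) {
        var segment = segments.objectAtIndex(j);
        // Each segment carries the recognized text and a 0..1 confidence value
        console.log(segment.substring + ' (confidence: ' + segment.confidence + ')');
    }
}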

Getting Started with Hyperloop

Hyperloop enables Appcelerator Titanium developers to talk directly to Objective-C, Swift, or Java code, right from within JavaScript. To use these features, create a new Titanium app and enable the Hyperloop service, as sketched below.
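
Enabling Hyperloop boils down to a couple of entries in your tiapp.xml file, roughly like the snippet below (the exact entries can vary by Titanium SDK and Hyperloop version, so treat this as a sketch):


<!-- Enable the Hyperloop module for iOS -->
<modules>
    <module platform="iphone">hyperloop</module>
</modules>

<!-- Hyperloop on iOS requires the app to run on the main thread -->
<property name="run-on-main-thread" type="bool">true</property>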

Speech Recognition with Hyperloop

Although we will cover in some detail how to use iOS speech recognition manually with Hyperloop, the example code in this article is part of a Hyperloop module, ti.speech, that makes all of this even simpler.

Permissions

The speech recognition feature requires your app to request additional permissions in order to work correctly. First, you will need to ask the user for permission to use the new speech recognition feature. This requires adding NSSpeechRecognitionUsageDescription to the iOS plist section of your tiapp.xml file, along with a description of why the app needs this permission.


<key>NSSpeechRecognitionUsageDescription</key>
<string>Can we parse your spoken words?</string>

Next, if your app will be listening to real-time speech, you will need to request permission to record audio using the microphone as well. This also requires modifying the iOS plist section of your tiapp.xml file to add NSMicrophoneUsageDescription along with a description of why the app needs this permission.


<key>NSMicrophoneUsageDescription</key>
<string>Can we use the microphone for real-time speech recognition?</string>
Create the Speech Recognizer

Creating an instance of SFSpeechRecognizer will give you the speech recognizer to use in your app. It can be initialized with or without a locale. For a pre-recorded audio file, you might want to initialize with the locale of the language spoken in the file. For real-time speech, it is probably best to initialize with no locale (which uses the default locale for the device).


var SFSpeechRecognizer = require('Speech/SFSpeechRecognizer');
var NSLocale = require('Foundation/NSLocale');

var locale = 'en_US';
var speechRecognizer;

if (locale) {
    speechRecognizer = SFSpeechRecognizer.alloc().initWithLocale(NSLocale.alloc().initWithLocaleIdentifier(locale));
} else {
    speechRecognizer = new SFSpeechRecognizer();
}
Check Availability

The app should check to see if speech recognition is available before using the API. Although the speech recognizer might be available for the user’s device and selected locale, other situations, such as no internet connection, could make the speech recognizer temporarily unavailable. Calling the function isAvailable on the speech recognizer will give you the availability status.


var isAvailable = speechRecognizer && speechRecognizer.isAvailable();
Requesting Authorization

Your app will need to request permission to use the speech recognition feature as well as permission to record audio (for real-time speech recognition). The prompts shown to the user will use the descriptions that were set by adding the plist entries to your tiapp.xml file.

Requesting permission to use Speech Recognition

Your app must request authorization from the user to use any of the Speech Recognition API. If the user agrees to give the app permission to use the speech recognition feature, the app can then safely use the pre-recorded features of the API. Real-time audio requires an additional permission (as seen in the next section). There are four different values that a speech recognition authorization status can have, as shown below.


var Speech = require('Speech/Speech'); // umbrella require for the Speech framework constants

SFSpeechRecognizer.requestAuthorization(function(status) {

    switch (status) {
        case Speech.SFSpeechRecognizerAuthorizationStatusAuthorized:
            // User gave access to speech recognition
            break;

        case Speech.SFSpeechRecognizerAuthorizationStatusDenied:
            // User denied access to speech recognition
            break;

        case Speech.SFSpeechRecognizerAuthorizationStatusRestricted:
            // Speech recognition restricted on this device
            break;

        case Speech.SFSpeechRecognizerAuthorizationStatusNotDetermined:
            // Speech recognition not yet authorized
            break;
    }
});
Requesting permission to record audio

For real-time speech recognition, your app must request authorization from the user to record audio using the microphone. If the user agrees, you can then use the real-time audio features of the speech recognition API (assuming the user also granted permission for speech recognition). The request callback reports whether permission was granted; the session can also report one of three permission values at any time, as shown below.


var AVFoundation = require('AVFoundation/AVFoundation'); // umbrella require for the AVFoundation constants
var AVAudioSession = require('AVFoundation/AVAudioSession');

var audioSession = AVAudioSession.sharedInstance();

audioSession.requestRecordPermission(function(granted) {
    if (granted) {
        // Recording permission has been granted.
    } else {
        // Recording permission has been denied.
    }
});

// The three-valued permission status can be read back from the session:
switch (audioSession.recordPermission()) {
    case AVFoundation.AVAudioSessionRecordPermissionGranted:
        // Recording permission has been granted.
        break;

    case AVFoundation.AVAudioSessionRecordPermissionDenied:
        // Recording permission has been denied.
        break;

    case AVFoundation.AVAudioSessionRecordPermissionUndetermined:
        // Recording permission has not been requested yet.
        break;
}
Speech Recognition for Pre-Recorded Audio

To use speech recognition with pre-recorded media files (either audio or video), you will first need to create a path to the media file and create an instance of SFSpeechURLRecognitionRequest.


var NSBundle = require('Foundation/NSBundle');
var NSURL = require('Foundation/NSURL');
var SFSpeechURLRecognitionRequest = require('Speech/SFSpeechURLRecognitionRequest');

// "input" holds the name of a media file bundled with the app, e.g. 'audio.mp3'
var url = input.split('.');
var soundPath = NSBundle.mainBundle.pathForResourceOfType(url[0], url[1]);
var soundURL = NSURL.fileURLWithPath(soundPath);
var request = SFSpeechURLRecognitionRequest.alloc().initWithURL(soundURL);
request.shouldReportPartialResults = true;  // allows progress updates

Using the request, you can then create an instance of SFSpeechRecognitionTask, which will use a callback function to report progress results.


var SFSpeechRecognitionTask = require('Speech/SFSpeechRecognitionTask');

var recognitionTask = speechRecognizer.recognitionTaskWithRequestResultHandler(request, function(result, error) {
    if (error) {
        // Handle error
    } else if (result.isFinal()) {
        // Do something with result.bestTranscription.formattedString
    }
});

To stop speech recognition when using a file as the input source, you will need to cancel the recognitionTask.

Note: Calling cancel will put the task in a “cancelling” state, but the task will continue until it reaches the end of the file or the time limit set by Apple (usually about one minute).


if (recognitionTask) {
    recognitionTask.cancel();
}
Speech Recognition for Real-Time Audio

Recognition of real-time audio is similar to that of pre-recorded audio but with some additional steps. You must create an instance of AVAudioEngine and the request will be of type SFSpeechAudioBufferRecognitionRequest.


var AVAudioEngine = require('AVFoundation/AVAudioEngine');
var SFSpeechAudioBufferRecognitionRequest = require('Speech/SFSpeechAudioBufferRecognitionRequest');

var audioEngine = new AVAudioEngine();
var request = new SFSpeechAudioBufferRecognitionRequest();
request.shouldReportPartialResults = true;  // allows progress updates
var recognitionTask = speechRecognizer.recognitionTaskWithRequestResultHandler(request, function(result, error) {
    if (error) {
        // Handle error
    } else if (result.isFinal()) {
        // Do something with result.bestTranscription.formattedString
    }
});

After the recognitionTask is created, there are a few necessary steps before you can start the audioEngine: install a tap on the input node so that captured audio buffers are appended to the request, then prepare and start the engine.


audioEngine.inputNode.installTapOnBusBufferSizeFormatBlock(0, 1024, audioEngine.inputNode.outputFormatForBus(0), function(buffer, when) {
    request && request.appendAudioPCMBuffer(buffer);
});

audioEngine.prepare();
var success = audioEngine.startAndReturnError();

Once the audioEngine is running, it can be stopped with the code below. The progress callback will then report that it has finished and include the final transcription.


if (audioEngine.isRunning()) {
    audioEngine.stop();
    request.endAudio();
    audioEngine.inputNode.removeTapOnBus(0);
}

Note: If you failed to set the NSMicrophoneUsageDescription property in your tiapp.xml file, your app will crash when it tries to access audioEngine.inputNode.

Limitations of iOS Speech Recognition

Although there are some limitations with iOS speech recognition, you can make your app awesome by being aware of them and writing your code accordingly.

  • Requires iOS 10+
  • Limited number of recognitions per day
  • One-minute limit on audio duration
  • Real-time speech recognition requires a device (it will not work in the simulator)
  • An internet connection is required
  • Speech recognition uses a lot of power and data

Any time you are working with speech recognition (on any platform), it is important to think ahead and anticipate errors that may occur. Designing your app to handle unexpected errors can give your users a smooth and rewarding experience!
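
For example, a minimal defensive flow, assuming the speechRecognizer created earlier in this article and the Speech constants require shown above, might guard every entry point like this sketch:


// Sketch: verify availability and authorization before starting recognition.
function canUseSpeechRecognition(callback) {
    if (!speechRecognizer || !speechRecognizer.isAvailable()) {
        // Temporarily unavailable (e.g. no internet connection)
        callback(false);
        return;
    }
    SFSpeechRecognizer.requestAuthorization(function(status) {
        callback(status === Speech.SFSpeechRecognizerAuthorizationStatusAuthorized);
    });
}

canUseSpeechRecognition(function(ok) {
    if (!ok) {
        alert('Speech recognition is not available right now. Please try again later.');
        return;
    }
    // Safe to start a recognition task here
});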

Using the ti.speech Hyperloop module

If you want to get started using speech recognition in iOS even quicker, take a look at the new ti.speech Hyperloop module. Using this module and code similar to that shown below, you can get up and running with speech recognition very quickly. See the example app in the repo for more details.


var TiSpeech = require('ti.speech');
TiSpeech.initialize();

TiSpeech.startRecognition({
    progress: function(result) {
        if (result.finished) {
            console.log(result.value);
        }
    }
});

You will also have access to helper functions, such as those below, in addition to several constants that make working with the API easier.


initialize()
isAvailable()
requestSpeechRecognizerAuthorization()
startRecognition()
stopRecognition()
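
A typical setup flow using these helpers might look like the sketch below. The shape of the authorization callback payload is an assumption here, so check the repo’s example app for the authoritative usage:


var TiSpeech = require('ti.speech');
TiSpeech.initialize();

// Only proceed if the device and locale support speech recognition
if (TiSpeech.isAvailable()) {
    TiSpeech.requestSpeechRecognizerAuthorization(function(e) {
        if (e.success) { // assumed payload shape; see the repo's example app
            TiSpeech.startRecognition({
                progress: function(result) {
                    console.log(result.value);
                }
            });
        }
    });
}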

Wrapping Up

The Speech Recognition API introduced by Apple in iOS 10 is highly accurate, fast, and easy to use, and it can add a lot of value to your iOS apps. Appcelerator Hyperloop makes it easy for you, as a developer, to access this and other native APIs and to take your apps to a whole new level!

You can find the repo containing the source code for this article here: hyperloop-modules/ti.speech. Also included is an example app that shows how the iOS 10 speech recognition works with real-time speech, audio files, and video files.

There are a lot of opportunities to use Hyperloop with speech recognition in the future. This is but one of many modules written and supported by the growing community of developers using the Appcelerator Platform. Contribute to the community by taking your ideas and making something awesome with them! A few ideas:

  • Adding Android voice recognition to a Hyperloop module
  • Adding support for voice recognition of external media files
  • Integrating with a Natural language processing (NLP) library