Making node speak

In today's episode, we'll learn how to make Node.js speak using the built-in Windows TTS (text-to-speech) system.

[The code samples shown here + all modules are available at: https://github.com/nadavbar/node-windows-tts]

Obtaining a speech stream


The simplest way to produce a stream of speech from a given text is to use the Windows.Media.SpeechSynthesis API. We create a new SpeechSynthesizer object and use its synthesizeTextToStreamAsync method, which takes text and outputs a stream in WAVE format containing the synthesized speech.

This is quite simple: we'll use NodeRT to generate the windows.media.speechsynthesis NodeRT module, which enables us to use this API from Node.js in the following way:

var speech = require('windows.media.speechsynthesis');  
var synth = new speech.SpeechSynthesizer();  
synth.synthesizeTextToStreamAsync('Hello Node!', function (err, speechStream) {  
  if (err) {
    return console.error(err);
  }
  // yay! we have a stream...
});

When synthesizeTextToStreamAsync completes, speechStream will hold a reference to a SpeechSynthesisStream object containing the synthesized audio.

Converting WinRT stream to a node.js stream

To play the synthesized speech, we first need to save it to a file.

To do that, we will use the nodert-streams module, which was developed alongside NodeRT to provide an efficient mechanism for converting/wrapping WinRT streams into node.js streams. Using nodert-streams, one can wrap a WinRT stream and use it as a regular node.js stream object.

To use nodert-streams, run the following from your command prompt:

npm install nodert-streams  

(If the above command fails, make sure that node-gyp and all of its prerequisites are installed.)

Finally, you'll also need to generate the windows.storage.streams module using NodeRT (windows.storage.streams is consumed by nodert-streams).

The code snippet below uses nodert-streams together with the fs and path modules to save the speech stream to a file.

var speech = require('windows.media.speechsynthesis');  
var nodert_streams = require('nodert-streams');  
var fs = require('fs');  
var path = require('path');

// data will be saved to speech.wav in the script's directory
// __dirname will always contain the directory of the current script
var filePath = path.join(__dirname, 'speech.wav');

var synth = new speech.SpeechSynthesizer();  
synth.synthesizeTextToStreamAsync('Hello Node!', function (err, speechStream) {  
  if (err) {
    return console.error(err);
  }
  // create an input stream wrapper for the WinRT stream
  var st = new nodert_streams.InputStream(speechStream);
  var fileStream = fs.createWriteStream(filePath);
  fileStream.on('close', function () {
    // yay! the speech data was saved to a file
  });

  st.pipe(fileStream);
});

Playing the audio file

To play audio from node.js we will use edge.js, along with the code snippet shown in this awesome blog post: http://tomasz.janczuk.org/2014/06/playing-audio-from-nodejs-using-edgejs.html.

First, we will install edge.js by running the following command from the cmd-line prompt:

npm install edge  

The following code snippet produces a function that plays the contents of an audio file at a given path:

var edge = require('edge');  
var play = edge.func(function() {/*  
     async (input) => {
         return await Task.Run<object>(async () => {
             var player = new System.Media.SoundPlayer((string)input);
             player.PlaySync();
             return null;
         });
    }
*/});

Playing the synthesized speech

Putting everything together, the JavaScript code shown below will take the text "Hello Node!" and play it through your speakers:

var speech = require('windows.media.speechsynthesis');  
var nodert_streams = require('nodert-streams');  
var fs = require('fs');  
var path = require('path');  
var edge = require('edge');

// data will be saved to speech.wav in the script's directory
// __dirname will always contain the directory of the current script
var filePath = path.join(__dirname, 'speech.wav');

var play = edge.func(function () {/*  
     async (input) => {
         return await Task.Run<object>(async () => {
             var player = new System.Media.SoundPlayer((string)input);
             player.PlaySync();
             return null;
         });
    }
*/});

var synth = new speech.SpeechSynthesizer();

synth.synthesizeTextToStreamAsync('Hello Node!', function (err, speechStream) {  
  if (err) {
    return console.error(err);
  }
  // create an input stream wrapper for the WinRT stream
  var st = new nodert_streams.InputStream(speechStream);
  var fileStream = fs.createWriteStream(filePath);
  fileStream.on('close', function () {
    play(filePath);
  });

  st.pipe(fileStream);
});

After running the code above, you should hear the synthesized speech.

The above code sample is also available here.

Adding some flair using SSML

The SpeechSynthesizer can also synthesize speech from input in the Speech Synthesis Markup Language (SSML) format, using the synthesizeSsmlToStreamAsync method. Using SSML, we can control different characteristics of the synthesized voice, such as rate, pitch, and more.
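Since SSML is just XML, you could also build the input as a plain string and pass it straight to synthesizeSsmlToStreamAsync, without any helper module. A minimal sketch of the kind of markup involved (it mirrors the prosody example used later in this post):

```javascript
// Hand-built SSML document: two prosody spans with a 300ms pause between them.
var ssmlText =
  '<speak version="1.0" ' +
  'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">' +
  '<prosody pitch="+20st" rate="slow">Hello</prosody>' +
  '<break time="300ms"/>' +
  '<prosody pitch="-4st" rate="slow">Node!</prosody>' +
  '</speak>';

console.log(ssmlText);
```

Building SSML by string concatenation gets error-prone quickly, though, which is why we'll use a helper module instead.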

To create the SSML input, we will use the ssml module, which provides a fluent API for building SSML documents.

First, run from the cmd line prompt:

npm install ssml  

For example, the following code snippet will produce an SSML with a varying pitch:

var ssml = require('ssml');  
var ssmlDoc = new ssml();  
ssmlDoc.prosody({ pitch: '+20st', rate: 'slow' })  
    .say('Hello')
    .break(300)
    .prosody({ pitch: '-4st', rate: 'slow' })
    .say('Node!');

And the result will be the following SSML:

<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">  
    <prosody pitch="+20st" rate="slow">
      Hello
      <break time="300ms" />
      <prosody pitch="-4st" rate="slow">Node!</prosody>
    </prosody>
</speak>  

The code example below uses synthesizeSsmlToStreamAsync instead of synthesizeTextToStreamAsync to synthesize speech from SSML and play it:

var speech = require('windows.media.speechsynthesis');  
var nodert_streams = require('nodert-streams');  
var fs = require('fs');  
var path = require('path');  
var edge = require('edge');  
var ssml = require('ssml');

// data will be saved to speech_ssml.wav in the script's directory
// __dirname will always contain the directory of the current script
var filePath = path.join(__dirname, 'speech_ssml.wav');

var play = edge.func(function () {/*  
     async (input) => {
         return await Task.Run<object>(async () => {
             var player = new System.Media.SoundPlayer((string)input);
             player.PlaySync();
             return null;
         });
    }
*/});

var ssmlDoc = new ssml();  
ssmlDoc.prosody({ pitch: '+20st', rate: 'slow' })  
    .say('Hello')
    .break(300)
    .prosody({ pitch: '-4st', rate: 'slow' })
    .say('Node!');

var synth = new speech.SpeechSynthesizer();

synth.synthesizeSsmlToStreamAsync(ssmlDoc.toString(), function (err, speechStream) {  
  if (err) {
    return console.error(err);
  }
  // create an input stream wrapper for the WinRT stream
  var st = new nodert_streams.InputStream(speechStream);
  var fileStream = fs.createWriteStream(filePath);
  fileStream.on('close', function () {
    play(filePath);
  });

  st.pipe(fileStream);
});

After running the code above, you should hear the synthesized speech.

The above code sample is also available here.

That's all for now. See you next time!