XVoice: Linux Text To Speech Recognition and Integration

Note!

As of version 0.4, I have stopped development of XVoice. However, this project has grown and continues to be supported by a number of folks over at http://xvoice.sourceforge.net.

XVoice

Enables continuous speech to text dictation for many X applications.

Download Version 0.4, May 24, 1999. Newer versions are available from Sourceforge -- this one is left here for historical reasons.

From the README...

XVoice is a simple hack. It accepts continuous speech input from IBM's ViaVoice SDK for Linux (which is distributed separately), and then retargets the resulting text at many X applications.

To accomplish this retargeting, XVoice synthesizes X key press events. Many applications will happily accept synthesized events, but some will not, deeming them -- correctly -- to be a security risk. You should first test XVoice with an rxvt terminal window as the target to make sure that it's working. Once it works correctly with rxvt, try it with other applications.
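
As a rough illustration of the retargeting step, the Python sketch below shows how dictated text might be broken into key press and release pairs. It is not taken from the XVoice source: the function name text_to_key_events and the tuple layout are hypothetical, and only the keysym names (space, period, comma, Return) are standard X names. Actually delivering the events to a window, for instance with Xlib's XSendEvent, is left out.

```python
# Hypothetical sketch: map dictated text to X-style key events.
# Only the keysym names are standard X names; the rest is illustrative.

SPECIAL_KEYSYMS = {
    " ": "space",
    ".": "period",
    ",": "comma",
    "\n": "Return",
}


def text_to_key_events(text):
    """Return (event_type, keysym, shift_held) tuples for each character."""
    events = []
    for ch in text:
        if ch in SPECIAL_KEYSYMS:
            keysym, shift = SPECIAL_KEYSYMS[ch], False
        elif ch.isascii() and ch.isalnum():
            # Letters are sent as the lowercase key plus a Shift flag,
            # much like the modifier state carried by a real key event.
            keysym, shift = ch.lower(), ch.isupper()
        else:
            raise ValueError(f"no keysym mapping for {ch!r} in this sketch")
        events.append(("KeyPress", keysym, shift))
        events.append(("KeyRelease", keysym, shift))
    return events


if __name__ == "__main__":
    for event in text_to_key_events("Hi."):
        print(event)
```

Each character yields a press followed by a release, which is what a target application would normally see when a key is typed.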

Security Information

It is important to note that XVoice's use of synthetic events exploits a possible security hole in many X applications. Obviously, if XVoice can send events to your app, so can a malicious process. You should take care not to allow anonymous connections to your X server (i.e. don't just blindly execute xhost +). In general, if you have the default X security that ships with, for example, RedHat 6.0, and if you do not have other users on your machine, you should be fine. I must stress that XVoice itself does not present any additional security risks. It just highlights an existing problem, and can somewhat broaden that problem when users choose to allow synthetic events. Caveat Emptor.

XVoice is known to work with XTerm, RXVT, WordPerfect, XFMail, XEmacs, and others. GTK-based apps seem to work too. Qt-based apps (at least the two I've tried) do not. XTerm does not work by default -- you have to bring up the ctrl+left-click menu, and turn on "Allow SendEvents". I'd suggest trying things out using rxvt or XEmacs.

In order to get emacs to work, you _must_ evaluate the following elisp:

(setq x-allow-sendevents t)
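
The opt-in settings above (XTerm's "Allow SendEvents" and x-allow-sendevents) exist because an X event delivered through SendEvent carries a flag marking it as synthetic, and the receiving application decides what to do with it. The Python sketch below models that receiver-side decision in the simplest possible way. The KeyEvent class and accept_key_event function are hypothetical names for illustration, not code from XTerm, Emacs, or XVoice.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    keysym: str
    send_event: bool  # True when the event was injected by another client


def accept_key_event(event, allow_send_events=False):
    """Drop synthetic events unless the user has opted in (assumed policy)."""
    return allow_send_events or not event.send_event


if __name__ == "__main__":
    typed = KeyEvent("a", send_event=False)
    injected = KeyEvent("a", send_event=True)
    print(accept_key_event(typed))                             # real keyboard input
    print(accept_key_event(injected))                          # rejected by default
    print(accept_key_event(injected, allow_send_events=True))  # accepted after opt-in
```

Turning the option on accepts synthetic events from any client that can connect to the X server, not just XVoice, which is why the security note above matters.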

Building

XVoice requires X, ViaVoice SDK, and GTK+ to build. Some of the code is based on the IBM ViaVoice SDK sample "gtkhello". If you can build that sample program, you can build XVoice. See the Makefile for local settings.

Running

"xvoice -h" will show you help on running the application.

"xvoice -m" starts XVoice with the microphone on.

Once started, XVoice presents a list of all top-level X applications. Select an application, and then choose "Dictate" from the "Target" menu (or just use voice commands -- see below). You will now be able to do continuous dictation into the target application. To stop dictation, say "stop dictation".

Voice Commands for XVoice

While the mic is on, and if you're not dictating into any application, you can control XVoice with a simple vocabulary:

While dictating into an application, speaking "stop dictation" will return to XVoice command mode. Speaking "correction" will erase the last dictated word or punctuation. You may erase words back to the beginning of the dictation session. Speaking "new paragraph" will enter a single CR for you (i.e. "enter").
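
The dictation-mode commands described above can be modeled as a small history of typed words. The Python sketch below is one hypothetical way to do it, not the XVoice implementation: the DictationSession class, its method names, and the choice to type a trailing space after each word are all assumptions, as is treating "new paragraph" as an erasable entry.

```python
class DictationSession:
    """Hypothetical model of dictation mode, for illustration only."""

    def __init__(self):
        self.history = []      # text typed per word, oldest first
        self.sent = []         # key tokens "sent" to the target, in order
        self.dictating = True

    def hear(self, phrase):
        """Process one recognized phrase; return False once dictation stops."""
        if not self.dictating:
            return False
        if phrase == "stop dictation":
            self.dictating = False
        elif phrase == "correction":
            if self.history:  # nothing to erase at the start of the session
                erased = self.history.pop()
                self.sent.extend(["BackSpace"] * len(erased))
        elif phrase == "new paragraph":
            self._type("\n")  # assumption: recorded so it can be corrected too
        else:
            for word in phrase.split():
                self._type(word + " ")
        return self.dictating

    def _type(self, text):
        self.history.append(text)
        self.sent.extend("Return" if ch == "\n" else ch for ch in text)

    def target_text(self):
        """Replay the sent keys to show what the target would contain."""
        buffer = []
        for key in self.sent:
            if key == "BackSpace":
                if buffer:
                    buffer.pop()
            elif key == "Return":
                buffer.append("\n")
            else:
                buffer.append(key)
        return "".join(buffer)


if __name__ == "__main__":
    session = DictationSession()
    for phrase in ["hello world", "correction", "new paragraph", "stop dictation"]:
        session.hear(phrase)
    print(repr(session.target_text()))
```

Because each word is stored separately, repeated "correction" commands walk back one word at a time and simply stop when the history is empty, which matches the rule that you can erase back to the beginning of the session but no further.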

Copyright

Long Version:

XVoice is Copyright (c) 1999 by David Z. Creemer. All rights not granted in the LICENSE file are reserved. XVoice is distributed subject to the terms explained in the LICENSE file.

Short Version: GPL.


This page last modified Friday 09 December, 2005 by David Creemer
All content Copyright 2003-2005, David Z Creemer