December 16th, 2008

by Dominique Boucher

NuGram IDE Beta-20081211 just released

The Nu Echo team is pleased to announce the release of a new beta version of NuGram IDE.

Highlights

In addition to a number of bug fixes, the following features have been added, in response to specific requests from users:

  • Grammar content. The type of a grammar (XML, ABNF, GSL) can be associated with specific filename extensions to help the tools properly load grammars. For instance, while the Nuance OSR engine uses the “.gram” extension to designate compiled grammars, this extension is customarily used to designate ABNF grammars with the IBM WebSphere Voice Server recognition engine. The ability to change file extension associations facilitates the use of NuGram IDE in various environments.
  • Translation rules. A very flexible, project-specific mechanism has been added to control how external rule references are interpreted by the various tools (debugging tools, conversion tools, etc.). For example, references to external .abnf grammars (references of the form $<path/grammar.abnf>) can be automatically converted to references of the form $<path/grammar.grxml> at grammar conversion time. Or references to a compiled .gram SLM grammar can be replaced by a reference to a simple ABNF grammar at grammar development time, without having to change the source grammar.
  • Enhanced automatic builder. It is now possible to specify an output folder for the generated XML/GSL grammars, as well as which folders contain the grammars to convert. Putting generated grammars in dedicated folders keeps the folders containing the source ABNF grammars free of derived resources, which are usually not put under source control (CVS, SVN, and the like) since they can be re-created from the source grammars. This also makes it easier to package grammars for the application: the application build process can simply copy a single directory structure without first having to remove some files.
  • Configurable validations. The feedback level of some validation issues can now be configured in the preferences.
  • Enhanced GSL export/conversion. When exporting/converting grammars to GSL, the resulting grammars are now much better formatted.
  • Better conformance with SRGS. Several conformance problems have been fixed.
  • Nuance 9 extensions. The tools can be configured to support the Nuance 9 extensions to SISR (SWI properties).
  • LumenVox extensions. The tools can be configured to properly handle LumenVox phonetic spellings.
  • SISR 2004 Working Draft support. The tools can be configured to support the SISR 2004 Working Draft syntax for rule variables. This feature is especially useful for grammar developers targeting the IBM WebSphere Voice Server recognition engine, which doesn’t support the latest version of the SISR specification.
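
To give a feel for what a translation rule does, here is a minimal Python sketch of the .abnf to .grxml example mentioned above. It is not how NuGram IDE implements translation rules; the regular expression, the translate_refs function, and the sample rule are all assumptions made up for illustration.

```python
import re

# Matches external rule references such as $<path/grammar.abnf> or
# $<path/grammar.abnf#rule>. The pattern is an illustrative assumption,
# not NuGram IDE's actual matching logic.
EXTERNAL_REF = re.compile(r"\$<([^>#]+?)\.abnf(#[^>]*)?>")


def translate_refs(grammar_text, new_ext="grxml"):
    """Rewrite external .abnf references so they use another extension."""
    def repl(match):
        path = match.group(1)
        fragment = match.group(2) or ""  # optional "#rule" part
        return f"$<{path}.{new_ext}{fragment}>"
    return EXTERNAL_REF.sub(repl, grammar_text)


# Hypothetical source rule with two external references.
source = "$order = I want $<common/numbers.abnf#digits> $<menu/items.abnf>;"
converted = translate_refs(source)
print(converted)
```

References with other extensions (a compiled .gram grammar, for instance) are left untouched by this sketch; handling them would need a separate rule.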

Download now!

We strongly encourage people to download this new version as soon as possible as it contains many important new features and bug fixes. And as usual, we solicit your feedback to help improve our product and better support your grammar development process.

December 8th, 2008

by Dominique Boucher

Refactoring tools and grammar development

Refactoring tools are incredibly popular in the programming community. Most modern programming environments provide refactoring tools of various degrees of sophistication.

But what are refactoring tools? In short, they are tools that modify programs without changing their runtime semantics. In other words, refactoring tools must not introduce an observable difference in the execution of the program. They help abstract common code, change variable names, rename procedures or methods, etc.

Refactoring tools help developers perform repetitive code restructuring tasks that would be highly error-prone if done by hand. Without such tools, even the simplest form of refactoring - renaming a variable in a file - can easily cause unexpected problems if done using a simple search and replace. Now imagine renaming a public method in an object-oriented language, where the method can be invoked from many different places in the whole project source code…

Refactoring applied to speech recognition grammars

Like programming language refactoring tools, grammar refactoring tools help modify grammars without changing the language they accept and the values they return when interpreting sentences. There are a number of common tasks involved when writing speech recognition grammars that can benefit from refactoring tools. Here are a few:

  • Rule renaming. Naming things is hard. I am a programmer myself and I always find it hard to come up with the most precise name for a class, a variable, a procedure, or method. Naming grammar rules is just as hard. The programming environment should make it easy to rename a rule when we find a better name, in such a way that we don’t break the grammar. In other words, the renaming tool must rename the rule definition, as well as all its references (and potentially the root header). But just as important, semantic tags must also be taken into account when renaming a rule. How many times have you forgotten to modify the semantic tag after renaming a rule? A proper refactoring tool must therefore ensure that references to the rule in all semantic tags be modified as well.
  • Slot renaming. Likewise, slot names are often renamed. A renaming tool must ensure that all references in the defining rule as well as the references in other rules be changed at once.
  • Rule extraction. Another common task for the grammar writer is the extraction of a rule expansion to create a new rule. Grammars are often built incrementally. The grammar writer begins by coding a few rules, discovers potential for reuse, and creates new rules encapsulating these reusable parts. If the extracted parts contain semantic tags, it can be tricky (and highly error-prone) to modify them by hand and make sure that the semantic slots computed by the new rule are properly propagated to the referencing rule.
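
To make the rule renaming case concrete, here is a small Python sketch that contrasts a naive search and replace with a rename that only touches rule references ($name) and SISR-style tag references (rules.name). It is a text-level toy under my own assumptions (the rename_rule function and the sample grammar are hypothetical), not the way NuGram IDE works: a real tool has to parse both the grammar and the ECMAScript in the tags.

```python
import re


def rename_rule(grammar_text, old, new):
    """Rename a rule in ABNF text, including rules.<name> in semantic tags."""
    # "$old" not followed by another name character, so that "$township"
    # is left alone when renaming "$town".
    rule_ref = re.compile(r"\$" + re.escape(old) + r"(?!\w)")
    # "rules.old" as used in semantics/1.0 tags to read a rule's value.
    tag_ref = re.compile(r"(?<![\w.])rules\." + re.escape(old) + r"(?!\w)")
    renamed = rule_ref.sub(lambda m: "$" + new, grammar_text)
    return tag_ref.sub(lambda m: "rules." + new, renamed)


# Hypothetical grammar fragment: root header, definition, reference, and tag.
grammar = (
    "root $city;\n"
    "public $city = $town {out.city = rules.town;};\n"
    '$town = boston {out = "BOS";} | $township;\n'
)

renamed = rename_rule(grammar, "town", "place")
naive = grammar.replace("town", "place")  # also mangles $township

print(renamed)
print(naive)
```

The naive version turns $township into $placeship, which is exactly the kind of silent breakage a refactoring tool is supposed to prevent.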

Challenges

SRGS grammars offer a number of important challenges with respect to refactoring tools:

  • They combine two different languages, namely the SRGS language itself for expressing the valid sequences of words, and the semantic tag language. These two languages have very different semantics.
  • The most common semantic tag languages are based on ECMAScript, a highly dynamic scripting language. The refactoring tools must thus understand the ECMAScript language and its various constructs to properly do their job.
  • The semantic tag language can vary from one ASR engine to the other.

Refactoring support in NuGram IDE

The refactorings described above are all supported by NuGram IDE. Moreover, they are aware of the grammar semantic tag language declared by the grammar - they behave differently whether the tag-format header is semantics/1.0 or swi-semantics/1.0 (the Nuance tag format is not yet supported). This, BTW, is the kind of thing that cannot be done by a generic XML editor.

To rename a rule, put the cursor on a rule name (the definition or a reference), and press Alt-Shift-R. You should see something like:

As you can see, all the references that must be changed at once are surrounded by a gray rectangle, even in the semantic tags.

To rename a semantic slot, put the cursor on a reference to the slot and press the same key sequence (Alt-Shift-R):

All the definitions and references will be modified at once when you change the slot’s name (here the semantic tags are in the swi-semantics/1.0 tag format). Note that all the references to the slot will be changed in the other rules as well, not only in the defining rule.

Finally, to extract an expansion in a new rule, simply select the expansion:

and type Alt-Shift-T:

You see that a new private rule has been created (the default visibility for newly created rules can be configured in the preferences), and a new tag has also been created to propagate the slots returned by the new rule to the calling rule.

These were very simple examples. Consider this (somewhat contrived) rule:

If I want to rename the $digit local rule, should the tool also rename the rules.digit property? That’s not clear. If the rule $<special.abnf#digit> is matched, rules.digit will contain the semantic value returned by that rule. Otherwise, it will contain the semantic value returned by the last match to $digit. There is an ambiguity here. The same identifier may refer to two different things.

Fortunately, if I try to rename the $digit rule using NuGram IDE, it won’t blindly attempt to rename the slot. It will instead pop up the following dialog:

Of course, in practice grammars are rarely that hairy and complex. But refactoring tools must be correct 100% of the time. Otherwise, people would not use them for fear of breaking their programs or grammars.

Finally, note that all NuGram IDE refactoring tools are available not only for plain ABNF grammars, but also for the dynamic extensions. It is possible to rename variables, rename macros, and extract macros.

If you think of other repetitive grammar-related tasks that could be automated that way, please let us know. We strongly believe in powerful tools that help make applications more robust!

October 23rd, 2008

by Dominique Boucher

SISR support by leading ASR engine vendors

One of our NuGram IDE users recently asked us how well SISR, W3C’s specification for semantic tags, is supported by current speech recognition platforms. For the benefit of all, here is the current status for the major players in the field:

ASR engine                          SISR support
IBM WVS                             SISR April 2003
Loquendo                            SISR 1.0 compliant (1)
LumenVox                            SISR 1.0 compliant (although the tag-format header is not standard)
Microsoft OCS 2007 Speech Server    SISR 1.0 compliant (1)
Nuance OSR                          Proprietary semantic language based on ECMAScript
Nuance 8.5                          GSL + proprietary semantic language
Nuance v9                           SISR 1.0 compliant, with proprietary extensions (SWI objects)
Telisma                             SISR 1.0 compliant (1)
Voxeo ASR                           SISR 1.0 compliant

(1) Based on information from the company website; we have not tested it yet.

As we can see, SISR is now prevalent in the latest offerings from the major ASR vendors. This, of course, doesn’t mean that the engine you have to use will support SISR. It’s going to be a while before the current installed base upgrades to SISR-compliant engines.

However, if the engine you need to use happens to give you a choice (e.g., for backward compatibility reasons, Nuance v9 supports both SISR and SWI_semantics), it makes sense to seriously consider using SISR. Your grammars will be much more portable across engines (to a certain extent, of course) and the time taken to master it will be a good investment in the long term.

We should point out that NuGram IDE supports all leading semantic tag formats. What this means is that, for any supported tag format, the tool can compute the semantic interpretation in the exact same way the ASR engine does. So, whether or not you use SISR makes no difference: You can still use NuGram IDE to develop, debug, and test your grammars.

October 15th, 2008

by Dominique Boucher

NuGram IDE Beta-20081010 just released

The Nu Echo team is pleased to announce the release of a new beta version of NuGram IDE.

Highlights

In addition to a number of small bug fixes, the following features have been added, in response to specific requests from users:

  • Unified editor. The ABNF and coverage editors have been merged in a single, multi-tab editor.
  • Improved refactoring tools. The refactoring tools have been enhanced to better support semantic tags. For example, semantic slots can now be renamed. Also, the rule extraction refactoring properly adjusts semantic tags.
  • GSL Importer. Nuance GSL grammars can now be translated to ABNF.
  • Better encoding detection. The environment now uses the proper Eclipse mechanism to detect the encoding of an ABNF file.
  • Comments preservation for imported grammars. When converting grammars from XML form or GSL to ABNF, comments are preserved.
  • Project/folder publishing. Whole grammar hierarchies can be uploaded to a NuGram Server at once. See the online documentation for more details.

NuGram Hosted Server improvements

We also recently upgraded the NuGram Hosted Server with support for the following features:

  • Complete HTTPS support. The login and registration process is now done via secured pages to help protect privacy. Also, the NuGram Server HTTP API now fully supports HTTPS.
  • Grammar Content viewing. Grammars published on the server can be previewed from the grammar browsing page. Just click on a grammar name and see the grammar source code!
  • Account settings. A new Account page lets you manage your account settings.

Download now!

We strongly encourage people to download this new version as soon as possible as it contains many important new features and bug fixes (and the previous version will expire on November 1st, anyway, while the new release will expire on April 1st, 2009). And as usual, we solicit your feedback to help improve our product and better support your grammar development process.

October 9th, 2008

by Yves Normandin

Use cases for dynamic grammars (part 2)

In the previous post, I talked about the main motivations for using dynamic grammars and described the most common usage scenarios. Now, let me make all of this somewhat more concrete by providing a bunch of examples (most of which we’ve used in applications we’ve built over the years).

Let’s start with a few examples of grammars that will likely need to be re-generated for every single call:

  • Address capture — In order to capture the address of a caller, an application might first ask for the caller’s postal or zip code and then ask for the address using an address recognition grammar dynamically built based on a list of address records associated to the recognized postal or zip code.
  • Voice dialing — A voice dialing application could use a recognition grammar dynamically generated from the data in a user’s address book. The grammar could support sentences such as “Call John Smith”, “John Smith at home”, “Call John Smith’s cellular”, etc.
  • Personalized bill payee list — In a banking bill payment application, the payee list grammar is dynamically generated based on the list of payees that has been set up by the user.
  • Personalized menu options — There is a growing trend towards applications that are increasingly personalized for each user. In that vein, an application’s main menus could be personalized for each user based either on the user’s past usage patterns or on personalization actually done by the user on the company’s web site.
  • Identity validation — Many applications use security questions to validate the identity of the caller. Based on an identity claim (e.g., a social security number or a telephone number), the application asks the caller to answer security questions based on information contained in the caller’s profile, for instance a mother’s maiden name, the name of a pet, a secret word, etc. In this case, because the range of possible responses would often be too large, some of the recognition grammars need to be dynamically built based on the expected responses.
  • One-step correction — Let’s suppose an address recognition N-best list contains the following hypotheses: “four fifty main street”, “four sixty main street”, and “four fifty-one main street”, and let’s suppose the caller has actually spoken the third hypothesis. Suppose also that, when confirming the first hypothesis to the caller, we use a confirmation grammar that covers corrections that the caller is likely to make when being proposed an incorrect choice (e.g., “no, four fifty-one”). In other words, the confirmation grammar is built based on hypotheses found in the recognition result. This makes it possible to recognize the correction, if there is one, and act on it, thereby avoiding unnecessary interactions with the caller and, as a result, improving the user experience and the success rate.
  • Choose from a user-specific list of reservations/orders/transactions/accounts — For instance, let’s say a client calls in order to cancel a flight reservation. The application retrieves all reservations corresponding to the client and asks the caller to say the departure date or the destination in order to identify the correct reservation. The recognition grammar would, of course, be dynamically built based on information obtained from the retrieved reservations. Another example is someone who calls regarding his electricity bill. If the caller has more than one account (e.g., a condo in the city and a second home by a lake), the application could identify the correct account by asking for the address associated with the bill. In this case, the grammar would be built from the addresses associated with all the caller’s accounts.
  • List navigation — Let’s say a flight reservation application has retrieved a number of flights corresponding to the caller’s criteria and then lists all such flights, followed by the question: “Which flight would you like?”, to which the caller could respond “The 10:35 flight”. The recognition grammar could, once again, be dynamically built based on information contained in the proposed list of flights.
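
As an illustration of the one-step correction scenario, here is a minimal Python sketch that builds a confirmation grammar from an N-best list. Everything in it is an assumption made for the example: the confirmation_grammar function, the rule names, the slot names, and the choice to accept only the full alternative hypotheses after “no” (a production grammar would probably also accept shorter forms). The output is meant to follow the SRGS ABNF form with semantics/1.0 tags.

```python
import json


def confirmation_grammar(nbest, proposed_index=0):
    """Build ABNF text to confirm nbest[proposed_index].

    The other hypotheses become the corrections the caller may say
    right after "no".
    """
    alternatives = [h for i, h in enumerate(nbest) if i != proposed_index]
    lines = [
        "#ABNF 1.0 UTF-8;",
        "language en-US;",
        "tag-format <semantics/1.0>;",
        "root $confirm;",
        "",
    ]
    confirm = [
        'yes {out.answer = "yes";}',
        'no {out.answer = "no";}',
    ]
    if alternatives:
        confirm.append(
            'no $correction {out.answer = "no"; '
            "out.correction = rules.correction;}"
        )
    lines.append("public $confirm = " + "\n    | ".join(confirm) + ";")
    if alternatives:
        # json.dumps yields a properly quoted string literal for the tag.
        corrections = [
            f"({h}) {{out = {json.dumps(h)};}}" for h in alternatives
        ]
        lines.append("$correction = " + "\n    | ".join(corrections) + ";")
    return "\n".join(lines)


nbest = [
    "four fifty main street",
    "four sixty main street",
    "four fifty-one main street",
]
grammar = confirmation_grammar(nbest)
print(grammar)
```

The proposed hypothesis is deliberately left out of the corrections, since the caller would not correct the application with the very address it just proposed.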

Note that in some of these cases (e.g., voice dialing, personalized bill payee list, or personalized menu options) the new grammars could also have been generated – and possibly compiled – offline, either as soon as the relevant information was changed by the user or as part of a scheduled maintenance process. This would help reduce latency during calls.
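
For the voice dialing case, generation could look roughly like the following Python sketch, whether it runs during the call or offline when the address book changes. The voice_dialing_grammar function, the address book structure, and the rule and slot names are hypothetical; the sketch also skips possessive forms such as “John Smith’s cellular” to stay short.

```python
import json


def voice_dialing_grammar(address_book):
    """Build ABNF text from {display name: [phone types]}."""
    if not address_book:
        raise ValueError("address book is empty")
    contacts = [
        f"({name.lower()}) {{out = {json.dumps(name)};}}"
        for name in sorted(address_book)
    ]
    # Union of all phone types found in the address book, in stable order.
    places = sorted({p for types in address_book.values() for p in types})
    dial = "public $dial = [call] $contact {out.name = rules.contact;}"
    if places:
        dial += "\n    [[at] $place {out.place = rules.place;}]"
    lines = [
        "#ABNF 1.0 UTF-8;",
        "language en-US;",
        "tag-format <semantics/1.0>;",
        "root $dial;",
        "",
        dial + ";",
        "$contact = " + "\n    | ".join(contacts) + ";",
    ]
    if places:
        place_alts = [f"{p} {{out = {json.dumps(p)};}}" for p in places]
        lines.append("$place = " + "\n    | ".join(place_alts) + ";")
    return "\n".join(lines)


# Hypothetical address book for one user.
book = {"John Smith": ["home", "cellular"], "Mary Jones": ["work"]}
dialing = voice_dialing_grammar(book)
print(dialing)
```

Note that this simple version accepts any phone type for any contact (for example “Mary Jones at home”); the application would have to check the result against the address book, or the generator could emit one place rule per contact.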

Here are examples of dynamic grammars based on data that change slowly over time:

  • Dates — Most date grammars would gain from being dependent on the current date. For instance, in a travel reservation application, a departure date only occurs in the future and the return date should be later than the departure date. Similarly, a birth date normally occurs in the past. Making date grammars a function of the current date eliminates maintenance problems while maximizing accuracy.
  • Telephone numbers — Telephone number recognition accuracy is significantly higher when the area codes allowed by the grammar are limited to those that actually exist. Unfortunately, the list of area codes continuously evolves. In order to keep recognition accuracy as high as possible while making sure that all required phone numbers are supported, the telephone number grammar could be dynamically generated based on a continuously updated list of area codes.
  • Postal or zip codes — Many applications ask for the caller’s postal or zip code. For instance, a citizen calling City Hall in order to inquire about the garbage collection schedule might be asked for his/her postal code in order to appropriately locate the house or apartment. If the recognition grammar is designed to only support valid postal codes, it should be updated periodically in order to account for changes in the list of postal codes.
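
To illustrate the date case, here is a small Python sketch that generates a departure date rule from the current date, so that only upcoming dates are accepted and each one resolves to a full date, year included. The departure_date_rule function, the rule name, the booking horizon, and the decision to exclude the current day are all assumptions made for this example.

```python
from datetime import date, timedelta

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
ONES = ["", "first", "second", "third", "fourth", "fifth", "sixth",
        "seventh", "eighth", "ninth"]
TEENS = ["tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
         "fifteenth", "sixteenth", "seventeenth", "eighteenth", "nineteenth"]


def spoken_day(day):
    """Return the spoken ordinal for a day of the month (1 to 31)."""
    if day < 10:
        return ONES[day]
    if day < 20:
        return TEENS[day - 10]
    if day == 20:
        return "twentieth"
    if day < 30:
        return "twenty " + ONES[day - 20]
    if day == 30:
        return "thirtieth"
    return "thirty first"


def departure_date_rule(today, horizon_days):
    """Build an ABNF rule accepting only the next horizon_days dates."""
    alternatives = []
    for offset in range(1, horizon_days + 1):
        day = today + timedelta(days=offset)
        words = f"{MONTHS[day.month - 1]} {spoken_day(day.day)}"
        # The tag returns the full ISO date, so the year is never ambiguous.
        alternatives.append(f'({words}) {{out = "{day.isoformat()}";}}')
    return "$departure_date = " + "\n    | ".join(alternatives) + ";"


rule = departure_date_rule(date(2008, 12, 30), horizon_days=3)
print(rule)
```

With a starting date at the end of December, “january first” resolves to the following year without any extra logic in the application, which is the maintenance benefit described above.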

Finally, here are examples of dynamic grammars that could be used as part of a regular application maintenance process:

  • Bill payee list management — Banks continuously update the list of companies, utilities, municipalities, school boards, etc., available for bill payment through their telebanking application. If the bank wants to let their customers add new payees to their own personal bill payee list using the IVR application, the application needs to use a grammar containing all supported payees.
  • Stock quotes — The companies listed on any stock exchange change continuously as new companies are added and existing companies become delisted. As a result, most stock quote applications come with a regular grammar maintenance service to make sure that the recognition grammars are as current as possible.
  • Mutual funds — Same as stock quotes.
  • Branch location — Possible dynamic grammars used for branch location purposes include city-specific street intersection grammars and city-specific address grammars.

It’s of course easy to come up with many more examples that are similar to those listed above. If you have used dynamic grammars that you think are interesting or markedly different from those listed above, we’d certainly like to hear about them. And, naturally, if you have used dynamic grammars in the past, we’d really like you to try re-developing some of them with NuGram IDE and tell us what you think.