Designing User Friendly IVR Telephone Voice Systems for All

Please contact me to discuss your IVR, telephony or intelligent agent needs. We have two decades of experience in creating and optimising voice interfaces.

Copyright (c) Tim Noonan 1995-2015.

Last Updated: January 2006

An earlier version of this paper was presented at the Computer Telephony and Telemedia Conference organised by Housley Communications in September 1995.


Terminology in use is not always consistent. The Australian and New Zealand Interactive Voice Response Standard assists to some extent here; however, the following terms are defined because they are used frequently in this paper.


An interactive voice application


Any IVR system;


Interactive Voice Response;


Any telephone-based application which interactively takes input from callers and returns output in the form of voice or auditory information;


A caller to the system (often termed a user in computing contexts);


the digitally recorded human voice used by the system to convey messages and information to callers of the system;


the person who records system messages into the system – the human behind the system’s “voice”.


Having worked extensively with speech technologies over the last 12 years, and relying strongly on speech and audio information, the Author will discuss in this paper some of the fundamental differences he has encountered between presenting information through auditory channels and presenting it through visual or spatial channels.

The intent of this paper is to present human factors principles applicable to IVR application design which can result in more intuitive, efficient and pleasant telephonic interactions for your IVR customers.

The recently released updated Australian Standard (AS/NZS 4263) on the user interface design of Interactive Voice Response (IVR) systems is a solid basis for developing your applications. However, this standard is relatively brief and has been written to cater for a diverse range of IVR applications and is therefore general in its recommendations. This paper will discuss some aspects which are covered in the standard, but more importantly, it proposes many other factors which will help make your IVR applications enjoyable and intuitive to use.

Designing an IVR system which is easy and intuitive to use relies much more on common sense than clever programming. If some of the suggestions in this paper seem like common-sense, that is probably because they are. Nevertheless, many systems in current use, though technically commendable, are very difficult to use.

Since IVR systems are developed to serve their callers, they need to be optimised to human ways of thinking and responding. The best IVR systems are sculptured to fit their users; in the long run this extra effort is far easier, more affordable and more successful than trying to mould your callers to a cumbersome user interface.


A computer screen is two-dimensional. The user is able to look at any part of the information displayed on the screen at will. Highlighting, fonts and location convey structure and relative importance to different elements of material on the screen.

In contrast, an auditory interface is serial, rather than two-dimensional. Only one word can be heard at a time, and the order in which material is delivered is therefore very significant.

The following guidelines illustrate some of these differences:

  • Always announce the function, and then the key required to activate it. This is almost universal in IVR systems and avoids confusion;

  • The most important or the most commonly selected items in a menu should be presented first in a list, so that the caller does not need to listen for too long, in order to find his/her desired choice;

  • Messages need to be kept short, and should include some prominent key words. (verbal emphasis of key words replaces highlighting on a computer screen);

  • Restrict the maximum number of specific items in a menu to between three and five wherever possible. (Since the caller cannot just glance back to review choices, this is all that can be held in the short-term memory of most callers);

  • Use silence to convey structure to your callers. Short pauses between menu items, slightly longer pauses between menus. Avoid long silences, as these convey no useful information to callers and can lead to user concern;

  • Use careful wording, tone of voice, audible tones and logical sequencing of information in order to convey context, errors, menu structure and the relative importance of presented information. Prompt for spoken input with a brief tone, use upward inflections for questions and the like. (In a screen-based application, layout, colour and highlighting serve this role);

  • Use terms and metaphors which relate to the telephone and spoken communications. (Remember that many IVR callers may not have ever used a computer, so you should not assume a knowledge of computing concepts and terms);

  • Confirm choices verbally so the person is confident about what is happening and where they are being taken in the application. (Unlike a computer application where status lines and layered windows remind the user of their location in the application, the IVR system needs to convey this information actively to a caller);

  • Avoid using two-dimensional data structures such as tables and two-dimensional cursor movement through information. That is, avoid the arrow-key metaphors. Instead, structure your information into items or records that can be heard one-at-a-time. Allow the caller to move back to the previous item, to hear the current item again, or skip to the next item in the list. You might provide a shortened version of the item, and nominate a key for further information on that item.
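As a rough illustration only, the ordering, menu-size and one-item-at-a-time guidelines above could be sketched in Python as follows. The function and class names, the "<pause>" marker and the limit of five are assumptions made for this example; they are not taken from the Standard or from any particular IVR platform.

```python
MAX_ITEMS = 5  # assumed limit: upper end of the three-to-five range above


def build_menu_prompt(items):
    """Build one spoken menu from (function, key) pairs, most-used first.

    Each phrase names the function first and the key second, and a
    "<pause>" marker (hypothetical) stands in for a short silence.
    """
    if len(items) > MAX_ITEMS:
        raise ValueError("too many menu items for a caller to remember")
    phrases = [f"For {function}, press {key}." for function, key in items]
    return " <pause> ".join(phrases)


class ItemBrowser:
    """Step through records one at a time instead of using a 2-D table."""

    def __init__(self, items):
        if not items:
            raise ValueError("nothing to browse")
        self.items = list(items)
        self.index = 0

    def current(self):
        # "Hear the current item again"
        return self.items[self.index]

    def next(self):
        # Stay on the last item rather than wrapping around silently
        if self.index < len(self.items) - 1:
            self.index += 1
        return self.current()

    def previous(self):
        # Stay on the first item if the caller backs up too far
        if self.index > 0:
            self.index -= 1
        return self.current()
```

In a real system the browser would also announce when the caller has reached the first or last item, so that a repeated item is not mistaken for a fault.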

Your system should be self-documenting through well scripted prompts and on-line help. Callers might not have a manual with them for the system, or they may be doing something (such as driving) which means they cannot refer to printed documentation. Because preparing help material can be tedious, this also encourages designers to make the system more intuitive and less complex.

By keeping to the guidelines just listed, your help scripts will be more straightforward.

Designing a system which is not reliant on printed information also means that your system is fully useable by people who are unable to read print including people who are blind, or those with other print disabilities. In addition, many people from a non English-speaking background may be able to speak and understand English, but may be unable to read English comfortably.


In the same way that a good public speaker finds out about her or his audience before making a presentation, a good IVR designer needs to know about her or his callers before designing an IVR system. The user interface of your system needs to cater well for the majority of your callers while still being useable by all callers.

It is sometimes possible to develop a simple prototype system without all the functionality of the final product. You can then bring in some sample callers to give it a try.

If you have a large number of callers (many at different levels of ability) then having different prompting levels may also be desirable. Novices may not be offered all choices while advanced callers should be allowed to move about a relatively complex menu structure rapidly with very brief prompts.

Always allow users to make selections without having to listen to all messages. This allows experienced users to move through the system quickly and saves you resources.
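One possible way to picture prompting levels and interruptible prompts is the small Python simulation below. The prompt wordings, the level names "novice" and "expert", and both function names are hypothetical; a real platform would supply its own playback and key-detection facilities.

```python
# Hypothetical prompt table: the same menu worded for two levels of caller.
PROMPTS = {
    "main_menu": {
        "novice": [
            "For your account balance, press 1.",
            "For recent transactions, press 2.",
            "For help at any time, press 0.",
        ],
        "expert": ["Balance, 1.", "Transactions, 2."],
    }
}


def prompt_for(menu, level):
    """Return the phrases for a menu, falling back to the novice wording."""
    levels = PROMPTS[menu]
    return levels.get(level, levels["novice"])


def play_with_interrupt(phrases, keypress=None):
    """Simulate playback that stops as soon as the caller presses a key.

    keypress is None, or a (phrase_index, key) pair meaning the caller
    pressed `key` while phrase number `phrase_index` was playing.
    Returns (phrases_heard, key_or_None).
    """
    heard = []
    for index, phrase in enumerate(phrases):
        heard.append(phrase)
        if keypress is not None and keypress[0] == index:
            return heard, keypress[1]
    return heard, None
```

For example, an experienced caller who presses 2 during the first phrase hears only that phrase, and the selection is acted on immediately.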

IVR systems usually need to be relatively simple in design if there are a large number of diverse callers.

The best place to start when scoping the features and complexity of the IVR service is by identifying those information requests which are simple, but which take up a large percentage of your staff resources. Quite complex systems can be developed if your caller group is clearly targeted and you can make assumptions about their abilities. Be careful here, though, as there may be pressure at a later date to make the system available to a wider range of callers.

One relatively complex system the author worked on was for the Land Titles Office (LTO). LTO wanted lawyers and paralegal staff to be able to conduct property searches via the phone and receive a fax of their search results. There are a variety of classes of plans that can be selected and some contain numbers only, others letters and numbers. Designing a system allowing efficient entry of all possible searches resulted in a relatively complex menu structure as well as the need for unambiguous entry of alpha-numeric input via the telephone keypad. This system would not be suitable for use by the general public; but it very adequately met the requirements of LTO for their experienced title-search customers.
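To make the idea of unambiguous alpha-numeric keypad entry more concrete, here is a minimal Python sketch of one possible two-keypress scheme. It is not the scheme used in the LTO system, which this paper does not describe: the convention (letter key followed by the letter's position on that key, digit followed by 0) and the keypad lettering, which follows the modern layout with Q and Z, are assumptions made purely for illustration.

```python
# Assumed keypad lettering (modern layout, including Q and Z).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def decode(keys):
    """Decode pairs of keypresses into letters and digits.

    Hypothetical convention: a letter is its key followed by its position
    on that key (1 to 4); a digit is the digit followed by 0.
    """
    if len(keys) % 2 != 0:
        raise ValueError("entry must be made up of keypress pairs")
    decoded = []
    for i in range(0, len(keys), 2):
        key, position = keys[i], keys[i + 1]
        if not key.isdigit() or not position.isdigit():
            raise ValueError(f"unexpected keypress pair {key}{position}")
        if position == "0":
            decoded.append(key)  # plain digit
            continue
        letters = KEYPAD.get(key, "")
        index = int(position) - 1
        if index >= len(letters):
            raise ValueError(f"no letter at position {position} on key {key}")
        decoded.append(letters[index])
    return "".join(decoded)
```

Under this convention the keypresses 3171102030 decode to DP123. Each character costs two keypresses, which is the kind of trade-off that suits experienced callers better than the general public.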


Developing IVR applications is much more about understanding your customers, your business and the way callers use telephones than it is about programming computers. Your software developers are a critical component of the development process, however, you should ensure that the customer support staff of your organisation have a major role too! Customer Service staff know the kinds of things callers want to do and want to know.

It can also be beneficial to involve independent specialists experienced in IVR design and human factors research during the various development stages of the project. This is often more critical with IVR projects, because most organisations have had minimal experience with IVR applications, even though they may have had quite extensive experience in developing screen-based applications.

The investment of having your system assessed by people experienced in auditory user interface design issues can save your organisation a lot of time, uncertainty, expense and lost productivity due to system re-design and re-scripting.

A well designed system will also lead to faster learning, less user frustration and strong uptake of your service, rather than callers insisting on expensive human-based interactions.


AS/NZS 4263 is a “voluntary” standard – which means there is no authority like Austel demanding that every IVR system comply with it. But it makes good sense to follow the standard as far as possible, unless you have a good reason to deviate from particular recommendations.

IVR systems will have a greater chance of success and acceptability if all systems share some commonality, since experience has shown that consistent and predictable human interfaces benefit users through faster learning, greater productivity, fewer errors and greater satisfaction, and therefore lead to faster customer acceptance.

It is possible for your system to comply with the latest revision of the Australian and New Zealand Standard (AS/NZS 4263) and still retain a unique identity which clearly differentiates it from other products on the market.

The first interim revision of the Australian Standard was released in December 1994 and the expanded update early in 1997. The standard has two types of clauses: “shall” clauses (things that must be adhered to if you wish to comply with the standard) and “should” clauses (things you should do where possible).

This standard has very few “shall” clauses and many “should” clauses. As a designer, this is to your advantage. It means that when in doubt about how to do something the standard provides strong recommendations. However, when you have sound reasons for implementing your product in different ways, in most cases you have the scope to do so. Therefore, you should adhere to all the “shall” requirements, adhere to as many of the “should” recommendations as possible, and be consistent within your application where you need to develop your own approaches to user interface facilities.

In the case of the standard, the Committee gave high priority to compatibility with the many existing conventions already in use in the Industry. Next, the Committee dealt with those critical aspects of user interface that were not consistent across existing applications, including exiting the system, returning to the previous menu and providing yes/no responses to the system.

Other recommendations, such as the recommended maximum number of items in a menu, were based on cognitive psychologists’ understanding of human information processing and memory.

Clause 2.6 of the Standard (use of * key) deserves some explanation. One of the most difficult challenges for the Committee was dealing with use of the * key. On the one hand there were a vast number of existing systems which use the * as a clear-field/cancel/back-up command (similar to the key on a computer keyboard). On the other hand, there was the draft ISO Standard for voice messaging which proposes that the * key be used as a “shift” key, that is, a prefix key to a menu of special functions that should be available from virtually any part of the system. The Committee did not want to develop an Australian and New Zealand standard which was not compatible with the ISO Standard, nor did it wish to impose an added level of complexity to the large number of existing and future applications which did not really require this new ISO proposal for use of the * key. Therefore, the Committee opted to allow both uses of the * key pending further industry development.


Good scripts are brief and clear.

Entering your script and structure into a flow charting program often shows up uneven distribution of system functions across branches of the menu structure. A balanced menu structure is easier for you and callers to understand and memorise.
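If a flow charting program is not to hand, a similar check can be approximated in a few lines of Python. The sketch below assumes the menu is written as a nested dictionary in which each leaf is one system function; the sample menu and function names are invented for the example.

```python
def count_functions(node):
    """Count the system functions (leaves) beneath a menu node."""
    if isinstance(node, dict):
        return sum(count_functions(child) for child in node.values())
    return 1


def branch_sizes(menu):
    """Map each top-level branch to the number of functions under it."""
    return {name: count_functions(child) for name, child in menu.items()}


# Invented sample menu: "accounts" carries most of the functions.
sample_menu = {
    "accounts": {
        "balance": "say_balance",
        "transactions": "list_transactions",
        "statements": "order_statement",
    },
    "cards": {"lost card": "report_lost_card"},
    "help": "play_help",
}

sizes = branch_sizes(sample_menu)
imbalance = max(sizes.values()) - min(sizes.values())
```

A large difference between the biggest and smallest branch is only a prompt to look again at the structure; some imbalance is reasonable when one branch is used far more often than the others.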

There should be a person (preferably not one of the programmers or systems analysts) who is primarily responsible for writing the script for the system. Once this person has written wordings for all the messages in the system, it is often useful for them to work with one or two others to fine-tune difficult or uncomfortable wording.

The only way to get the wordings in scripts “just right” is to read and re-read the wording of the script as you write it and get people to speak phrases to you so you can hear what they sound like.

Ask colleagues in the IVR industry, as well as people with no IVR experience, to give you feedback on the script. It is not unusual to spend half an hour or more to get the wording for a tricky menu choice just right; but it is well worth it in the long-run.

There are often a few small changes in wording required during the recording stage, but these should be minimal. If something in the script just does not sound quite right to you, then the chances are that it won’t sound right to your callers either.

Remember that they will have to listen to the grating wording again and again each time they call the system.

The caller’s first interaction with the system is critical. Welcome the caller and put him or her at ease through clear uncomplicated instructions. The wording should not be too familiar at this point, and should convey a clear message to callers that they are interacting with a computer and not a human.

Try to pre-empt every conceivable thing a caller could do when first connecting to your system and be as supportive as possible. Remember that any call to your system could be from a caller using an IVR system for the very first time.

A few things to keep in mind when scripting include:

  • Keep messages brief and to the point, but avoid terseness

  • Terminology must be used in a consistent manner throughout the whole system. Always use the preferred terminology included in the Standard. E.g. Press for single key input, Enter for field input and so on.

  • Tell users early how to navigate around the system and how to get help, and ensure that navigation keys and help are available at all points in the system.

  • When a caller enters incorrect or unexpected responses, politely tell them this and provide more information about what is required by the system.

  • The system should always tell the caller if they are being transferred to an operator, and in most cases they should be given the choice of whether they want to talk to a human or not. Experience has shown that half the callers to an IVR system at one site hung up when a human answered during system down time. Clearly, once accustomed to the system, these callers wanted to interact with a machine rather than a human for these financial transactions.


If your system is to stand out from the pack, then it needs to have an identity of its own. Your callers need to enjoy interacting with it and should feel comfortable moving around within it.

The “voice” your system uses is the most influential aspect of the system, so you should take the time to get it just right. The “voice” can make or break a system as well as directly affecting the mood of your callers, and possibly their attitude toward your products and company.

To begin with, you need to decide whether you want your system to have a male or female “voice”. Who will your callers be? Which would they relate to best? Either way, select people with strong, clear voices that are not high-pitched. Experience on radio has demonstrated that people prefer voices which are not too deep or too high. The telephone system is optimised for average pitched voices, cutting off highs and lows from the signal.

In contrast to radio (which is still very male-dominated) many IVR systems now use contralto female announcers, possibly because female voices are often perceived to be more helpful and less authoritarian.

If your system includes a combination of synthetic speech and recorded speech, then a male voice will probably blend in better, since most of the better synthetic speech currently available is male, because the vocal characteristics of the male voice are less complex and therefore easier to synthesise. (Note this has changed since the time of writing)

How warm, friendly or relaxed sounding you want your system to be is the next decision. There is nothing worse than calling an IVR system where the “voice” is dull or bored sounding! The announcer needs to have some life and tonality in her or his voice. Tone and expression are the cues to your callers of what is expected of them, where they are in the system and how they are going.

If you want your users to move through the system promptly, then you should select an announcer who speaks confidently rather than an announcer who is too friendly and relaxed.

Give thought to an appropriate balance of friendliness and efficiency in the system’s “voice”. You want your system to be welcoming and supportive, but you also need to ensure that the caller immediately knows they are interacting with a computer and not a human. You don’t want your callers to have unrealistic expectations of the system, but you also don’t want to alienate them with a cold unfriendly voice.

The human voice is a very powerful tool. Your system’s “voice” should be able to encourage callers when they get lost in the system or when they are not responding quickly enough.

Tone and wording need to help a caller who has made an error remedy the situation without appearing to chastise or to punish the caller, and without discouraging them from using the system.


During initial system planning you should give as much thought as possible to potential future expansion. The types of menus, the way that you number choices and the terminology you use will all give you less or more potential to add in new modules at a later date.

If you need to totally redesign the system when making extensions to it, you will definitely alienate many existing callers even if you add a lot more functionality in the process. Your existing training and documentation will no longer be correct, and callers will make more errors as they try using the system in the old ways.

Upgrades to system functions often require you to make a trade-off between re-design with a more logical overall structure or adding in new facilities in less logical places, but which don’t break with the existing menus and facilities. The latter approach is often more appropriate if you already have a large user base.

As has been discussed, the “voice” you select becomes an intrinsic part of the system and its user interface. Your callers will become accustomed to the attributes of the “voice” and will feel comfortable with it. If you need to update the system or make changes, it is critical that the system voice stays as constant as possible.

You should prepare a contract with your announcer so that he or she will be available for future enhancements.

A good example of this is the consistency and continuity of the “voice” of the Octel voicemail product, ASPEN.


Once your product has been released, aspects of the user interface are almost impossible to change without alienating or confusing users. Compared to other software products, IVR systems are a relatively new service on the market. Unstable IVR products will influence a caller’s attitudes to all systems. Unreliability and immaturity can lead to people postponing or abandoning dealings with such services.

You also need to find out how people respond to the “voice” of the system, the terminology used and how they navigate the menu structures. Unlike many software beta testing programs, you need to get a diverse selection of callers to try out the system before making it publicly available.

Be sure not to select only technically capable testers, as these are likely to be more familiar with IVR services and will accordingly make fewer mistakes and do fewer “dumb” things.

Acceptance of your system by the broader base of callers will be determined largely by how gracefully your system handles the actions of the most naive user. As people working with and making a living from IVR services, the whole IVR industry needs to work to improve the overall quality and usefulness of such facilities. Presenting the public with a poorly tested system, or a system that is counter-intuitive to use, may do more harm than just loss of sales and customers; it may turn those callers away from using any IVR services at all.

All too often too much thought is given to what the system will do when presented with valid input, and too little thought given to what it should do if it does not get the tones and input it is expecting. Error handling is especially important when users are so diverse, and when the potential for misunderstanding is so great.

Too often, even well established IVR systems leave callers in limbo; not knowing what to do next, or in doubt about how to extract themselves from a mis-keyed password or account number.

Experienced users often know they have made an entry mistake, due to key-bounce or mis-keying, but are often left to guess whether they need to re-enter all their login details or just the last field.
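The following Python sketch shows one way a system might handle a mis-keyed field without leaving the caller in limbo: it re-prompts for that field only, says so explicitly, and offers a way out after a few attempts. The wording, the limit of three attempts and the use of 0 for an operator are assumptions for this example, and caller input is simulated with a simple iterator.

```python
def collect_field(name, is_valid, entries, max_attempts=3):
    """Collect one field, re-prompting politely for that field only.

    `entries` is an iterator of simulated caller entries.
    Returns (value_or_None, list_of_messages_spoken).
    """
    spoken = [f"Please enter your {name}."]
    for attempt in range(1, max_attempts + 1):
        entry = next(entries, None)  # None models silence or a timeout
        if entry is not None and is_valid(entry):
            return entry, spoken
        if attempt < max_attempts:
            # Tell the caller what went wrong and exactly what to redo
            spoken.append(
                f"Sorry, that {name} was not recognised. "
                f"Your earlier entries have been kept. "
                f"Please enter just your {name} again."
            )
    # Out of attempts: offer a clear exit rather than silence
    spoken.append("To speak with an operator, press 0.")
    return None, spoken


def six_digits(entry):
    """Hypothetical validation rule: exactly six digits."""
    return entry.isdigit() and len(entry) == 6
```

Calling collect_field("account number", six_digits, iter(["12345", "123456"])) returns the second entry after a single re-prompt, and the caller is never asked to repeat details that were already accepted.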


IVR developments are having a dramatic impact on business, sales and communications. By incorporating these principles and by planning and testing your system thoroughly, you can establish a strong competitive advantage.


Tim Noonan (+61 419 779 669) is the principal of Tim Noonan Consulting, which provides accessibility, voice, sound and telephony consulting services. Tim has been working in the IVR and auditory user interface field since the early 90’s. He was also a member of the Australian IVR Standard Committee. We can work directly for the IVR customer, or we can work as part of a consortium with the primary system supplier or system designer.

Our IVR-related services include: initial system design and scoping; innovative think tank participation; intuitive and standards-compliant menu layout and scripting; selecting and directing voice-over announcers; writing user documentation; system testing; advising on IVR Standards compliance; as well as a range of independent quality review & reporting services.

Contact us or call Tim on 0419 779 669 to discuss your IVR system or project.
