WOZ (Wizard of Oz) Interface


The WOZ interface provides the wizard, i.e., the human operator, with the tools required to start, step through and stop an EMOTE-like interaction from a remote network location.

The interface allows full control over the interaction. It does so by loading the task specification, populating the interface with the individual task steps, and providing controls for restarting and proceeding through those steps. The utterance set is also displayed on-screen, categorised by pedagogical strategy. The WOZ interface connects to the Map Application via Thalamus, allowing the task to be controlled in terms of step progression, tool opening and closing, and attempt feedback.

For the purposes of the EMOTE Wizard of Oz study, the Wizard's interface provides control over both the task and the robot. Here, we assume that the Wizard takes the place of the autonomous mind, giving the system the decision-making ability of a human so that we can learn more about how to facilitate the interaction between a child and the robotic tutor.

Role and control

As the very first step in the design of the interface, we needed to decide exactly what role the Wizard plays in the system and how much control he/she should have via the interface. Should the Wizard be a controller, a moderator or a supervisor (Dow et al., 2005)?


If the Wizard is a controller, then he/she is concerned with even the lowest-level control of the robot's behaviour. This is relatively simple to design, as we can categorise each of the behaviours and lay them out so that the Wizard chooses each and every behaviour needed during the interaction. However, the problem with this level of control is that the Wizard may become consumed with remembering and choosing the right behaviours and pay less attention to what is actually happening within the task and the interaction. In the Controller role, we would also expect a certain temporal lag between an action performed by the user in the task and the Wizard choosing the correct behaviour, speech act and gaze direction, in addition to any computational and mechanical lag introduced by the equipment.


At a mid-level, a Moderator is more concerned with ensuring that the robot's behaviour suits the actions performed by the user in the task. Here, the robot's behaviours are predefined but still not fully automated. The Wizard is presented with a sub-set of behaviours and only has to choose one from that sub-set, thereby moderating the behaviour selection. Using behavioural hypotheses for both the user and the robot during each stage of the task (Green et al., 2004), we can highlight the behaviours that suit a particular stage of the task or the actions selected by the user. The Wizard can then select behaviours from the highlighted sub-set, while still being able to select something else if they wish. This method should reduce some of the temporal lag experienced in the Controller role.
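The Moderator's highlighting step can be pictured as a filter over a behaviour catalogue. The sketch below is illustrative only: the behaviour names, the stage labels and the `partition_for_stage`/`moderate` helpers are hypothetical, and a simple stage-to-behaviour mapping stands in for the behavioural hypotheses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    name: str
    stages: frozenset  # task stages in which this behaviour is hypothesised to fit


# Hypothetical catalogue; in the study the stage mapping would come from the
# behavioural hypotheses for the user and the robot.
CATALOGUE = [
    Behaviour("greet_by_name", frozenset({"intro"})),
    Behaviour("explain_compass", frozenset({"tool_help"})),
    Behaviour("praise_correct", frozenset({"feedback"})),
    Behaviour("hint_retry", frozenset({"feedback", "tool_help"})),
    Behaviour("say_goodbye", frozenset({"outro"})),
]


def partition_for_stage(stage, catalogue=CATALOGUE):
    """Split behaviour names into (highlighted, others) for the current task stage."""
    highlighted = [b.name for b in catalogue if stage in b.stages]
    others = [b.name for b in catalogue if stage not in b.stages]
    return highlighted, others


def moderate(choice, stage, catalogue=CATALOGUE):
    """Accept any catalogued behaviour and report whether it was in the highlighted sub-set."""
    if choice not in {b.name for b in catalogue}:
        raise ValueError(f"unknown behaviour: {choice}")
    highlighted, _ = partition_for_stage(stage, catalogue)
    return choice, choice in highlighted
```

Returning whether the choice was highlighted makes it easy to log how often the Wizard overrides the suggested sub-set.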


At an even higher level, the Wizard's role is to supervise the interaction. Here, the Wizard is abstracted away from common low-level behaviour selection completely but remains in control at a higher level. The Wizard still has some low-level controls for giving the user feedback during task stages and for commenting on a few basic, unrelated task elements, but common behaviours are selected automatically from a set based on higher-level factors, such as the amount of help and scaffolding the Wizard feels is appropriate (i.e., the pedagogical strategy). The Wizard can therefore concentrate on the learning content, and the system is closest to the final EMOTE vision.
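In the Supervisor role, the Wizard sets only the high-level strategy and routine feedback is chosen by the system. This is a minimal sketch under assumed names: the strategy labels, the utterance pools and the `Supervisor` class are hypothetical, and round-robin rotation is used purely as a simple, predictable selection rule; the real selection logic may differ.

```python
from collections import defaultdict

# Hypothetical utterance pools keyed by (strategy, outcome); labels are illustrative.
POOLS = {
    ("low_scaffolding", "correct"): ["Well done.", "That's right."],
    ("low_scaffolding", "wrong"): ["Not quite, try again."],
    ("high_scaffolding", "correct"): ["Great! You used the key to find the symbol."],
    ("high_scaffolding", "wrong"): ["Let's check the compass together.", "Look at the map key first."],
}


class Supervisor:
    """The Wizard chooses the strategy; routine feedback is picked automatically."""

    def __init__(self, pools, strategy):
        self.pools = pools
        self.strategy = strategy
        self._next = defaultdict(int)  # round-robin position for each pool

    def set_strategy(self, strategy):
        # The high-level control that remains with the Wizard.
        self.strategy = strategy

    def on_attempt(self, correct):
        # Pick the next utterance from the pool for the current strategy and outcome.
        key = (self.strategy, "correct" if correct else "wrong")
        pool = self.pools[key]
        i = self._next[key]
        self._next[key] = (i + 1) % len(pool)
        return pool[i]
```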

In the Supervisor role, the Wizard has more time to facilitate the interaction and to concentrate on what the user is doing and how best to support them within the task, rather than on what should be said or gestured when answers are right or wrong (as in the Controller role). The interface can therefore focus on the pedagogical strategies that should be employed during the different stages of the task.



Once we had established that the Wizard would need a mixture of the controls from both the Moderator and Supervisor roles, we could look into ways of designing those controls so that they are intuitive and easy to use. This was an iterative process using different design methodologies, with the Wizard involved at each stage of the design.

Interface controls for the different aspects of the system (i.e., camera control, system control, competency grading, task script, utterances and feedback selection) were separated according to known interface design principles (Benyon, Turner and Turner, 2005).

Multi-View Camera Display

The interface can display live feeds from the three cameras. The frontal and lateral camera views of the participant sit side by side in 320×240 containers in the top left-hand corner of the interface. Below them, a larger 640×480 container displays the top-down view of the interactive screen, clearly showing the participant's hands interacting with the task. This larger area allows the wizard to see the task in considerable detail.
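The camera layout can be checked with a little arithmetic: the two 320×240 views side by side are exactly as wide as the 640×480 top-down view, so the three containers stack into a 640×720 block. The sketch below places them in pixel coordinates; the left/right order of the frontal and lateral views and the helper names are assumptions.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def camera_layout(origin_x=0, origin_y=0):
    """Two 320x240 views side by side, with the 640x480 top-down view below them.

    Assumption: the frontal view is on the left and the lateral view on the right.
    """
    frontal = Rect(origin_x, origin_y, 320, 240)
    lateral = Rect(origin_x + 320, origin_y, 320, 240)
    top_down = Rect(origin_x, origin_y + 240, 640, 480)
    return {"frontal": frontal, "lateral": lateral, "top_down": top_down}


def overlaps(a, b):
    """True if two rectangles share any interior area."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def bounding_box(rects):
    """Width and height of the smallest box enclosing all rectangles."""
    rects = list(rects)
    left = min(r.x for r in rects)
    top = min(r.y for r in rects)
    right = max(r.x + r.w for r in rects)
    bottom = max(r.y + r.h for r in rects)
    return right - left, bottom - top
```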

System Control Panel

Positioned in the lower left-hand corner is a small control panel that allows the wizard to start and stop the system (in terms of video recording, data logging and task progression). Here, the wizard can enter the participant's details (i.e., name and ID), which are then fed into the system so that data logs can be saved under the specific participant ID and the robot can address the participant by name.
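A minimal sketch of how the entered details might be used. It assumes a hypothetical per-participant log directory layout and a hypothetical `{name}` placeholder in utterance text; neither is taken from the actual system.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Participant:
    name: str
    pid: str


def log_path(log_dir, participant, stream):
    """Hypothetical naming scheme: one file per data stream, grouped by participant ID."""
    return Path(log_dir) / participant.pid / f"{participant.pid}_{stream}.log"


def personalise(template, participant):
    """Substitute a hypothetical {name} placeholder so the robot can address the participant.

    str.replace is used rather than str.format so other braces in the text are left alone.
    """
    return template.replace("{name}", participant.name)
```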

Competency Grading Control

The interactive map task scenario performs some basic competency grading during the interaction. The architecture allows the interface to read in this information and display it on-screen, along with additional controls that allow the wizard to specify their own grading. Later analysis of these grades can be used to train the scenario's grading system.
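Pairing each system grade with any grade the wizard specifies gives the data needed for that later analysis. The sketch below is hypothetical throughout: the `GradeRecord` fields, the integer grade scale and the helper functions are illustrative, not the scenario's actual grading format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GradeRecord:
    step: int
    competency: str
    system_grade: int                   # grade read in from the map task scenario
    wizard_grade: Optional[int] = None  # set only if the wizard specifies their own


def agreement(records):
    """Fraction of wizard-graded records on which the wizard agreed with the system."""
    graded = [r for r in records if r.wizard_grade is not None]
    if not graded:
        return None
    return sum(r.system_grade == r.wizard_grade for r in graded) / len(graded)


def disagreements(records):
    """Records the wizard graded differently: candidate examples for retraining."""
    return [r for r in records
            if r.wizard_grade is not None and r.wizard_grade != r.system_grade]
```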

Task Script Control

The predefined task script is loaded directly into the interface upon initialisation. Displaying the different steps of the task script, along with a clear indicator of the participant's current position within the script, allows the wizard to keep abreast of the task as it unfolds. Additional controls allow the wizard to repeat the question, retry the step and progress to the next step.
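The script controls can be modelled as a small position tracker. This sketch assumes each step is a dictionary with a `question` entry, and the class and method names are hypothetical; it also includes a restart control and a text rendering with a marker for the participant's current position.

```python
class TaskScript:
    """Tracks the participant's position in a predefined task script.

    Assumption: each step is a dict with a "question" entry; the real format may differ.
    """

    def __init__(self, steps):
        self.steps = list(steps)
        if not self.steps:
            raise ValueError("task script is empty")
        self.index = 0
        self.retries = 0  # times the current step has been retried

    @property
    def current(self):
        return self.steps[self.index]

    def repeat_question(self):
        # Ask the current question again; the step itself is unchanged.
        return self.current["question"]

    def retry_step(self):
        # Give the participant another attempt at the current step.
        self.retries += 1
        return self.current["question"]

    def next_step(self):
        # Move on; stay on the final step once the script is finished.
        if self.index < len(self.steps) - 1:
            self.index += 1
            self.retries = 0
        return self.current["question"]

    def restart(self):
        self.index = 0
        self.retries = 0
        return self.current["question"]

    def render(self):
        # One line per step, with ">" marking the participant's current position.
        return [("> " if i == self.index else "  ") + s["question"]
                for i, s in enumerate(self.steps)]
```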

Utterance and Feedback Selection Control

Predefined pedagogical strategies and feedback utterances are loaded directly into the interface upon initialisation. Here, the wizard has full control over which utterance is selected and sent to the behaviour planner for processing. In their raw format, utterances contain the actual speech act along with animation and gaze/glance tags. For fast viewing and selection by the wizard, these tags are removed from the utterance before it is displayed on-screen.
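A minimal sketch of the tag-stripping and categorisation steps, assuming tags are written inline in angle brackets (e.g. `<gaze(map)>`); the real utterance markup may use a different syntax, and the function names are hypothetical. The raw text is kept alongside the display text because the raw form, tags included, is what the behaviour planner needs.

```python
import re
from collections import defaultdict

# Assumption: animation and gaze/glance tags appear inline as <...> markup.
TAG = re.compile(r"<[^<>]*>")


def display_text(raw):
    """Remove inline tags and collapse the leftover whitespace for on-screen display."""
    return " ".join(TAG.sub("", raw).split())


def group_by_strategy(utterances):
    """utterances: iterable of (strategy, raw_text) -> {strategy: [(display, raw), ...]}."""
    groups = defaultdict(list)
    for strategy, raw in utterances:
        # Keep the raw form: that is what is sent on to the behaviour planner.
        groups[strategy].append((display_text(raw), raw))
    return dict(groups)
```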

Tools Control

The participant can use tools to help them with the task (i.e., a compass, a measuring tool and a map key). This control allows the wizard to show and hide these tools from within the WOZ interface.
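Showing and hiding tools amounts to tracking each tool's visibility and sending a message to the Map Application whenever it changes. In this sketch the tool identifiers and the JSON message format are hypothetical, and `send` is a stand-in callable; it does not reflect the actual Thalamus API.

```python
import json

# Hypothetical identifiers for the compass, measuring tool and map key.
TOOLS = ("compass", "measuring_tool", "map_key")


class ToolsControl:
    def __init__(self, send):
        self.send = send  # stand-in for whatever delivers messages to the Map Application
        self.visible = {t: False for t in TOOLS}

    def set_visible(self, tool, visible):
        if tool not in self.visible:
            raise KeyError(tool)
        self.visible[tool] = visible
        # Hypothetical message format; the real system communicates via Thalamus.
        self.send(json.dumps({"action": "show" if visible else "hide", "tool": tool}))

    def toggle(self, tool):
        self.set_visible(tool, not self.visible[tool])
```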

Download and Setup Instructions

The source code and binaries for the latest version of the WOZ Interface can be found here:

At present, the WOZ interface is heavily reliant on the Thalamus framework. It requires the Map Activity, Thalamus and Skene, all of which must be running before the WOZ Interface is started. Once these are set up, run the binary using the shortcut provided.


Dow et al. (2005) Wizard of Oz support throughout an iterative design process. IEEE Pervasive Computing, vol. 4, no. 4, pp. 18–26, Oct.–Dec. 2005.

Benyon D, Turner P, Turner S. (2005) Designing interactive systems. Harlow, England: Addison-Wesley.