Skene is a semi-autonomous behavior planner that translates high-level intentions originating at the decision-making level into a schedule of atomic behavior actions (e.g. speech, gazing, gesture) to be performed by the lower levels. Its development is still ongoing. It was created with situated robots in mind, in particular robots that can also interact through multimedia/virtual interfaces (like a large touch-table). As such, it is the place where most of the other components meet in order to integrate behavior with the environment. Some of its features are:

  • Contains an explicit representation of the virtual and physical environment, managing the coordinates of relevant targets at which a robot can point or gaze;
  • Autonomously performs contingent gazing behavior, such as gaze-aversion and establishing gaze (the opposite of aversion), using an internal gaze-state machine (GSM);
  • Gaze-tracks a target marked as a Person using the GSM;
  • Automatically gaze-tracks screen clicks using the GSM (for multimedia applications running on touch-tables);
  • Maintains and manages utterance libraries, and allows other components to control them.
  • Strip Slashes Toggle: When checked, Skene ignores all the tags present in the utterance.
  • Relative Speed Toggle: When enabled, every utterance performed is prefixed with /spd=X/, where X is the speed value, which sets a global speaking speed for all utterances. This tag is known only to the Acapela TTS system and currently does not work with other TTS engines, such as the Windows TTS.
  • Utterance Timeout: Detects when the robot is no longer speaking due to a speech or network failure. This keeps the interaction from stalling if NAO does not send a “SpeakEnd” message.
  • Backchanneling: After performing an utterance marked as “question”, Skene waits for the users to say something, then performs a random utterance from the backchannel category.
  • Questions Wait: After performing an utterance marked as “question”, Skene waits for the users to say something. If the users do not say anything, Skene keeps the interaction going after the Wait Time selected in the textbox.
  • Tags and utterance test: Used for testing purposes; allows custom utterances with custom tags to be sent to NAO.
  • Utterance queue status: Shows the current state of the utterance queue. Useful for debugging.
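
The Relative Speed and Utterance Timeout options above can be illustrated with a small Python sketch. This is not Skene's actual code: the names `apply_relative_speed` and `SpeechWatchdog` are hypothetical, and timestamps are passed in explicitly so the logic can be followed without a real robot or TTS engine.

```python
from dataclasses import dataclass
from typing import Optional


def apply_relative_speed(text: str, speed: int) -> str:
    """Prefix an utterance with the /spd=X/ tag described above."""
    return f"/spd={speed}/{text}"


@dataclass
class SpeechWatchdog:
    """Hypothetical helper: tracks whether a started utterance overran its timeout."""
    timeout_s: float
    started_at: Optional[float] = None

    def speak_start(self, now: float) -> None:
        # Remember when the utterance was sent to the robot.
        self.started_at = now

    def speak_end(self) -> None:
        # Normal case: a "SpeakEnd" message arrived, so nothing is pending.
        self.started_at = None

    def timed_out(self, now: float) -> bool:
        # True only if an utterance is pending and no end message came in time.
        return (
            self.started_at is not None
            and now - self.started_at >= self.timeout_s
        )


watchdog = SpeechWatchdog(timeout_s=5.0)
watchdog.speak_start(now=0.0)
if watchdog.timed_out(now=6.0):
    # No "SpeakEnd" was received: move on instead of stalling the interaction.
    watchdog.speak_end()
```

A caller would poll `timed_out` from its main loop and treat a timeout the same way as a received “SpeakEnd” message.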


Skene utterances are the actual representations of the aforementioned intentions and were mostly inspired by the FML-BML pair used in virtual agents and the SAIBA model. They are composed of text, representing what the robot is to say, along with markup both for the TTS and for behavior execution. The behavior markup can be used to control Gazing, Glancing, Pointing, Waving, Animating, Sound, Head-Nodding and even Application instructions. The following is an example of a Skene Utterance:

<GAZE(/currentPlayerRole/)>I’m unsure if it’s a good idea to <HEADNODNEGATIVE(2)>
build industries near <WAVE(throughMap)> the populated areas. <ANIMATE(gestureDichotomicLeft)> <GLANCE(Eco)> What do you think? <GAZE(clicks)>

The behaviors contained in the markup are non-blocking, meaning that while the speech is executed, the TTS engine sends events whenever it reaches a marked-up position, so that Skene can concurrently launch the execution of that marked-up behavior. While this seems like a pliable solution, it actually allows the downstream Realization components to perform their own resource management. Thus if, for example, the robot needs to gaze somewhere and perform an animation at the same time, the animation engine is the one to either inhibit or blend the simultaneous forms of expression.
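
To make the idea of marked-up positions concrete, here is a minimal Python sketch that splits an utterance into its spoken text and a list of behaviors, each tied to the character offset at which it would fire. It assumes tags have the form `<NAME(argument)>` as in the example above. The function `split_utterance`, the `Behavior` type and the use of character offsets are illustrative assumptions, not Skene's implementation, which relies on events sent by the TTS engine.

```python
import re
from typing import List, NamedTuple, Tuple


class Behavior(NamedTuple):
    offset: int    # position in the spoken text where the behavior fires
    name: str      # e.g. "GAZE", "WAVE"
    argument: str  # raw text between the parentheses


# Matches tags of the form <NAME(argument)>, as in the example utterance.
TAG = re.compile(r"<([A-Z]+)\(([^)]*)\)>")


def split_utterance(utterance: str) -> Tuple[str, List[Behavior]]:
    """Separate the text to be spoken from the behavior markup."""
    text_parts: List[str] = []
    behaviors: List[Behavior] = []
    spoken_len = 0
    cursor = 0
    for match in TAG.finditer(utterance):
        chunk = utterance[cursor:match.start()]
        text_parts.append(chunk)
        spoken_len += len(chunk)
        # The behavior is anchored at the amount of text spoken so far.
        behaviors.append(Behavior(spoken_len, match.group(1), match.group(2)))
        cursor = match.end()
    text_parts.append(utterance[cursor:])
    return "".join(text_parts), behaviors


text, behaviors = split_utterance(
    "<GAZE(/currentPlayerRole/)>Hello <WAVE(throughMap)>there."
)
for behavior in behaviors:
    # Each behavior would be launched without waiting for the speech to end.
    print(behavior.offset, behavior.name, behavior.argument)
```

Because each behavior is only announced at its offset rather than awaited, conflicts between simultaneous behaviors are left to whichever component executes them.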


The Skene utterances we have used were developed mostly by well-informed psychologists who take part in the development cycle as interaction designers. In order to facilitate such collaboration, Skene Utterance Libraries are stored and loaded directly as Microsoft Excel Open XML spreadsheets. This makes it much easier for the interaction designers to collaborate among themselves and with the technical development team, since they can author such files using online collaborative tools such as Google Spreadsheets.
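
As a rough illustration of how a spreadsheet-backed library might be used once loaded, the Python sketch below groups rows by category and picks a random utterance from one of them, as the Backchanneling option does. The two-column layout (category, text) and the `UtteranceLibrary` class are assumptions made for this example; the actual column layout of Skene's library files is not described here.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class UtteranceLibrary:
    """Hypothetical in-memory library built from (category, text) rows."""

    def __init__(self, rows: Iterable[Tuple[str, str]]) -> None:
        self._by_category: Dict[str, List[str]] = defaultdict(list)
        for category, text in rows:
            # Normalize category names so "Backchannel" and "backchannel" match.
            self._by_category[category.strip().lower()].append(text)

    def categories(self) -> List[str]:
        return sorted(self._by_category)

    def pick(self, category: str, rng: Optional[random.Random] = None) -> str:
        """Return a random utterance from the given category."""
        options = self._by_category.get(category.strip().lower())
        if not options:
            raise KeyError(f"no utterances in category {category!r}")
        return (rng or random).choice(options)


# Rows as they might be read from one sheet of a spreadsheet.
library = UtteranceLibrary([
    ("backchannel", "I see."),
    ("backchannel", "Go on."),
    ("question", "<GLANCE(Eco)> What do you think?"),
])
print(library.pick("backchannel"))
```

Keeping the library as plain rows means designers can edit the spreadsheet without touching any code.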

How Skene was used in EMOTE

Skene managed the utterances performed by the NAO robot and all the game aspects related to those behaviors (e.g. position of players, location of clicks on the touch-table, etc.).

What, if any, other software/hardware is needed?

  • Microsoft Kinect (v1 or v2) for face-tracking, along with either EMOTE’s Perception Module or the Kinect v1 or v2 example clients that are included in Thalamus.
  • Microphone to detect when someone is speaking, along with EMOTE’s Speaker Detection modules. A stereo microphone (or two microphones on a stereo interface) is required for a two-person scenario.


Tutorial info

A readme is included in the download.

Download Skene:


  • Ribeiro, T., Di Tullio, E., Alves-Oliveira, P., Paiva, A. (2014). From Thalamus to Skene: High-level behaviour planning and managing for mixed-reality characters.