Perception Module

The perception module supports multiple sensors, including the Microsoft Kinect V1 and V2, the Q Sensor, the OKAO SDK for facial expressions, and multiple video recorders, in order to log and interpret the sensory information in a synchronised manner at a specific sampling rate set from the interface. It also captures HD (1080p) video directly from the Kinect V2 sensor, synchronised with the log files. The module can be used to capture human-robot interactions and general activities in close proximity to the sensors. Discover more on the Perception Module page.
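As a rough illustration of synchronised logging at a fixed sampling rate, the sketch below polls a set of sensor callbacks and stamps every row with a single shared timestamp. The function name, the callback-dictionary interface, and the in-memory row list are all hypothetical; the real module logs multiple hardware streams to files.

```python
import time

def log_synchronised(sensors, sample_hz, duration_s, rows):
    """Poll every sensor callback at a fixed rate, stamping each row
    with one shared timestamp so the streams stay aligned.
    `sensors` maps a name to a zero-argument read function (hypothetical API)."""
    period = 1.0 / sample_hz
    t_end = time.monotonic() + duration_s
    while time.monotonic() < t_end:
        stamp = time.monotonic()
        row = {name: read() for name, read in sensors.items()}
        row["t"] = stamp  # one timestamp shared by all sensor readings
        rows.append(row)
        # sleep only for the remainder of the sampling period
        time.sleep(max(0.0, period - (time.monotonic() - stamp)))
```

Stamping all readings with one clock value per cycle is what keeps the log files interpretable as a single synchronised record.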

OKAO Module

The OKAO module works in real time, analysing images from a web camera and passing the information to the Perception module. Read more on the OKAO Module page.

OKAO Extractor

The OKAO extractor analyses images from video files and stores the information in log files for later offline analysis. The extracted information relates to the position of the user(s) in relation to the camera, their head rotation, eye gaze, facial expressions, and smile estimation. The module also calculates the user's gaze based on head and eye direction and outputs the information to the Perception module. It was developed in C++ and has been used with multiple video files to extract facial-feature information for later analysis. Learn more about the OKAO Extractor here.
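Combining head and eye direction into a single gaze estimate can be sketched with a simple additive model, in which the eye-in-head angles are treated as offsets from the head orientation. This is an assumption for illustration; the actual OKAO-based computation may differ.

```python
def combined_gaze(head_yaw, head_pitch, eye_yaw, eye_pitch):
    """Estimate where the user is looking by adding the eye-in-head
    angles to the head orientation (all angles in degrees).
    A minimal additive sketch, not the module's actual formula."""
    return head_yaw + eye_yaw, head_pitch + eye_pitch
```

For example, a head turned 10 degrees right with the eyes turned 5 degrees left yields a net gaze of 5 degrees right.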

Interaction Management Module

We describe the workings of a stochastic Interaction Management (IM) module, showcasing a use case in which the IM has been implemented as part of a robotic tutor that can sense the user's affect and respond in an empathic manner. The IM is designed to be reusable across interactive tasks through a scripting language. We use an Engine-Script design approach, so that the IM can be used both as part of the conversational agent and in user simulations. Full details of the Interaction Management Module can be found here.
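The Engine-Script idea can be sketched as a generic engine that interprets a task-specific script: for each (state, observation) pair the script lists weighted candidate actions, and the engine samples one stochastically. The script format and function names here are hypothetical, for illustration only.

```python
import random

def run_im(script, state, observation, rng=random.Random(0)):
    """Generic engine over a task-specific script (Engine-Script sketch).
    `script` maps (state, observation) to a list of
    (action, next_state, probability) triples; one is sampled stochastically."""
    options = script[(state, observation)]
    r, acc = rng.random(), 0.0
    for action, next_state, p in options:
        acc += p
        if r < acc:
            return action, next_state
    # fall back to the last option if probabilities under-sum due to rounding
    return options[-1][0], options[-1][1]
```

Because the engine is task-agnostic, the same code can drive the conversational agent or a user simulation simply by swapping in a different script.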

Interaction Analysis Module

The primary function of the IA module is to update the learner model with an estimate of the user's valence and arousal during interaction with the EMOTE system. IA receives regular sensor updates from the Perception Module and sends regular affective updates to the Learner Model. The IA module needs to deal with both positive and negative affect. The first example shows a child displaying positive affect, while in the second another child displays negative affect. These can be considered extreme examples of the affect expressed by children during interaction with the EMOTE system. Read more on the Interaction Analysis Module page.
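One simple way to turn noisy per-frame sensor evidence into a stable valence/arousal estimate is an exponential moving average, sketched below. The smoothing approach and parameter are assumptions for illustration; the actual IA module's estimator is described on its page.

```python
def update_affect(estimate, features, alpha=0.2):
    """Blend one frame of (valence, arousal) evidence into the running
    estimate with an exponential moving average (illustrative sketch).
    `alpha` controls how quickly the estimate follows new evidence."""
    v, a = estimate
    fv, fa = features
    return ((1 - alpha) * v + alpha * fv,
            (1 - alpha) * a + alpha * fa)
```

Called on every sensor update from the Perception Module, this yields the kind of regularly refreshed affective estimate that is forwarded to the Learner Model.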


Thalamus

Thalamus is a high-level integration framework aimed especially at developing interactive characters that interact through both virtual and physical components. It was developed in C# to accommodate social robots within such a framework, while remaining generic and flexible enough to also include virtual components such as multimedia applications or video games running on a touch table. Read more and find the download link on the Thalamus page.


Skene

Skene is a semi-autonomous behaviour planner that translates high-level intentions originating at the decision-making level into a schedule of atomic behaviour actions (e.g. speech, gazing, gestures) to be performed by the lower levels. Its development is still ongoing; it was created with situated robots in mind that can also interact through multimedia/virtual interfaces (such as a large touch table). As such, it is where most of the other components meet in order to integrate behaviour with the environment. More description is available on the Skene page.
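Translating a high-level intention into a schedule of atomic actions can be sketched as parsing an annotated utterance into ordered speech and non-verbal actions. The tag syntax (`<gaze(user)>`, `<gesture(wave)>`) and function name here are hypothetical, chosen only to illustrate the idea.

```python
import re

def plan_behaviour(utterance):
    """Split an annotated utterance such as
    'Hello <gaze(user)> there <gesture(wave)>' into an ordered schedule
    of atomic actions: speech chunks plus non-verbal tags.
    The tag markup is a hypothetical example format."""
    schedule = []
    # the capturing group makes re.split keep the tags in the output
    for part in re.split(r"(<[^>]+>)", utterance):
        part = part.strip()
        if not part:
            continue
        if part.startswith("<"):
            name, arg = re.match(r"<(\w+)\((\w+)\)>", part).groups()
            schedule.append((name, arg))
        else:
            schedule.append(("speech", part))
    return schedule
```

The resulting ordered list is the kind of schedule that lower levels (speech synthesis, gaze control, gesture playback) can execute in sequence.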


NAOBridges

NAOBridges allows a NAO robot to be a Thalamus client and, more generally, to be part of a larger system. Because the NAO robot provides a Python environment for its scripts while the Thalamus framework was developed in C#, this module provides a bridge between the two. It uses an XML-RPC connection to send and receive messages between the NAO and the other communicating entities of the system. Read more and download from the NAOBridges page.
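The Python side of such a bridge can be sketched with the standard-library XML-RPC server: robot-side handlers are registered by name so that a remote peer (for instance, a C# Thalamus client) can invoke them over HTTP. The function name and handler interface are hypothetical; NAOBridges itself is the module to download.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def start_bridge(handlers, host="127.0.0.1", port=0):
    """Expose robot-side handlers over XML-RPC so a remote peer can
    call them (illustrative sketch of the bridge idea).
    Returns the server and the bound port (port=0 picks a free one)."""
    server = SimpleXMLRPCServer((host, port), allow_none=True,
                                logRequests=False)
    for name, fn in handlers.items():
        server.register_function(fn, name)
    # serve in the background so the robot's main loop keeps running
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

Any XML-RPC client, in any language, can then call the registered handlers by name, which is what makes this style of bridge language-agnostic.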

Speaker Detector

The Speaker Detector is a simple module that listens to a microphone input (mono or stereo) and detects when auditory activity occurs. It is mostly intended for situations in which two users interact side by side with a system, in order to detect which of them is speaking. Learn more and download from the Speaker Detector page.

Speaker Rapport

The Speaker Rapport module provides behaviour related to synchrony with the users of the system. It listens to messages that report user speaking activity and generates gazing behaviours towards the currently speaking user. It also uses the auditory loudness carried in those messages to instruct the speech-generation system to raise or lower its volume, so as to better match the loudness of the users while they speak. Download from the Speaker Rapport page.
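The volume-matching part can be sketched as a simple mapping from the reported user loudness to a speech-synthesis volume, clamped to a safe range. The function name, gain, and range bounds are hypothetical illustrations, not the module's actual parameters.

```python
def match_volume(user_loudness, gain=1.0, lo=0.2, hi=1.0):
    """Scale the speech-synthesis volume toward the user's reported
    loudness, clamped to [lo, hi] (illustrative sketch)."""
    return min(hi, max(lo, gain * user_loudness))
```

Clamping keeps the synthesised voice audible for quiet users and prevents it from becoming unpleasantly loud for shouting ones.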

Learner Model

The learner model Java component is the database containing the records, while the learner model Thalamus module is its interface to the EMOTE system as a whole. The learner model assesses the student's task actions and performance, records information about the learner state, and saves a history. It records messages sent from the S1 map applications, the S2 Enercities activity, and the Interaction Analysis module, and combines them with the rest of the learner state to be forwarded to other modules. Additional information and download links can be found on the Learner Model page.
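The record-and-combine behaviour can be sketched as follows: timestamped records from each source are appended to a history, and the current learner state merges the most recent record per source. The class and method names are hypothetical; the real component is a Java database behind a Thalamus interface.

```python
class LearnerModel:
    """Minimal sketch: keep a timestamped history of records per source
    and merge the latest record from each source into one learner state."""

    def __init__(self):
        self.history = []  # list of (timestamp, source, data)

    def record(self, source, timestamp, data):
        """Append one message from a source (e.g. an activity or the
        Interaction Analysis module) to the history."""
        self.history.append((timestamp, source, data))

    def current_state(self):
        """Combine records into one state: later records from the same
        source overwrite earlier ones."""
        state = {}
        for _, source, data in sorted(self.history, key=lambda r: r[0]):
            state[source] = data
        return state
```

Keeping the full history while exposing only the combined current state mirrors the description above: a saved history plus a learner state that is forwarded to other modules.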