AIMS

This workshop aims to investigate how existing forms of social communication can be supported by, and new forms developed through, TV-centred communication, with a view to enhancing the feeling of social belonging and togetherness between groups of people separated by space and time. More specifically, it aims to explore how family and friends in separate locations can share moments of fun while playing social games, seeing and hearing each other as they laugh with and at each other, sharing instant impressions or recounting past memories, via virtually directed TV-centred communication.

The workshop will place particular emphasis on interaction and communication that is stimulated by or framed within social game-play. This should not be understood as merely attempting to transfer computer games onto the TV platform by substituting the TV screen for the PC screen, but rather as considering the complete experience of playing a game together: the game is just the pretext for socializing; it is the reactions, the jokes, the laughter, the arguments, the parallel conversations, the recounted memories, and so on, that are in focus.

Social game-play could range from being very structured and rich in audio-visual content, such as electronic games, supported by a game engine, through structured but simpler in terms of interfaces, such as board games, to less structured and requiring no interface, such as guessing games.

The term TV-centred communication suggests that the audio-visual communication between groups of people goes beyond the standard face-to-face model of the current video-conferencing systems, aspiring to reach the aesthetic quality of good TV narratives, through employing cinematic techniques in both the capturing and editing of the content. It subsumes both live (real-time) and catching-up (off-line) communication. For live communications the content processing delays should not influence the fluency of the interaction.

In terms of the live communication, the aim is to create a seamless connection between the participating groups/locations. Each location, normally accommodating a group of people, must get the best account of what happens in (all) the other locations. This can be achieved only if there is significant automation of both content capture and editing. This function can be denoted as automatic or virtual directing, or interaction orchestration. Cues such as laughter, movement and who is talking ought to be extracted automatically, possibly helped by sensory information and direct instructions from the participants. On the basis of some embedded interaction or communication intelligence – denoted here as orchestration intelligence – these cues determine the cinematic techniques that are to be applied for both content capture (cameras and microphones) and content delivery (screens and speakers).
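As a rough illustration of how extracted cues might drive a directing decision, the following minimal sketch maps cues to a shot choice for one viewing location. It is not a proposed design: the cue names, the priority ordering, the shot labels, the confidence threshold and the function `choose_shot` are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical cue kinds with an assumed priority (higher wins).
CUE_PRIORITY = {"laughter": 3, "talking": 2, "movement": 1}

# Hypothetical mapping from cue kind to a cinematic choice.
CUE_TO_SHOT = {
    "laughter": "wide group shot",
    "talking": "close-up on speaker",
    "movement": "medium tracking shot",
}


@dataclass(frozen=True)
class Cue:
    location: str      # household where the cue was detected
    kind: str          # e.g. "laughter", "talking", "movement"
    confidence: float  # detector confidence in [0, 1]


def choose_shot(
    cues: Iterable[Cue], viewer_location: str, min_confidence: float = 0.5
) -> Optional[Tuple[str, str]]:
    """Return (source location, shot) to show at viewer_location, or None.

    Only cues from *other* locations are considered, since each household
    should see what happens elsewhere. Unknown or low-confidence cues are
    ignored.
    """
    candidates = [
        c
        for c in cues
        if c.location != viewer_location
        and c.kind in CUE_PRIORITY
        and c.confidence >= min_confidence
    ]
    if not candidates:
        return None
    # Highest-priority cue wins; confidence breaks ties.
    best = max(candidates, key=lambda c: (CUE_PRIORITY[c.kind], c.confidence))
    return best.location, CUE_TO_SHOT[best.kind]


cues = [
    Cue("home_a", "talking", 0.9),
    Cue("home_b", "laughter", 0.7),
    Cue("home_c", "movement", 0.95),
]
decision_for_a = choose_shot(cues, "home_a")
decision_for_b = choose_shot(cues, "home_b")
```

A fixed priority table is the simplest possible stand-in for the orchestration intelligence; a real system would presumably also take into account shot history, game state and participants' direct instructions.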

Real-time orchestration can have two functions: to simply, but accurately and effectively, support the recounting of events happening in other locations; and to moderate the interaction.

The off-line communication is about semi-automatically constructing TV narratives which best capture key experiences. The content may be captured semi-automatically, during a live interaction, as described above, or manually, during events that happen outside such interactions (for example, as video recordings or still pictures). Once the content has been captured, TV narratives, possibly interactive, will be assembled semi-automatically. Such narratives could then be incorporated into live communications.

Orchestration, in this context too, has the meaning of a virtual director and editor: it is about deciding which content to record and select, and subsequently about how to edit it into meaningful (interactive) TV narratives. This route is founded on the ShapeShifting Media Technology [2].

Scope. The setting is that of a limited number of households, each containing a group of people who know each other and want to stay connected. The focus is on interaction via moving pictures and sound that are automatically captured and automatically edited. As output devices, the central role of the TV screen is preserved, possibly accompanied by secondary screens, but other devices, such as surround sound systems and ambient devices, may also be included. For input, each location will have a number of cameras, arrays of microphones and possibly other types of input, such as from game consoles and sensors. All the devices should be such that they can be integrated into a household environment.

Investigation Strands
The proposed investigation considers three perspectives.

  • Socio-Cognitive and Perceptual: defining facets of the experience of social belonging and togetherness. Which are the facets of the experience of being together between people sharing a physical space at the same time, and possibly being engaged in social games? Which aspects of the social communication foster that feeling? Which could be transferred and supported through TV-mediated communication? Which TV formats and cinematic techniques (visual: types of shots, camera movement, edits, effects, etc.; and aural: spatial placement of sound, voice synthesis, sound effects, etc.) could enhance the experience of being together in a video-conference-like experience, but between groups of people in more than two locations? Are there new ways of communication, not possible without the technological support, which could foster the feeling of togetherness? Can new communication techniques be proposed for near-real-time communication, when the delays due to audio-visual content processing are too large to go unnoticed?

  • System Design: specifying requirements for TV-centric systems that support social interaction. What related systems or prototypes already exist and how are they received by the end users? Which communication platforms do they employ? Which of their features could be adopted for the current aim? Are there new requirements/features refined through simulation and user evaluation? What is the economic feasibility of such propositions? What kinds of input and output devices are required to implement this kind of social communication, with particular emphasis on TV-based devices, and what should their spatial layout be (in the household)?

  • Enabling Technologies: analysing the capabilities of existing core technologies that could be employed in the development of TV-centric systems for social interaction and, at the same time, refining new requirements for them. Are there existing representation schemes which could (partly) express the envisaged communication intelligence for interaction orchestration? What are the capabilities of their associated reasoning techniques? Are there examples of AI mechanisms that could understand social situations related to the ones described here? Are there models which can predict aspects of social interaction? Are there any relevant personalisation techniques? Which features can be extracted automatically from audio-visual streams, possibly helped by sensory input and/or game states, and with what efficiency (processing time) and accuracy (precision)? What description notations (ontologies) are available? What multimedia notations for adaptive content exist? What are the capabilities of the low-delay encoding, transmission and decoding algorithms for both audio and video? What is the performance of the multimedia composition and rendering algorithms? How could game engines be integrated into TV-based communication? How could different communication platforms be integrated?

The questions listed above are indicative: they guide but do not restrict the proceedings of the workshop.


Hosted by Eurescom