In this essay, I present a history of the graphic adventure genre of video and computer games and its attempts at achieving a kind of cinematic realism through the registration of the body. In reviewing the history of this genre, I contextualize some early attempts to use motion capture and rotoscoping to incorporate human bodies into games, arguing that representation in games and other forms of digital media should be conceived not as deferring to the visual, but as reliant on the kinesthetic. While the visual presence of a human body may no longer be a coherent source of any link between a representation and physical reality, motion brings together digital images with the reality inscribed into media. This involves numerous questions about realism, indexicality, and affect, which I aim to intertwine and unfold below. In making this argument, I demonstrate three things. First, digital images are condensations of specific—if multiple—bodies that persist as representations that have some link with the physical world.1 Second, realism in games has long relied on the inscription of embodied motion. And third, games require an expanded definition of the index that stresses motion rather than a privileging of the trace—“trace,” here, referring to the quality identified by André Bazin that sees in photography and film the capacity to physically “embalm time” and objectively capture reality through a photographic image, a belief that has often been used to differentiate digital representations from analog images.2 Instead of saying that digital images lack the physical trace of reality in their representations, I claim there is a specific form of gestural inscription made with digital animation and motion capture—a kinesthetic index—which cannot be framed, as games often are, as a playable simulation of reality.
The history of games—because of the technical limitations of personal computers and gaming consoles until the last decade or so—stresses more abstract forms of representation rather than realistic imagery. Yet, while technical limits of games in the past would seem to privilege abstraction, game history has nonetheless perpetuated a drive towards more realist forms of representation, perhaps because of the emphasis on qualities of “immersion” that emerge from emotional connections to game characters and a game world. Mark J. P. Wolf has framed this tendency through Wilhelm Worringer’s 1908 Abstraction and Empathy.3 Abstraction, for Worringer, is a reduction that leads to the transcendental, though one that is related to an inner turmoil and a distancing from the world at large. Empathy, on the other hand, involves the projection of similarity onto another, what we would now conceive of as an affective connection that bridges the divide between subject and object. To use the German terms on which Worringer relied, empathy is a translation of Einfühlung—literally, “in-feeling.”4 Worringer’s book, however, was a rejection of a now neglected strain of German aesthetic theory that emphasized this in-feeling as an absolute aesthetic value, instead arguing for the necessity of abstract forms that characterized the emergence of modernism. Rather than celebrating an affective linkage between self and other, Worringer was arguing for the necessity of forms that were alienating, in ways that resonate with similar contemporary critiques of empathy from phenomenological philosophy, and with other defenses of modernist aesthetics such as those of Theodor Adorno.5
Yet, the abstractions of modernism were often attempts to inscribe and represent motion, not efforts to reduce empirical experience to phenomenal essences, or to produce alienating works of art that provoke self-reflexive knowledge.6 Wolf claims that abstraction can provide an important, if neglected, direction for game design, given how technical processes of abstraction are intrinsic to digital media. Here, I want to suggest that any assumed realism in games likewise relies on practices in which motion is abstracted out of human bodies. These methods permit forms of representational realism, I claim, because kinesthetic indices serve as the grounds from which digital animation in games represents “real bodies” via abstracted movements rather than visual verisimilitude. Put otherwise, rather than supporting a clear set of binaries between abstraction and (realist) empathy, between simulation and representation, between material index and virtual image, the abstract indexing of motion calls into question the ability to make these differentiations when discussing games and digital media. At the same time, this fact of digital images provokes political questions when bodies are reduced to abstractions of motion that can be taken out of specific bodies and placed into others.
Before I turn to discuss the history of graphic adventures, I want to elaborate these claims by looking at a triptych of synchronized faces, all of which speak with the same voice and move with the same facial gestures (figure 1). All three are speaking Japanese. On the left is a three-dimensional, digital model. She appears to be Caucasian, with pale skin and brown hair. On the right we see another incarnation of the same model, this time from a 2/3 view. One can make out blemishes on her skin, the folds in her ears, the barrettes holding her hair back into a loose ponytail. In the middle we see the face of a different woman, who has been identified as Japanese. She is not a digital model and is shot with low-grade video. It’s difficult to observe specific details about her appearance, as her face is covered with black dots and lines. These markings—along with the harness visible at the top of the frame—are signs we’re observing a specific version of motion capture called “performance capture,” which records not the more obvious movements of the body, but fine details that differentiate discrete facial expressions thought to convey emotional states.
Performance capture is employed to add a level of realism in digital animation and video games, using human faces to reproduce subtle gestures of emotion rarely represented in the history of these media. The woman in the middle of this triptych is the source of the voice and movements in all three images. She cycles through a series of emotions with different intensities: neutrality, happiness, and then anger. Instantaneously, the movements of her face are encoded and replicated in the models flanking her visage. And yet, many of the specific markers of this woman’s identity are removed in the transition from human face to digital model. In the process of performance capture, gestures associated with Japanese speech are no longer anchored to a body marked as Japanese, but instead copied in a digital model of a completely different, seemingly white body.
The video I have been describing has the banal title of “Japanese Example,” a tech demo from the motion capture and animation companies Cubic Motion and 3Lateral.7 These companies have worked on facial animation for many notable video games, including Grand Theft Auto V (Rockstar Games, 2013), Batman: Arkham Knight (Warner Bros., 2015), Call of Duty: Advanced Warfare (Activision, 2014), and Ryse: Son of Rome (Microsoft, 2013), pioneering a method of real-time performance capture that records the gestures of actors and matches them with pre-rendered, digitally scanned models, through a schematic based on psychologists Paul Ekman and Wallace Friesen’s 1978 “Facial Action Coding System.”8
Cubic Motion and 3Lateral present this demo as about problems of inclusion, suggesting the necessity of broadening realist representational techniques to accommodate other languages and markets beyond the hegemony of Anglophone countries.9 Even though realism in the audiovisual arts has been less about media’s ability to represent reality than about spectacle, technical virtuosity, and the evocation of the senses,10 3Lateral and Cubic Motion nonetheless position these digital representations as a mimetic copy of human bodies, thus capable of evoking emotion in game players and spectators.11 Digital animation is able to evoke emotion, they suggest, only because of the accuracy of the embodied emotions inscribed into digital models, reproducing an understanding of affect as a mirrored transmission from one body to another, processed by the brain’s empathy circuit rather than evoked by, say, the narrative capacity of digital media—a similar claim advanced by those Worringer critiques on the aesthetics of empathy. Visual similarity permits us to feel-into another without conscious intention. And yet, in spite of these pretenses of inclusion through technical virtuosity, this example erases the visual presence of the actor, abstracting out her gestures and speech to place them into another body—one that appears to be white. While many character models created through performance capture do, in fact, appear as the actor on which they are based, in this example this is not the case. Instead, we get a white body that can speak Japanese with facial gestures that, assumedly, come with the native ability to speak in that language.
Representing human speech does not rely on words alone, but on the embodied, affective dimensions of language conveyed through tone and motion.12 When games are translated, these gestural affects are lost. With their demo, Cubic Motion and 3Lateral envision a future where games designed in one country (and language) can be localized in another through the hiring of performance capture actors that speak a language different from that of the game’s original design. Instead of dubbing voices over predetermined animations, in which voices and faces remain disjointed and detached, character models initially designed for one audience can be remade to speak and gesture in a new language, without any obvious disjuncture between animated facial expression and dialogue.
This technique appears to push digital animation in two contradictory directions. First, towards an increased cultural specificity. Games are remade for particular national or cultural contexts through the hiring of unique actors capable of speaking and performing gestures that would otherwise escape the translations required in moving a game from one context to another. But, second, towards a greater sense of universality. Singular character models can be remade in any context through a set of tools that assume all human beings share the same set of emotions and facial gestures at an anatomical and neurocognitive level—emotions can be digitally modelled through a set of discrete facial poses that can be incorporated and reproduced by the software techniques used in digital animation. This fragments the bodies of actors, placing particular gestural, kinesthetic traces linked to specific, physical bodies into generic, “universal” models—while employing a normative model of facial expression that assumes, at some level, all bodies express and interpret facial expression equivalently. Consequently, universality and specificity are held together in an unlikely dialectic, in which the animated model serves to obscure the contingency of its production in the name of globalized entertainment.
This use of bodies has precedents in the history of the cinematic close-up. According to Mary Ann Doane, the face in close-up “is a paradox in that it is simultaneously the locus of particularity, uniqueness, individuality, and contingency and a generality and universality of expressions available to all—a book that can be easily read by anyone.”13 While a specific face is unique, and the movements of a face speak to its individual specificity, facial gestures are also a “universal” assumed in the history of cinema. Yet, attempts to use the close-up for the creation of a universal gestural language—through visual techniques that are supposedly egalitarian—erase class and ethnicity from representation. These techniques are reliant on an imagined spectator who, while supposedly neutral, is coded as middle-class, white, male, and heterosexual. Doane implies that the close-up requires overlooking the contradictions revealed when bodies are reductively inscribed into formal logics that strive for a simultaneously particular and general universality. I also want to suggest that this privileging of the face as a locus of emotional relation negates the technical processes through which bodies are integrated, stored, and combined in contemporary media. The image becomes a fetishistic locus of encounter, obscuring the mechanisms of inscription that enable bodies to be placed on a screen in the first place.
With the “Japanese Example” we have an instance in which a face—in its particularity—can be remade in favor of seemingly universal gestures, digitally modeled and recombined with others. But this does not suggest that, as Beth Coleman argues, with digital animation “one loses body (face) as an index of the real”—at least not quite. Coleman suggests that as bodies and identities blur into one another with digital animation, then digital, networked forms of embodiment inherently exceed the physicality of an individual body.14 I agree with Coleman’s understanding of embodiment and digital media, but I question the loss of the body and face as an index of the real. Rather than a loss of the body, digital media inscribe countless traces of a body—be they subtle movements of the mouth and nose, the grain of the voice, or any other nonlinguistic marker—which can then be abstracted out and placed into other bodies. These traces, while making difficult a clear relationship between an individual body and its representation on screen, nonetheless refer back to the “reality” of an original and its assumed bio-cognitive capacities for movement, even though the extraction of gesture may leave the rest of that body behind. In digital animation, the gestural traces of multiple bodies combine to produce one—a body that is, in the case of video games, often an avatar designed to be controlled by another. The bodies of digital animation and motion capture are a composite of a number of different bodies and faces, fragmented and rearranged. In games, these fragmented, assembled bodies include a vast array of elements that emerge from any number of different actors, software designers, digital artists and designers, and the players themselves. 
Thus, the body of digital animation and motion capture may appear to be a “false,” mutable digital image, but it nonetheless contains countless traces of originary human bodies in its movements, questioning critiques of representation that defer to visual media’s ability to mirror or distort “reality.”
Techniques of motion capture rely on a system of what Nicolás Salazar Sutil terms “kinetic formalism,” in which rules—sometimes arbitrary, sometimes abstract, but always with a reference to the embodied—create a “language” of bodily poses, transitions, and motions, even though this language may not have a clear relation to linguistic meaning.15 It is incorrect to suggest that digital media have invented these processes of kinetic formalization, as they have existed throughout the history of dance, performance, and cinema. When I use the word “gesture” throughout this essay, I am referring to processes of kinetic formalization through which a body’s motion is coded into an array of discrete, isolated, abstracted movements which obey a specific logic and system of organization, though this logic and system may or may not correspond to conscious or named meanings intentionally communicated between individuals.16
For Giorgio Agamben, gesture demonstrates the mediality of human life because it exceeds language. Gesture demonstrates the inherent mediation of language and, in itself, is a “communication of a communicability” without a particular end or goal. Gesture, then, performs the possibility of communication without actually communicating anything, as it inherently exceeds or transcends language, which leads him to suggest that gesture opens the sphere of the political because it cannot be reduced to a particular “end.”17 But Agamben’s claims would require gesture to remain outside of any particular formalization, beyond any particular mediation. This is, quite simply, not the case in motion capture, and thus a different sense of “politics” is at stake when it comes to gesture than the one implied by Agamben. Examining the particularities of how a human body is integrated into specific kinetic systems leads us to question the politics of representation and how representation can differentially organize the capacities of bodies. This “representation,” however, is less about the relationship between spectators and images than it is about the physical techniques that inscribe bodies and place them into relation with other bodies, other inscriptions, and other technologies—and the techniques that refuse the inscription of specific bodies, specific acts, and specific relations.
Thus, I suggest that techniques of performance capture as employed in games production—and motion capture more broadly—require a change in how we understand representation in contemporary media, one from representation as a kind of image that reproduces a sense of visual reality to a non-visual index of motion—a kinesthetic index—that inscribes reality and enables its use and reperformance over time. The techniques and tools of production—which today are largely encompassed by digital editing and rendering software—must be placed in the foreground of the analysis of contemporary media culture.18
In the rest of this essay, I discuss representation in digital media as reliant on techniques of gestural inscription through the examination of a transitional moment in the history of digital animation: the use of motion capture and video in the graphic adventure gaming genre.19 Graphic adventures were one of the most popular genres of computer games in the 1980s and 90s, and generally focused on narrative, puzzles, and exploration rather than action-based combat. I want to approach the kinesthetic index historically, and, following my opening example, I want to highlight the necessity of thinking digital visual culture in relation to the history of games. The history of graphic adventures is not only important for its relation to motion capture; it charts a moment in the history of media in which a primary referent for understanding games changed from “interactive literature” to “interactive cinema.”20 While graphic adventures are often thought to have died as a genre in the late 1990s and early 2000s—an assumption that is misleading given the persistence of forms from these games after their “death” and their clear return in recent years as a popular genre for independent game designers—the metaphor of “interactive cinema” persists in popular discussions of many big-budget games and the future of interactive entertainment. These techniques are often mobilized with a naïve drive towards “the real”—as if digital artistry can be evaluated through verisimilitude. But, I argue, rather than a loss of an indexical real, as is so often lamented with digital images, this suggests the emergence of a different sense of realism than that of the photographic index. This so-called realism depends on the reperformance of “real bodies” through the technical extraction and encoding of gesture, and only indirectly appeals to the visual.
The adventure game is frequently associated with the emergence of interactive fiction and electronic literature, along with being a foundational genre in the wider history of video games, responsible for the view that games should be thought of as playable, interactive, and immersive narratives. Programmers working at MIT and other ARPANET-connected institutions developed games like Adventure (Will Crowther, 1976) and Zork (Infocom, 1980) not merely for their own amusement, but as experiments derived from early artificial intelligence combined with an attempt to invent digital versions of the popular role-playing game Dungeons and Dragons. At the same time, these programmers were perpetuating a tradition in the history of literature that positioned the text as an open, procedural “literary machine” and invented the idea of an online, playable, digital world so influential for future understandings of both games and so-called “cyberspace.”21 The common name attributed to these games today—“text adventure”—points directly to the way the game itself inscribes information, bodies, and relation through a textual interface.
The technical ability of a medium to inscribe, to write something down, to store it—and not store something else—enables specific forms of sensory data to maintain a certain presence over time.22 Understanding the ability of technology to inscribe attunes us both to the mediation of our own history and to the mediation of bodies’ interactions. The materiality of technology constrains and enables what can be said through the markings of inscription, which, in the process of writing things down, produces the possibilities for “bodies,” their differences, their relations, and, crucially, how they come to matter to and for each other.23 Inscription, thus, refers to a technical-discursive practice that performs an act of differentiation, shaping bodies and how they come to relate—one that is about grasping (and containing) the ever-changing, performative historical processes that produce “material” reality and relations.
In text adventures, relations are mediated through simple textual descriptions and a verb-object input mechanism called a parser. Thus, my “body” in the game is almost completely undefined. The “world” itself is only depicted in a rudimentary way. Text reduces gesture to linguistic description and verbs. “Embodiment” in these games is consequently open-ended (if mostly invisible) aside from the registration of bodies and gesture through language (and, specifically, typed language) recognized by a computer program. That human bodies are not inscribed into these worlds very fully parallels many of the optimistic claims about identity online in early theorization of internet sociality. The flexible and fluid identities “revealed” by early identity experiments online are not an essence of human identity disclosed through technology, but a material effect of technological inscription.24 A later game in the Zork series that uses graphics and video, Zork: Grand Inquisitor (Activision, 1997), jokingly alludes to the ambiguities in these early text adventures when it names its main character “AFGNCAAP,” an acronym for “Ageless, Faceless, Gender-Neutral, Culturally Ambiguous Adventure Person,” a trope that persists in many first-person games that neither show the main character on screen nor permit that character to speak.
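The reduction of gesture to verbs and objects described above can be made concrete with a minimal sketch in Python. This is an illustration under stated assumptions, not a reconstruction of the parser in Adventure, Zork, or any other game: the vocabulary lists and the function name `parse` are hypothetical.

```python
# Hypothetical vocabulary for illustration; not taken from any actual game.
VERBS = {"look", "take", "open", "go"}
NOISE_WORDS = {"the", "a", "an", "at", "to"}


def parse(command):
    """Reduce a typed command to a (verb, object) pair.

    Returns None when the first meaningful word is not a known verb,
    which is how this sketch "refuses" an action it cannot inscribe.
    """
    # Lowercase, split on whitespace, and drop articles and prepositions.
    words = [w for w in command.lower().split() if w not in NOISE_WORDS]
    if not words or words[0] not in VERBS:
        return None
    verb = words[0]
    # Everything after the verb is treated as the object; None if absent.
    obj = " ".join(words[1:]) or None
    return (verb, obj)
```

In this sketch, a command such as "Open the mailbox" becomes the pair `("open", "mailbox")`, while any bodily action outside the verb list simply cannot be registered. The player's "body" exists only as whatever the vocabulary allows it to do.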
Roberta Williams’ Mystery House (Sierra On-Line, 1980) is considered to be the first graphic adventure game, taking the text-based world of games like Zork and adding rudimentary, two-dimensional, vector-based graphics as illustrations to accompany the basic descriptions of text adventures.25 Even with the addition of visuals, interaction in Mystery House was still reliant on textual input. Images were mostly static, and the game’s protagonist did not appear on screen. Several years later, Williams’ King’s Quest (Sierra On-Line, 1984) added the appearance of an animated character on screen controlled by the keyboard’s directional arrows, doing away with the text-based input of cardinal directions for movement. Textual description was not given automatically, but rather invoked with the command of “look” or “look at.” Graphics were still based in programmed vectors, if filled in with solid colors, which, while often based on drawings, relied on digital code for the in-game, real-time rendering of images.
King’s Quest, along with most of the graphic adventures released by the publisher Sierra On-Line between 1984 and 1989, was based on a game engine known as the “Adventure Game Interpreter,” or AGI. A game engine is a (usually proprietary) software-based infrastructure that defines numerous elements in a range of games, creating a technical (and economic) genealogy out of the limits and conditions of how previous games have been programmed.26 Engines delineate forms of interaction and are thus central to how game genres are defined—along with defining the specific ways bodies come to matter in games.27 Later games made with the AGI engine relied on variations of the same visuals and forms of interaction as King’s Quest, built into the software as they were. The formal similarity between these games was also conveyed through their titles, many of which were based on the “quest” theme, including Space Quest (Mark Crowe and Scott Murphy, Sierra On-Line, 1986) and Police Quest: In Pursuit of the Death Angel (Jim Walls, Sierra On-Line, 1987). While these games made reference to established filmic and literary genres—fantasy, science fiction, and police procedurals, respectively—the graphic adventure genre was defined by the engine and its effect on interaction and gameplay. While their “worlds” were visual, interaction was still primarily through text, both in written description of spaces that accompany visuals, and also through the persistence of a modified text parser. At the same time, in King’s Quest and most other AGI games, it was now clear that the player was controlling a character. The subject of the game wasn’t addressed in the same way, and the player wasn’t located in the game in precisely the same manner (the main character of the game usually had their own name, and the player was often awkwardly addressed both as the character and through the second-person “you”).
Space, while visual, was still organized as a series of rooms—not necessarily connected by cardinal directions, but by the borders of the screen as the character was moved by the arrow keys throughout the game world.
While Sierra’s games mostly retained the text-based parser handed down from the legacy of text adventures, other developers in the 1980s experimented with graphical forms of interaction. This included ICOM Simulations’ “MacVenture” engine-based games, including Déjà Vu (1985) and Shadowgate (1987), designed for the Apple Macintosh and its mouse-based graphical user interface, and Maniac Mansion (1987), produced by George Lucas’s Lucasfilm Games (later renamed LucasArts). Taking advantage of the popularity of the mouse as an input device—which, although it had existed for decades, did not achieve broad acceptance until Apple’s release of the Macintosh in 1984—these games completely removed the text parser and text-based input through a keyboard. Rather than type commands into the game through a verb-object textual parser, these games had all possible verbs displayed on screen. Input for the game would be performed by clicking on a verb and then clicking elsewhere. This subset of graphic adventures is often referred to as “point-and-click adventures,” again referring directly to the technical form of interaction on which the game relies.
The turn to graphical, point-and-click parsers placed the body back into the game through visual representation—not through the representation of verbs and specific actions, however, but through iconic representations of body parts. LucasArts’ engine (named the “Script Creation Utility for Maniac Mansion,” or SCUMM), along with “Sierra’s Creative Interpreter” (or SCI), the engine Sierra began using in 1988, would gradually replace the text parser and clickable verbs with these icons (figure 2). The player interacts with King’s Quest V: Absence Makes the Heart Go Yonder! (Roberta Williams, Sierra On-Line, 1990) through icons at the top of the screen that represent various general actions, including walking and speech, but also “eye” and “hand,” which can be used for any action that can be performed by the eye and hand. Walking could be said to be represented as “feet” or “legs,” and speech as “mouth,” but few games allowed actions for these body parts other than walking or speech. The hand icon, for instance, could be used for picking up an object, touching something, or opening something.28
Sierra, as well, replaced the vector-based line drawings found in their AGI-based games with bitmapped images based on digitally scanned versions of drawings, paintings, and watercolors. While this would seem to be a minor advance in technical achievement, it presents the game world as one derived from hand-drawn illustration rather than images overtly produced by a computer. The difference in how these images were stored on disk likewise reflects this shift. Early AGI-based images were computer code that rendered the image anew each time the program was run. SCI-based backgrounds were stored as pre-rendered bitmapped images accessed by the game. In other words, AGI games lacked any sort of indexicality and were produced completely as an effect of a computer program, while SCI games negotiated their digital images as a kind of index based on encoded, hand-drawn artwork. Along with the transformation in the gaming interface at around the same time, these changes in graphic adventures illustrate numerous ways designers were attempting to bring the human body back into games, primarily through images that appeared to be drawn by a human rather than rendered by a machine, and through a new kind of interface that was based in the visual representation of embodied gestures rather than verbs.
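The distinction drawn above, between a picture stored as instructions and a picture stored as pixels, can be sketched schematically in Python. This is an assumption-laden illustration only: the command names, the canvas size, and the pixel values are hypothetical and do not reproduce the file formats of Sierra's earlier vector-based games or of the SCI engine.

```python
WIDTH, HEIGHT = 8, 4  # a tiny hypothetical canvas


def render_from_commands(commands):
    """Rebuild a picture from scratch by executing drawing commands.

    The picture exists on disk only as instructions; the pixels are
    produced anew every time this function runs.
    """
    canvas = [[0] * WIDTH for _ in range(HEIGHT)]
    for op, *args in commands:
        if op == "fill_rect":
            x0, y0, x1, y1, color = args
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    canvas[y][x] = color
        elif op == "hline":
            y, x0, x1, color = args
            for x in range(x0, x1 + 1):
                canvas[y][x] = color
        else:
            raise ValueError(f"unknown drawing command: {op}")
    return canvas


# A picture as code: a filled background, then one horizontal line.
VECTOR_PICTURE = [
    ("fill_rect", 0, 0, 7, 3, 1),
    ("hline", 2, 1, 6, 4),
]

# A picture as stored pixels, standing in for a scanned drawing.
# These values are invented; a real scan would record the marks of a hand.
SCANNED_PICTURE = [
    [1, 1, 2, 2, 1, 1, 1, 1],
    [1, 2, 3, 3, 2, 1, 1, 1],
    [4, 4, 3, 4, 4, 4, 2, 1],
    [4, 4, 4, 4, 4, 4, 4, 4],
]


def load_bitmap(pixels):
    """Copy stored pixels to the screen without computing anything."""
    return [row[:] for row in pixels]
```

In the first case nothing outside the program is recorded; in the second, the stored values stand in for something that existed before the program ran, which is the sense in which the scanned background can be read as a kind of index.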
Around 1990, adventure games, in transitioning from text to a visual and iconic language, demonstrate a complete change in how bodies were represented in (and inscribed into) games. Actions began to be conceived of as a generic set of gestures of specific body parts, performed by characters on screen. While early text adventures were conceived of as “interactive literature,” these games began to be described as an emerging form of “interactive cinema,” exhibited through increasing attempts to use technology to encode “real bodies” into games. The emergence of the CD-ROM, for instance, allowed games to store an increasing amount of data that were often intended to bring human bodies into games to appeal to a kind of representational realism.29
This happened initially through the use of voice acting, along with rotoscoped motion capture. While most Sierra games featuring voice acting tended to depend on the talents of the people who happened to work for Sierra at the time, Jane Jensen’s Gabriel Knight: Sins of the Fathers (Sierra On-Line, 1993), a New Orleans-set Southern Gothic mystery, was noted for a number of celebrity voice actors, featuring Tim Curry, Mark Hamill, Leah Remini, and Michael Dorn. The promotional materials for Gabriel Knight, in part because of its actors, repeatedly referred to the game as an “interactive movie,” though it was otherwise formally indistinguishable from prior examples of the graphic adventure genre. Instead of relying on digital scans of hand-drawn animation for character models, Gabriel Knight used rudimentary blue-screen motion capture techniques to film human motion, the data from which were rotoscoped to create character animations.30
Initially, the bodies in these animation-based graphic adventures only encoded the voices of actors, removing the other physical elements of their body. Other gestures of the body entered into the game as a series of discrete character animations (called “sprites”) that represented every possible action a character could perform, played through icons that represented body parts. These sprites were potentially derived from rotoscoped movements and drawings of faces, linked to voices that may have little to do with the characters supposedly represented. In Gabriel Knight, Tim Curry, as the game’s titular character, plays a man from New Orleans, and Leah Remini portrays an Asian-American character named Grace Nakamura, negotiating the voice of an actor and negating the rest of their body based on assumptions about the relative value of presence and the technical ability to inscribe the body. As is the case with the “Japanese Example,” we once again see the removal of embodied markers of identity through the extraction of gesture, which are then recombined with other bodies as a result of how various technical strategies inscribe bodies as information. Curry, for instance, looks nothing like Gabriel Knight, and was not the source for the rotoscoped motion of his character. But his voice (and name) lent a value to the game, especially when combined with the movements of another body that visually represented what designers intended for the character to look like (if not sound like). I want to suggest that this value emerges from how the presence of Curry’s voice is, for Sierra, something that points to “the real” in a way that moves the game from graphic adventure to film and its supposed ability to index the real—though, clearly, this turn to the “real” (and the index) is not without its contradictions.
The shift from early graphic adventures to so-called interactive cinema did not depend on a singular model with a uniform history, nor did it abandon many of the more absurd, cartoonish conventions of the genre in the name of cinematic realism. Other graphic adventures at around the same time began to use (still) digital photographs instead of rotoscoped drawings for character images. 1993 was also the year the popular puzzle-adventures Myst (Rand and Robyn Miller) and The 7th Guest (Virgin Interactive) were released, both of which featured video of actors. Sierra would eventually continue in its deferral to cinematic realism, incorporating filmed actors and full motion video into its games, first with Roberta Williams’ horror game Phantasmagoria (1995), and then with The Beast Within: A Gabriel Knight Mystery (Jane Jensen, 1995), commonly referred to as Gabriel Knight 2. In doing so, Sierra equated the representation of reality with the encoding of human bodies.
At this point I want to expand on questions of realism in games, along with assumptions about indexicality (or its absence) in digital media. My critique of realism here is mostly indebted to its use in media studies, where it tends to refer to the ability of representation to mirror reality, conflating “realistic” with realist.31 Realism in games, more than representational verisimilitude, is often assumed to emerge from how the player invests their body affectively into the game itself, understanding “affect” to be a kind of non-linguistic relation. Affect exists as a specific response to the situated knowledge of the player as the social context of the game enables them to empathize with whatever is on screen. As is the case with Worringer’s critique of empathy, this tacitly assumes that inscriptions of “real bodies” justify a kind of affective bonding between player and character—which, when it comes to the increasingly photo-realistic images of faces derived via performance capture, is assumed to happen through the visual, neurocognitive mimesis that supposedly occurs from the nonconscious “recognition” of specific kinds of facial emotions represented in games from various psychological models. Realism, then, defers to the affective as a kind of empathetic identification, of “feeling-into” what is observed.32
Fredric Jameson suggests that realism, modernism, and postmodernism are aesthetic modes that exist in dialectical tension. And while realism is an often dismissed artistic style, it nonetheless can be defined as a contradictory form in which narrative storytelling is held in uneasy unity with a drive towards the increasing presence of affect, which Jameson defines as embodied sensations that escape the linguistic naming of emotions or feelings.33 This is not precisely the same thing as an empathetic feeling-into. Rather, Jameson’s affect refers to the description or presentation of sensory experience: representation as sense-data to be inhabited by the body.34 This understanding of affect is, perhaps, more productive than the quasi-psychological understanding of affect given in the digital modelling of faces, in part because it highlights the contradictions that exist in any understanding of realism or “realisticness.” Affect is, after all, always articulated in specific narrative contexts.35 Thus, there is a double, contradictory movement that defines realism: the drive to tell a story or provide a coherent narrative (which obviously violates any ability to represent “the real” through the necessity of narrative convention), but to fill that story with non-narrative (affective) elements of sensory description that likewise call into question the ability of narrative to approach “the real.” Realism points towards “real bodies” and “real life,” though it only points towards them as an asymptote. The “real” is something that will never arrive in realism, though the desire for the real is registered, in part, through the technical registration of the body’s affective materiality.
So, there is an element of the ideological in any understanding of realism as an aesthetic mode or style, often imposed by the requirements of narrative, even if the maintenance of this ideology is perpetually undermined by the deferral to “real” sensory data. And this carries with it—importantly for my claims about inscription—the necessity to conceal the technical means that underpin representation for any form of media to be taken as real.36 When it comes to film, Jameson suggests, realism emerges partially (but not completely) as a result of the photographic index.37 But this registration of bodies, left without the ideological direction of narrative, transforms realism into modernism and moves away from its specificity as an aesthetic form. At the same time, this “realism” requires the materiality of the photographic apparatus (as something that technically produces “the real” as a selection intrinsically divorced from “reality”) to be obscured and rendered invisible. The tension that holds together “realism” as a style simultaneously requires the affective presence of bodies and sensation but denies just how those affective traces were produced in the first place.38
Thus, there is an intrinsic assumption about the inscription of the body necessitated by realism, one that requires the erasure of the material, technical means of inscription. The sense data recorded by the medium cannot acknowledge the presence of the medium doing the recording. It must appear as an unmediated trace of reality rather than a technically encoded inscription. Realism makes affect “autonomous,” denying that there is a narrative and technical mobilization of “affect” that can only happen through the specific mechanisms that inscribe the body. This happens to challenge a number of claims about the material specificity of film as a medium particularly attuned to “the real,” found in discussions of the indexicality of film and photography, and the supposed transformation of the index with the digital. Film’s claim to “the real” is often thought a result of its photographic ontology, where, according to D. N. Rodowick, the “photograph is a receptive substance literally etched or sculpted by light forming a mold of the object’s reflected image.”39 A digital image, Rodowick claims, removes reference to physical reality and replaces it with the “mental or psychological reality” that comes from “modeling algorithms and the cognitive schema on which they are based.” This “realism retreats from the physical world, placing its bets on imaginative worlds—in other words, a projection of mind into image that conflates mental images with perceptually real events.”40
Yet, this loss of indexical “reality” actually reveals a return to the narrative and affective realism discussed by Jameson. The requirements of realism—the dialectical conjunction of narrative and affect—are only satisfied when physical reality is seemingly replaced with the mental and psychological reality that narrates the body. Film and photography can only approach realism if the medium is placed under erasure in favor of narrative schema that organize these traces. Thus, the articulation of indexicality with “realism,” or even with “the real,” is something of a problem.
Theories of the photographic index defer to Charles Sanders Peirce and his description of indexical, iconic, and symbolic signs. If the icon is a resemblance and the symbol is an arbitrary convention, then the index is defined in at least two ways that appear to be contradictory: as trace and as deixis. The trace is a physical remainder or residue left behind that persists in time. Peirce, however, also stresses another kind of index: deixis—the index of the pointing finger, of the words “there” and “this.” As Mary Ann Doane argues, “Of these two dimensions of the index [trace and deixis] emphasized by Peirce, the latter is frequently forgotten in the drive to ground the photochemical index as trace. Only the first definition—the index as imprint or trace (preeminently the footprint)—seems to correspond to the cinematic image.”41 It is in the dialectic between index as trace and index as deixis that one finds “an almost theological faith or certitude in the image.”42 This faith emerges from the physicality of touch: the index as the material excess of contact between inscription and medium that binds both trace and deixis.
It is not merely the “real” embodied by the physical image that matters, but the medium-specific, tactile techniques of inscription that lend “truth” to the persistence of the trace. But this would suggest that the “realism” of the index—if it supposedly is about the inscriptive presence of the trace—actually obscures the process through which an image is made into an index. Because the index must refer back to the tactile, technical process of its construction, then it may signify “presence,” but never “reality” unmediated beyond specific media and their capacities to inscribe information. “The index,” Doane claims, thus possesses the power to verify existence “by virtue of its privileging of contact, of touch, of a physical connection. The digital can make no such claim and, in fact, is defined as its negation.”43 But digital media nonetheless rely on kinds of deictic indices—it’s just that they inscribe the body in ways that are neither inherently visual nor directly perceptible to human sensation. There is a digital “touch,” even though it may be ontologically distinct from any sensible human touch. While some indexical relationships are broken and lost in the transition to digital media, others are revealed or produced anew in accordance with the specific material constraints of the medium.44 Every indexical trace is an inscription that emerges as a record of contact, and just because something is ontologically different does not inherently mean that the link is broken or forgotten.45
Digital technologies of inscription do not forget the body, nor do they exist solely at the level of digital, numerical abstraction. Rather, elements of a body “are mapped out and injected onto another ontology of the body… there is no forgetting of the original, there is no cutoff point after which we no longer connect back to bodiliness.”46 Or, the indexical in digital media may never present itself visually, but emerges through the specific technical means for inscription and (re)performance. With digital animation, these means are less about visual appearances than about the inscription of gestures that can be removed and then placed into another body and then narrated to make that body “real.” The indices through which the body returns in digital media are those of voice and motion, registering the presence of sound and gesture not as an undifferentiated, informational universal that can be perpetuated in multiple forms of media and storage without difficulty, but as a specific way the body is fragmented and recombined as a result of technical processes that convert the human body into data.
The historical changes in the video games discussed above are partially the result of a desire to represent reality. Even beyond the transition from text to graphics, game designers continued to use video, sound, and motion capture to register and inscribe the human body, increasing the presence of affect (as descriptive sense-data) in games that were otherwise based on highly clichéd narratives derived from tropes of film and literary genres. This was most clearly seen in games produced in the mid-90s that relied on video, such as Phantasmagoria and Gabriel Knight 2. While the frame of reference for these games was often interactive cinema, using video to mimic the visual indexicality of the photographic image, they failed to coherently reproduce “reality.”
Phantasmagoria, a gothic horror game, was thought to have “paved the way for advances in the interactive potential of horror games by abandoning…cartoonish environments” and incorporating techniques directly from cinema.47 Character models were based on the filming of actors who were then composited into a pre-rendered 3D environment (figure 3). Phantasmagoria, based on the SCI engine, was designed to appeal to a broad audience, reducing the interface for the game even more than the previous icon-based system used by Sierra. The cursor became a generic “action” that would turn red if the character was able to do something (an indexical interface rather than an iconic one—a cursor that merely points “there!” to indicate an action, rather than an icon that represents a body part). Sierra’s other full motion video games performed the same reduction in interface. This was partially an effect of the problems with and cost of integrating video into games—Phantasmagoria went massively over budget because of the difficulty in matching filmed movements with completely digitally rendered environments, which until then had never been performed at the level the game’s production demanded. Budgets were cut in subsequent experiments with video. Given monetary constraints, the director of Gabriel Knight 2 rarely shot more than two takes of any scene. Characters in these games move in short loops, and movements were recorded with the same generic neutral stance beginning and ending every action to maintain continuity between loops. 
While Gabriel Knight 2 and Phantasmagoria 2: A Puzzle of Flesh (Lorelei Shannon, 1996) replaced rendered 3D backgrounds with digital photographs (figure 4), the compositing of the actor into the environment has an odd effect in which the “real bodies” recorded through video seem to hover above the environment into which they are placed, an effect compounded when the video quality and resolution of the characters differed from those of the environment. Many of the video artifacts of blue-screen compositing (commonly referred to as “jaggies” because of their pixelated, jagged edges) were also clearly visible in these games.
The problems these games had with representing reality reveal numerous traces of the material processes through which actors’ bodies enter into games, be they through obvious compositing, “jaggies,” or the repeated and awkward movements resulting from digital video loops and simulated environments. As such, these digital artifacts are a kind of index that demonstrates not the pureness of digital simulation in obscuring the material reality of media, but rather how digital bodies rely on a material apparatus that leaves traces, even though these traces are often intentionally obscured in the name of realism. This would suggest that the ability of the apparatus to register itself and its own inscription processes would negate much of the ability to produce a sense of realism in these games that so actively desired to be seen as indexing reality through digital video.
However, the reference to “real bodies” in these games resulted in numerous moral panics surrounding their representation of violence and sex. As these games were designed for adults, they regularly incorporated sex and graphic violence. Phantasmagoria is noted for including a rape scene and, depending on the player’s skill, could include repeatedly witnessing a gory video of the main character’s head being sliced in half with a swinging blade. Phantasmagoria 2 is a disturbing psychological thriller that includes grisly images of murders the main character may have committed. The game’s main character is also bisexual, has (heterosexual) sex numerous times, and experiments with BDSM. These games were regularly banned by government regulatory agencies, and the North American Entertainment Software Rating Board (or ESRB) was established in 1994 partially as a response to the full motion video in the game Night Trap (Digital Pictures, 1992) and what was thought to be its overly sexual content.48 The incorporation of “real bodies” into games is linked with games as a mass medium for adults—which also led to larger panics about the effects of seeing (and controlling) “real bodies” on screen in a medium that had been (and continues to be) thought of as primarily for “impressionable” children. The presence of bodies becomes a sign of realism through the articulation of inscription with specific narratives, be they about genre or about the role of games themselves.
In spite of the fact that these games were often exceptionally popular (and, in the case of Gabriel Knight 2, critically acclaimed), full motion video quickly vanished, replaced by animated 3D models and environments. Almost in a return to the earliest graphic adventures, attempts to incorporate the body into the game vanished, with voice seemingly the only index of the bodies involved in game production. While Gabriel Knight 2 replaced Tim Curry with actor Dean Erickson, who looked like the game’s eponymous character, Gabriel Knight 3: Blood of the Sacred, Blood of the Damned (Jane Jensen, Sierra On-Line, 1999) used 3D animated character models and had Curry reprise the role now that significant parts of the human body were once again removed. LucasArts developed a new engine for 3D animation for its Grim Fandango (Tim Schafer, 1998), GrimE (or Grim Engine), which was only used for one additional game (Escape from Monkey Island, 2000) before LucasArts effectively left the genre behind, cancelling their remaining adventure games in production. Sierra went through a number of corporate restructurings throughout the 1990s, and, in response to conservative criticism of adult content in games like Phantasmagoria, ended up ceding creative control to corporate managers.49 In 1999 Sierra underwent massive layoffs, ensuring that Gabriel Knight 3 was one of the last adventure games it would produce.
The common narrative about the death of adventure games assumes that the genre’s demise results from its failure to achieve immersive realism. Gabriel Knight 3 notoriously contains an absurd puzzle that often illustrates the limits of graphic adventures and their claims to represent reality—to rent a motorbike, you steal another character’s passport and then devise a disguise to pretend to be that character. Crafting the disguise includes making a mustache out of cat hair, which you obtain by attaching tape to a shed door and spraying water at a nearby cat. You attach the cat hair to your face with syrup and draw a mustache onto the passport, because the photo on the passport doesn’t have a mustache. This is a ridiculous puzzle, to be sure—and this is a small part of the puzzle as a whole. But the death of these games probably has more to do with corporate restructuring and meddling in the creative processes, along with an increasing scale in the popularity of games. This increase in popularity, perhaps ironically, was carved out in part by Sierra’s targeting of adult and casual gaming markets.
As I’ve been suggesting, the questions of realism in these games are surprisingly complex and should be thought of less in terms of mirroring the logic of “real life” (something that no game actually does) than as an inscription of “real bodies.” The initial turn to 3D in games seemed to signify a turn away from the human body. But this turn was only temporary given the massive investment in motion and performance capture today, represented by, among other things, companies like 3Lateral and Cubic Motion that work with AAA game developers, and Adobe’s Mixamo suite, which provides a database of character models and animations widely used by amateurs and independent developers. The afterlife of the graphic adventure genre today dialectically combines the inscription of the human body with these experiments with 3D, not through a deferral to the visual, but through motion capture and the extraction of gestures used to animate 3D character models.
My claims have revolved around historical and theoretical concerns related to video games, indexicality, and realism. I want to conclude by pointing in a different direction: towards the political concerns that emerge from rethinking digital media as a particular indexing of embodied motion. Since around 2010, there has been a resurgent interest in the graphic adventure genre, some of which stems from nostalgia, some of which stems from the emphasis adventure games placed on storytelling. This includes the resurrection of “interactive cinema” by the French designer David Cage and his company Quantic Dream. Cage’s games, which include the popular Heavy Rain (2010), Beyond: Two Souls (2013), and Detroit: Become Human (2018), involve the same adult subject matter that characterized full motion video adventures, and again make reference to filmic narrative conventions. Instead of video, however, Cage makes extensive use of motion capture technology to combine the 3D bodies of games like Gabriel Knight 3 with the technically impressive rendering of facial emotion and bodies, similar to those of the “Japanese Example” with which we began. Cage’s games rely on a kind of realism that defers to “real bodies” encoded digitally—bodies that, in the case of Beyond: Two Souls, are again of screen actors. Beyond “stars” 3D character models of Ellen Page and Willem Dafoe made through various versions of performance and motion capture. The kinesthetic data recorded from Page and Dafoe were used to create playable characters that resemble the physical appearance of both actors. Page’s character ages throughout the game, and the playable avatar is a child, a teenager, and an adult at different points. The data inscribed by the digital apparatus enables the production of multiple bodies that are able to make reference back to Ellen Page through a variety of kinesthetic traces that originate from her physical body.
This opens up an entire host of problems, as this kinesthetic index can be recombined with other bodies, or can be used to invent a “body” that performs acts that the original actor may never have consented to. Nude images of Page’s character appeared on the website Reddit even though she did not appear nude in the game and did not film any nude scenes. Alternate camera angles accessible through the game’s debug menu showed the character as “nude” while taking a shower. These images do not index reality in the same way as a photo, a view that was expressed by a representative of Sony who suggested that the images were “damaging” to Page even though “it’s not actually her body.” Even when a body consents, as it does in something like the “Japanese Example,” different aspects of a body can be extracted because of the value they possess in different contexts. This potentially changes how representation can be understood with digital media, making specific identity markers appear and disappear as bodies are combined and rearranged to privilege different ways of understanding and representing a body at different times. “Deepfakes,” which use artificial intelligence and 3D models to combine images of faces with videos, or other forms of motion and sound, making it seem as if someone said or did something they had not, have become an increasing concern. A Deepfake links a mutable visual image with the motion of another body.50 The uncanniness of these photos and videos is not because of their lack of relation to reality, but because of their incorporation of partial traces, some of which are visual, some of which are kinesthetic. Deepfakes are concerning because of the connection they have to reality, not the absence of that connection. It’s difficult to know what to make of this transformation of representation, because it appears to completely erase from view how one appears to another in any consistent or coherent way.
If we think of motion capture as a kinesthetic index, we should ask what the politics of a medium like motion capture should then be. Even though these are digital images, there is still a trace and a touch through which bodies are materially inscribed into media. Bodies still exist, or at least parts of bodies still exist. We can only address the representational politics of images such as these if we acknowledge the history of how these representations rely on a series of material apparatuses that manifest specific bodies and how they appear to others, along with the assumptions about what makes one representation “more real” than another. The history of games sketched above reveals how digital bodies are fragmented and rearranged, and how they come to matter depends on the technical modes of inscription that permit specific bodies (and parts of bodies) to enter into relation with others. These inscriptions are made with an explicit desire to achieve “the real,” though this realism is not without its contradictions, reliant on modes of technical abstraction as it is.
Rather than thinking of these digital images as somehow failing to achieve “reality” through the mediation of the digital, we should take seriously the registration of embodied traces—traces that are not inherently visual, but are often aural or kinesthetic. Embracing this way of understanding the body would perhaps challenge much of how we understand representation. What would it mean to criticize the representation of movement, rather than the representation of appearance, and how would this relate to the politics of identity? The index persists in digital media, and the politics of representation today must examine how bodies leave traces that can be used and remade. The truth of an image has migrated elsewhere, beyond the visual, toward the kinesthetic.