Ars video

The short music video in early and classical music

In recent years the dissemination of Western Art Music through audiovisual media has intensified, and its presence on platforms such as YouTube and Vimeo is now considerable.[1] Western Art Music, and early music in particular, is searching for its place in the current audiovisual space. Its visual strategies borrow from, among others, art films, advertising, fictional short films, and popular music videos.

What kind of phenomenon is the short video of Western Art Music? What role do these short videos play? What are their audiovisual strategies? We have no definitive answers to these questions, so this paper offers a preliminary approach to the short music video in classical and early music through a brief description of its main functions and audiovisual strategies. In this way, we try to identify the most relevant research questions to pursue in future work.

This paper has two main sections. In the first, we offer a brief overview of the videos available online and point out some of their main features. In the second, we deal with a specific case study: the video of Franz Schubert’s lied Der Erlkönig by the Anderson & Roe piano duo, and we point out some of the principles behind its audiovisual strategies.

1. The music videos on YouTube

We start with an overview of the early and classical music videos that can be found on social networks. We do not intend to offer an exhaustive typology of all existing videos, but a preliminary approach based on a simple criterion: the amount and complexity of the visual information presented in each video.

1.1. Amateur video

This is the casual recording made by an audience member at a live concert, which may then be uploaded and distributed online. Image and sound are usually recorded in a single take, and the quality is very low. These recordings play a documentary role: they share a personal experience, as if to say “I really was there!” (see figure 1):

Figure 1. Amateur recording uploaded on YouTube. Watch this video here.

1.2. Professional recording and editing of a live concert

In concert broadcasts for TV or DVD release, the visual aspect is enriched through the editing process. These videos offer diverse views of the same live performance, and in this way the visual aspect “begins to speak for itself”. In other words, through audiovisual editing the video becomes something different from the live experience of the concert. It is not a documentary recording of a specific event, but the creation of a different artistic artifact. These videos belong to the realm of simulation (see Baudrillard 1981) and mediatized performance (see Auslander 1999).

One example is the TV broadcast of the concert Icônes du Seicento by L’Arpeggiata. Figure 2 shows some shots that illustrate the visual editing of a live recording.

Figure 2. L’Arpeggiata, Christina Pluhar (conductor) in Ciaccona di Paradiso e dell’Inferno by Anonymous. From Icônes du Seicento, Mezzo, 2008 (DVD). Directed by Olivier Simonnet. Full film available here.

1.3. Musical performances staged for filming

In these cases the performance is staged expressly for the audiovisual medium. Generally speaking, these videos show a sophisticated level of visual construction: a large number of cameras that allow a variety of shots, and careful audiovisual editing and post-production. Several kinds of productions fall into this category.

1.3.1. Musical performances filmed in historical locations

Filming a musical performance in a historical location is a recurrent feature of early music videos. Often the location is related to the historical context of the music. In this case, the main subject of the visual discourse is not the musical performance itself, but the place in which it is played. For this reason, the camera captures different aspects of the place, employing long shots and depth of field.

In this regard, the work of the French documentary maker Olivier Simonnet is a remarkable case. In his productions the concert appears as an aesthetic object in its own right, filmed from a variety of perspectives that “expand” its artistic dimension. As a result, the musicians’ gestures acquire a prominent role in the film.

Simonnet’s choice of shots and editing style allows him to construct a singular audiovisual spatiality. On the one hand, he integrates the place of the performance into the artistic situation. On the other hand, he expands the audience’s perceptual possibilities by constructing views of the music-making that exceed the single frame of a musical performance, as in the film Un automne musical à Versailles (see figure 3):


Figure 3. Marc-Antoine Charpentier. Un automne musical à Versailles, 2005, (DVD) Directed by Olivier Simonnet. Full film available here.


Simonnet employs audiovisual language not only to show the place of the performance, but also to display the musician’s body. For example, in the film Sacrificium: The Art of the Castrati, Simonnet offers an unusual portrait of the mezzo-soprano Cecilia Bartoli. She is depicted as an androgynous character, and the physical effort of her performance is highlighted through American shots and even close-ups (see figure 4). This audiovisual portrait is remarkable because it defies the idea of “minimum effort” that often characterizes virtuoso opera singing, and it also breaks the illusory veil that usually covers the musicians’ bodies in a live opera performance.[2]

Figure 4. Cecilia Bartoli and Il Giardino Armonico playing Come nave in mezzo all’onde (1725) by Nicola Porpora (1686 – 1768). From Sacrificium. The Art of the Castrati, Decca, 2010 (DVD). Directed by Olivier Simonnet. Full film available here.

1.3.2. Musical performances filmed in unconventional locations

These videos show a performance in one of two kinds of locations: 1) a neutral place, that is to say, a place that is not evocative; or 2) an unconventional place, like a forest or an urban landscape.

Some videos, like I Musici’s Le quattro stagioni (video 1), emphasize sets of images by means of long shots. In this case, an unrealistic situation of music consumption is constructed through the audiovisual medium. The video thus amplifies the spatiality of the musical experience with the aid of the image, but without affecting its temporality. Furthermore, the video inserts the musical performance into a different fictional space, transforming the so-called fictional pact (patto finzionale). Umberto Eco uses this term in the field of literature to refer to the agreement between the implicit author and the reader, whereby the latter agrees to enter into the fiction (Eco 1994, 91). In the video, the fictional pact is established between the audiovisual artifact and its audience, but it is also determined by the codes and conventions imposed by the musical scene itself. Watching a performer walk down a street, lost in her own thoughts, while we hear her musical performance is unusual in early and classical music videos, although such shots are very common in pop music video clips. In early and classical music videos, seeing the virtuoso player’s strenuous performance is one of the main conditions of realism or verisimilitude (this is one of the main features of video 1).

Video 1. I Musici and Pina Carmirelli (conductor) playing Le quattro stagioni by Antonio Vivaldi (1678-1741). From Vivaldi: Le quattro stagioni. Philips, 1988 (VHS). Directed by Anton Van Munster.

The documentary The history of a Luthier, directed by Alberto Bona, presents a different case. Here there is a contrast between the idyllic natural landscape and the musicians’ black clothes. While the unconventional landscape suggests a naturalistic approach to music, the old-fashioned black clothes act as an anchor that keeps the viewer within the early music domain.

Figure 5. Davide Monti and Mary Clearly playing Biagio Marini. From The history of a Luthier, 2009 (DVD). Directed by Alberto Bona. An excerpt of the film is available here.


In the video The Italian baroque music by Paul Fenkart, the choice of an unconventional place is enhanced by a particular way of filming. In this case, the camera movements and editing offer an “intersemiotic translation” of the vertiginous rhythmic changes in the music.[3] Although intersemiotic translation seems to suggest a very close relationship between music and image, it can confuse viewers, so it is not always a successful device.


Figure 6. Il Giardino Armonico and Giovanni Antonini (conductor) playing Ciaccona by Tarquinio Merula. From The Italian baroque music. ArtHaus Musik, 1999 (DVD). Directed by Paul Fenkart. Excerpts of this video where it is possible to observe this strategy are available here and here.

1.4. Videos with more than one audiovisual representation layer

These videos constitute a turning point in the development of classical music videos. In these cases, images of musicians making music alternate with other images that are unrelated to the musical performance. The presence of several audiovisual representation layers introduces different temporal dimensions, which are handled in a distinctive manner.

In most music videos, especially in pop music, various space-time units are represented in the images. In this paper we call these units audiovisual representation layers. For example, the video clip “Toxic” by Britney Spears shows three main layers, the second of which is subdivided into two consecutive parts. In addition, a fourth layer appears sporadically throughout the video (see video 2 and figure 7):

Video 2. Britney Spears performing Toxic. (C) 2003 Zomba Recording

Figure 7. Audiovisual representation layers in “Toxic”: Layer 1 / Layer 2 (2a and 2b) / Layer 3


We can identify at least three functions of visual representation layers: descriptive, dramatic and narrative.

In a descriptive layer the image simply shows recognizable iconographic content, which may occasionally be abstract. In early and classical music videos, the representation of the performance belongs to the descriptive layer, as shown in the following example (see video 3):

Video 3. El Cortesano (José Hernández Pastor, countertenor; Ariel Abramovich, vihuela) playing Si la noche haze escura (1552) by Diego Pisador (c.1509 – c.1557). From Si me llaman. Carpe Diem, 2009 (CD).

Two descriptive audiovisual representation layers alternate in this video. The first shows the musicians’ performance, while the second shows blurred night shots of an anonymous street in Spain. We would like to point out some characteristics of both layers. In the first, the two performers play without interacting with each other, in an indeterminate place without extent or depth of any kind (a sort of non-place). In this particular example, viewers see a space that can exist only in video.

The second audiovisual representation layer, on the other hand, tries to reflect the song lyrics: Si la noche haze escura y tan corto es el camino, ¿como no venis amigo? (If the night is dark and the way is so short, why do you not come, lover?). In the song, the speaking subject and the physical place from which he speaks are unknown. This indeterminacy is reinforced by the blurred images of a street full of people walking, as well as by the absence of any particular individual. In this way the second layer reinforces the melancholic atmosphere of the lyrics.

In a dramatic layer the image emphasizes some musical element. In early and classical music videos, the dramatic layer is very often a staged version of a musical performance, usually complemented by set design, or a kind of dramatized performance full of theatrical exaggeration (see video 4):

Video 4. Capella Ministrers - Carles Magraner. Pavane and galliard from the ensalada "La Trulla" by Bartolomé Cárceres. Recorded at the Lonja de la Seda de Valencia in January 2012. Music video disc presentation of the book "Batailla in Spagnol" CDM 1231

The theatrical aspect is present not only in the scenography and costumes, but also in the dramatization of the lyrics, as we see in the video Monteverdi, Banquet of the Senses, madrigali erotici e spirituali (see video 5):

Video 5. The Consort of Musicke and Anthony Rooley (conductor). From Monteverdi, Banquet of the Senses, madrigali erotici e spirituali. Brilliant Classics, 1993 (DVD). Directed by Don Taylor

Finally, in a narrative layer the main goal of the image is “to tell a story” of its own. Sometimes the story has nothing to do with either the context or the content of the music. In the next example (see video 6), Bach’s cantata BWV 30 is the musical background of a humorous fictional story. The cantata is played within the visual narrative, but the storyline has nothing to do with Bach’s music.

Video 6. Magdalena Kozena performing "Freue dich, erlöste Schar", BWV 30. From Bach Arias. 1999. CD. Archiv Produktion

1.5. The iconographic and plastic levels

In the video of Vivaldi’s The four seasons played by the violinist Anne-Sophie Mutter, two descriptive layers are represented, each occurring in a different space-time realm. The first is the musical performance, which emphasizes the solo violinist. The second consists of black-and-white images of the rehearsals (see video 7 and figure 8):

Video 7. Anne-Sophie Mutter playing The four seasons. Deutsche Grammophon, 1999 (DVD)

Figure 8. Two audiovisual representation layers in the video The four seasons. Deutsche Grammophon, 1999 (DVD)


This video allows us to employ two concepts from visual semiotics: the iconographic and plastic levels (Groupe µ 1992). At the iconographic level, the image is formed by the objects that are represented. In the previous example, this level comprises two layers: two sets of images that alternate, each belonging to a different space-time frame. The first represents the violinist’s performance, and the second shows a music rehearsal in another space-time frame.

The plastic level is formed by the intrinsic characteristics of the image, independently of what it represents: color, texture, framing, post-production, lighting filters, visual effects, etc. At the plastic level, the previous video shows colored panels, artistic lighting, close-ups, accelerated montage, and visual effects such as tracking and changes in color, brightness and contrast. There are also some shots with abstract images.

1.6. Dramatization or narration of music

It is necessary to address some aspects of the dramatic and narrative audiovisual representation layers. Dramatization and narration are based on the music, and this “drama” may be more or less closely related to the historical context of the music, its lyrics, or even the musicians’ own features. We would like to highlight that classical music videos always include a descriptive layer, which represents the musical performance. This layer not only coexists with dramatic layers, but can also become one of them. The dramatization and narration of music in video appear in at least three forms: direct or linear, when the image literally represents the lyrics or content of the music; allegorical or adapted, when the music is the starting point but the story follows its own path while keeping some points of contact with the music; and arbitrary or superimposed, when music and image are apparently unrelated.

The video of Monteverdi’s madrigal Quel Augellin Che Canta, from The Full Monteverdi directed by John La Bouchardière, tries to adapt the main theme of the madrigal to contemporary situations (see video 8):

Video 8. A contemporary interpretation of Monteverdi’s madrigal Quel Augellin Che Canta. In The Full Monteverdi. Naxos, 2007 (DVD). Directed by John La Bouchardière. Fragments of this DVD here.

In other videos, like “Moon song” by the singer Anna Netrebko, we find two coexisting layers. The first tends to translate the story of the song directly: Netrebko sings while floating on water and dedicates her song to the moon. The second layer is arbitrary, because it superimposes a story about two lovers in the middle of the song (see video 9):

Video 9. Anna Netrebko singing Moon Song by Antonín Dvořák (Rusalka). From Anna Netrebko: The Woman, The Voice. Deutsche Grammophon, 2004 (DVD). Directed by Vincent Paterson.

In the video clip “Dreaming a King”, directed by Luca Marconato, three layers coexist: the first shows a luthier cleaning his musical instruments, the second shows the musical performance by the violinist Riccardo Minasi, and the third shows a little girl and a mime playing chess. The three layers are arbitrary or superimposed, and they gradually converge into a single plot, which introduces an interesting narrative development: only at the end of the video does the audience understand that the music was at the internal level all along (Miceli 2009) (see video 10 and figure 9):[4]

Video 10. “Dreaming a King” (videoclip). Directed by Luca Marconato

Figure 9. Three audiovisual representation layers. From “Dreaming a King” 2010 (Videoclip). Directed by Luca Marconato

2. Der Erlkönig. Anderson & Roe. When words fade, Steinway Label, 2011 (DVD)

We now turn to the audiovisual strategies of the video “Der Erlkönig” played by the American pianists Greg Anderson and Elizabeth Joy Roe. This video is part of the DVD When words fade (2011), produced by the Steinway record label. In this video, the music is an arrangement for piano four hands of Schubert’s famous lied Der Erlkönig, based on a poem by J. W. Goethe.[5] The poem tells the story of a father who is riding at night with his sick son. The child feels the mysterious presence of the Elfking, who wants to take him away. The boy repeatedly asks his father for help. The father tries to calm him down, answering that it is all in his imagination, and hurries home. The Elfking tries to persuade the boy to go with him and finally threatens him with violence. The child screams: the Elfking has hurt him. When the father and his son arrive home, the child is dead (see video 11):

Video 11. Original version of Der Erlkönig by Dietrich Fischer-Dieskau (baritone) and Gerald Moore (piano). From "Dietrich Fischer-Dieskau", BBC archive directed by Walter Todds, 1959

In the audiovisual version by Anderson & Roe, the video tells a superimposed story: Anderson & Roe are attacked by a strange force during their performance. Roe is thrown down and struck by this force until she manages to play again. Both players make a superhuman effort to continue the performance. Finally, Roe is caught by the piano strings and “devoured” by the piano (see video 12 and figure 10):

Video 12. “Der Erlkönig” (videoclip). From When words fade, Steinway Label, 2011 (DVD). Directed by Mathew Brown

Figure 10. Greg Anderson and Elizabeth Joy Roe in “Der Erlkönig” (videoclip). When words fade, Steinway Label, 2011. Directed by Mathew Brown

2.1. Main features of the video

The music has three dramatic areas, which alternate in the video. Each of these areas determines the visual information of the clip.

First area: The anguish.

This area contains the child’s calls for help (mein Vater!). These are repeated four times, each time at a higher pitch. The harmonic function is V/V (secondary dominant), which becomes progressively more dissonant and is never conclusive, except at the end. The montage accelerates, the camera “shakes”, and the shots move from the long shot to the Italian shot, until the image becomes unrecognizable.

Second area: The stabilization

This area corresponds to the father’s voice, which tries to calm the child down. The musical range of this part is very low and it offers a tonal resolution, in contrast with the anguish section. This area is in distant keys reached through strong modulations. The montage is slower and the camera is mounted on a tripod. However, the pianists are very agitated.

Third area: sweet lyricism.

This area corresponds to the Elfking’s voice, which tries to tempt the boy. The range is very high. The harmonic functions of this area are usually very different from those of the other two. Here the harmony is reached by means of smooth modulations, but the return to the tonic is very abrupt. There are long shots, and this area lasts longer than the first one.

Here we can find an interesting synchronization between music and images.

In video 13 it is possible to follow the video performance of Anderson & Roe with the subtitles of the original song, and so to see the relationship between the original song and the video’s audiovisual strategies:

Video 13. Video with Spanish subtitles which correspond with the original lyrics

2.2. Kinetic anaphones

Drawing on Philip Tagg’s musical semiotics, we use the term kinetic anaphone for the precise synchronization between a gestural element of the image and one specific musical sound, which produces the illusion of audiovisual synchresis.[6] There are at least two kinds of anaphones:

2.2.1. Those that match a musical note with no structural importance:

These are images that produce a singular synchresis effect, because the emission of the sound seems motivated by, or intrinsically linked to, the image.

Figure 11

Figure 12

Figure 13


  • Roe sheds drops of sweat that are synchronized with an arpeggio (see figure 11 and video 12, minute 02:08)
  • High notes on the piano are synchronized with falling drops of sweat or a sequin falling from the dress (see figure 12 and video 12, minutes 02:21 to 02:26, 03:12, 03:22 and 03:36)
  • A screw falls from the piano (see figure 13 and video 12, minute 01:42)

2.2.2. Those that match a fundamental musical process (like rhythm, harmony and form)

These are images that match important structural aspects of the music. They generally show structural elements of the layer or plot as well, so that the match becomes an element of coherence between music and image.


  • Roe is “attacked” by the piano for the first time at the beginning of the second strophe (see figure 14 and video 12, minute 01:29).
  • Anderson’s piano bench is violently pulled towards the piano at the final cadence of that section (see figure 15 and video 12, minute 01:34).
  • Roe is lying on the floor, but manages to play a chord at the final strophe (see figure 16 and video 12, minute 01:48).
  • The piano strings catch Roe, matching the boy’s last call for help. This musical section resolves when the Elfking hurts the boy. At this moment Roe is “eaten” by the piano (see figures 17 and 18 and video 12, minutes 03:43 and 03:59).

Figure 14

Figure 15

Figure 16

Figure 17

Figure 18

The implausible plot of this video, as well as the simplicity of the stage and props, is compensated for by the audiovisual editing. In this case, the dramatic efficacy lies neither in the “story” nor in the performance, but in the editing.

3. Final considerations

We would like to summarize the results of this preliminary research in seven points:

3.1. Although there are some recent studies of early music video (San Cristóbal 2013), we still lack a substantial body of literature on Western Art Music videos. Therefore, our main points of departure are the studies of the pop music video clip.

3.2. The video contributes to the public image of musicians. However, unlike pop music videos, which emphasize the musicians’ sensuality or social commitment, early and classical music videos highlight the virtuoso player. Only the singers express a degree of sensuality. In this sense, the case of Cecilia Bartoli is very interesting: she is a sort of anti-diva, because she does not present herself as a sensual singer. On the contrary, she shows the physical effort of singing through many gestures that are not beautiful. Nevertheless, these gestures are exhibited because they are the source and evidence of Bartoli's virtuoso performance. In other words, Bartoli's gestures in the video are proof of her identity as a performer.[7]

3.3. Like the pop music video clip, the Western Art Music video usually addresses a specific audience. In this regard, we can hypothesize at least two possibilities:

  • The search for an established audience. Example: Cecilia Bartoli
  • The search for a new audience. Example: the Anderson & Roe piano duo.

3.4. One of the differences between popular and classical music videos is the number of visual representation layers. While in the pop music video the performance may easily move from one layer to another, classical music videos have fewer layers and fewer changes between them.

3.5. In early and classical music videos the layer that shows the musical performance does not change: the musician always remains in the role of performer.

3.6. The fictional pact introduced by the early and classical music video is strongly linked with a virtuoso musical interpretation. In this respect, it differs from the pop music video.

3.7. The video is not a mere testimonial document of a performance. On the contrary, it constitutes a new artistic artifact, a new kind of performance, which constructs new places and temporalities for the musical experience by means of the image.



Auslander, Philip. 1999. Liveness: Performance in a Mediatized Culture. New York: Routledge.

Baudrillard, Jean. 1981. Simulacres et simulation. Paris: Éditions Galilée.

Biguenet, John, and Rainer Schulte, eds. 1992. Theories of Translation: An Anthology of Essays from Dryden to Derrida. Chicago: University of Chicago Press.

Chion, Michel. 1994. Audio-Vision: Sound on Screen. New York: Columbia University Press.

Eco, Umberto. 1994. Sei passeggiate nei boschi narrativi. Milano: Bompiani.

Groupe µ. 1992. Traité du signe visuel. Pour une rhétorique de l'image. Paris: Le Seuil. 

Miceli, Sergio. 2009. Musica per film: Storia, estetica, analisi, tipologie. Milano: LIM.

San Cristóbal, Úrsula. 2013. “Prima l’immagine, poi la musica? Le nozioni di performance e performatività nella produzione audiovisiva di musica antica” Tesi di Laurea Magistrale in Musicologia. Università degli Studi di Milano.

Tagg, Philip. 1992. ‘Towards a sign typology of music’, in Rossana Dalmonte et al. (eds.) Secondo convegno europeo di analisi musicale.



[1] We use the umbrella term Western Art Music to refer to the repertoire ranging from Gregorian chant to contemporary music, which is currently studied and preserved within academic institutions such as conservatoires.

[2] Further information about this topic can be found in San Cristóbal (2012).

[3] The term intersemiotic translation, proposed by Roman Jakobson in the essay On Linguistic Aspects of Translation (1959), refers to a kind of translation “which interprets linguistic signs by means of systems of non linguistic signs” (Biguenet & Schulte 1992, 253). The concept has been used in musicology by Philip Tagg (1992).

[4] Miceli proposed that film music acts on three levels: internal, external and middle. According to Miceli: “Si definisce livello interno un evento musicale prodotto nel contesto narrativo della scena/sequenza. La sua presenza può essere manifesta oppure dedotta dal contesto. Nel primo caso la fonte musicale è visibile a gradi diversi di enfasi. Ad esempio un personaggio accende un ricevitore radiofonico [...] oppure assiste a una esecuzione dal vivo, e ancora suona egli stesso uno strumento. Nel secondo caso il contesto rende plausibile una o più presenze musicali, seppure non svelate, purché congrue [...]. In ogni caso si considera un intervento di livello interno come appartenente al narrato” (Miceli 2009, 643-644).

[5] For a harmonic analysis of the song see

[6] According to Chion, synchresis is “The forging of an immediate and necessary relationship between something one sees and something one hears at the same time (from synchronism and synthesis). The psychological phenomenon of synchresis is what makes dubbing and much other postproduction sound mixing effects possible” (Chion 1994, 224).

[7] A full analysis of Bartoli’s performance in the video Sacrificium is in San Cristóbal (2013).

