As a singer-songwriter in electronic music, Imogen Heap was inspired by the idea of producing music through hand gestures, and thus the gloves were born. Whilst they were originally restricted to sounds from computer software, the gloves can now be applied to the voice and to analogue musical instruments. Ariana Grande is one of the first artists in pop music to use these gloves in live vocal performances.

“There are some questions about the voice and technology that do not go away. What happens, for example, when the distinction between the human voice and the voice of the machine is blurred? Can we still distinguish between genuine and synthetic affect? Should we? Can we still distinguish between uniqueness of an individual’s voice and the social and cultural determinations that shape its performance? Should we even try to do so?” – Neumark et al. 2010

Cybernetics v Organic?

The relationship between the ‘cybernetic’ and the ‘organic’ is often conceived as one of opposition in Western cultures. We tend to sort things into one of the two categories. However, exploring engagements of human intimacy with external technologies offers a different attitude towards the two concepts (Parkhurst 2012). The arrival of the microphone in the 1920s encouraged the innovation of new vocal techniques and musical styles (e.g. ‘microphone singing’) (Lockheart 2003). Still, the boundary between the ‘organic’ voice and the technological microphone was not compromised. This is possibly because the microphone’s primary function was restricted to volume amplification, a relatively minimal modification of the ‘natural’ projection of the voice. With the gloves, this modification is made much more explicit.

“And, listening to amplified and distorted voices, we can sense the undecidability, tensions, and ambiguity that subtend the normally coherent voice. Complex and paradoxical, voice once more calls out for theoretical and artistic exploration.” – Neumark et al. 2010

Figure 1 Mathematical model of feelings of uncanniness induced by various human and non-human objects (Mori 1970)

Uncanny Aesthetics

When Ariana Grande tested out her voice with the gloves in the video, the man sitting next to the computer panel was struck with awe – her digitalized voice was uncanny. Aesthetics is understood not only as the theory of beauty, but also as the theory of the qualities of feeling. Whether positive or negative, uncanniness is a feeling based on intellectual uncertainty (Freud 2003). A mathematical model by Mori (1970) represents how this feeling is culturally constructed between the human likeness of a non-human object and a person’s sense of familiarity towards it (Figure 1). Such cultural constructions of uncanniness can be shifted. In other words, the collective conceptions of what is aesthetically comfortable (and normalized) can change through cultural revolutions (no matter how big or small) in society (Gould 1997). Whilst Western media portrays high-tech developments as replacing traditional ways of life, robotics culture in Japan takes on a more positive concept of integrating such advanced technologies with those ‘organic’ conventions. Technology allows for the continuity and enhancement, rather than the disruption, of traditional values (Šabanović 2014).

Using technology in and on the human body can be seen as a way of feeling the world. The term ‘cyborg’ is more a matter of perspective than of definition. The human-technology interaction becomes less of an offense against sacred bodily space, and more of an emotively aesthetic experiment. In other words, human engagement with external technologies is a creative way to attempt emotional perspective via different scales (Parkhurst 2012). Reflecting on how she arrived at the idea for her gloves, Imogen Heap says: “I wasn’t looking for control, I was looking for freedom” in producing and performing music.

Here is a TED talk in which she summarizes her journey in developing the gloves and demonstrates how they are used:



The Affective Voice: Song (Part 2)

‘Unsuccessful’ Affect

Although Ciaran O’Driscoll gave an impressive vocal performance in the auditions, the coaches decided not to choose him. The changes in vocal register and style within 90 seconds were too frequent and ‘too confusing’. In artistry, there is a popular emphasis on the importance of bringing a unique style to the artist’s own work. Still, there is an audience, a collective, to conform to. In order to ‘successfully’ communicate songs, singers are advised to get the right balance between technique and emotion (Larrouy-Maestri 2013).

“Culture is not a template that controls people’s thoughts and patterns of action, it is rather available knowledge that is invoked and constantly re-invented in the course of social interaction.” – Blacking 1983

Body Technique and Emotional Expression

Western society is a playground abundant with thoughts and actions based upon Cartesian dualisms (e.g. subject and object; mind and body). Virtuosity is an area where such dualisms are explicitly blurred. Technical skill and depth of expression are not incompatible, but rather complementary in the right context. Art forms expressed through the body are social institutions. No matter how individualistic the artist’s style is, feelings are culturally encoded as soon as they are brought into action. In other words, feelings which are expressed or received through the medium of singing (and other bodily art forms) come from a repertoire of collective emotional experiences. People learn ways of talking about these feelings, and ways of acting them out, through phenomenology (Blacking 1983). This phenomenological and synergetic cycle of learning and conveying emotion through body technique is saturated with affect, which becomes culturally specific in itself. In the case of Ciaran O’Driscoll, his affect was ‘wrong’ (or unsuccessful) in the cultural context shared by the coaches (and with the audience to some extent).

Balancing Act

We must recognize the ambiguity in the intentionality of the voice: the ability to vocalize what one means. There is always a conscious difficulty in producing an adequate representation of what singers wish to convey, conceptually and emotionally (Pecora 1985).

“Like riding a bike, making a vocal sound may partly depend on the fact that it happens largely without conscious control. So subtle and complex are the muscular adjustments required that, as with riding a bike, you are liable to fall off if you think too hard about what you are doing…But this does not make the voice any less of a production; it just points to the many different pressures, conscious and unconscious, voluntary and habitual, that are always acting on the raw material of the voice to shape and style it.” – Connor 2011

Singing requires an extremely delicate balance between raw emotion and acquired technique, and that balance often sits at a different point for each individual. Most vocalists spend many years training to reach it, and practise further to make it habitual via muscle memory.


The Affective Voice: Song (Part 1)

Farinelli and the Snow Queen

King of Masked Singer is a South Korean singing competition between celebrities, whose main profession is not necessarily in singing (e.g. comedian, weather reporter, football player). Their identities are anonymized to prevent a certain degree of bias, and caricatured by specially designed masks that ‘fit’ (or decorate) the demeanors of their voices (see my previous post on the voice’s biological and cultural identity).

Singing is an action that signifies emotions – i.e. an ‘action sign system’ (Farnell 1999). I intentionally chose Korean media without English subtitles to showcase the importance of recognizing such subjectively emotive signs without the need for objective language. Although foreign spoken languages need to be translated for general communication, action sign systems do not have the same requirement. After the duet by Farinelli and the Snow Queen, the panelists commented on how they were touched by their performance. Even without seeing their facial expressions, the audience could hear the sentiments in the breaths they took, as well as the tension in their vibrato. The aesthetics of the voice already carries substantial information and the ability to transform the listener without the structure of linguistic (Farnell 1999) or visual cues.

“Music amplifies the dramaturgy of sound.” – Ihde 2007

Affect and Phenomenology in Music

What I have discussed here is affect. Although it is debatable, the term may have originated from our ability to recognise mental states and intentions in body movement (Blacking 1983). This phenomenological aspect of affect can be perceived in singing, as music amplifies sensuality in the bodily participation (Ihde 2007) of the voice. In contrast to Western distinctions between emotion and cognition, affect showcases the importance of treating feeling as a function that is just as rational as thinking. It is a state embedded with purpose and creativity that is transformative and symbolic (Blacking 1983).

Affect is often seen as the production of emotion through action, such as singing. The musical voice still affectively shines through language barriers. Whilst we see successful communication through such action sign systems, they also raise the question of what happens when body technique fails to convey the intended emotional message to the audience. I will explore this issue in my next blog post. Meanwhile, if you were affectively touched by Farinelli and the Snow Queen, you can listen to their full performance here.


The Egoistic Voice…?

“To work with a concept of self is to conceptualize the human being as a locus of experience, including experience of that human’s own someoneness.” – Harris 1989

Conflicting Perceptions

The video gives a scientific explanation of why our voices sound deeper in our own heads than they actually are. Most people who hear a recording of their own voice tend to dislike the unfamiliarity of it. Whilst most people express this strangeness with negativity, a research paper shows that they are actually more attracted to their own voices when they do not recognize the recording as their own (Hughes & Harrison 2013).

Self-bias of the Voice

Anthropological studies on personhood and the ‘self’ argue that an individual’s sense of personal identity is a product of acting upon and internalizing social experience. Embedded in this phenomenon is the role of publicly shared concepts in shaping private experience (Harris 1989). As I have mentioned in my previous post, we see this in action where there are gendered trends in how people manipulate their own voices in an attempt to sexually attract another person. Such trends are drawn from publicly formed stereotypes about attractive men and women. The participants in the video kept making references to the sexual attractiveness of their own voice, regardless of how positive or negative their view on it was.

Technological Methods and Theory of Mind?

In general, it seems that the voice creates a discourse about the role of public concepts of sexual attractiveness that help shape a sense of self-identity. Our opinions of our own voices diverge depending on whether or not we are aware that they belong to us. On a side note, I think it is interesting that digital technology has provided a method of making explicit how we see ourselves, not just through our own bodies, but also through virtual simulation. Again, this also demonstrates how this extent of theory of mind is unique to humans.

NB: The ideas I have discussed here can be found in the context of people looking at pictures or video recordings of themselves as well!


The Fetal and Newborn Voice

Human Uniqueness

Mechanisms of auditory perception and vocal production in humans are traits derived from our primate ancestors. In contrast to non-human primates, human infants develop spoken language quickly and almost mindlessly (Mampe et al. 2009). There are many genetic studies that explain differences in language acquisition, and many discussions of cross-cultural variation. Perhaps we can also explain both issues through the middle ground of behavioural and physiological development.

“Prosodic features such as melody, intensity, and rhythm are essential for an infant acquiring language.” – Mampe et al. 2009

Perception of the Voice

Exposure to auditory stimuli during pregnancy influences newborns’ perception of sounds. Human fetuses start to react to sounds from about 30 weeks of gestational age (Kisilevsky et al. 2016). They can memorize prosodic features (accent and tone of syllables) from the outside world by the last few months of pregnancy, developing sensitivity to melodic contours (rises and falls of pitch) in both music and language (Mampe et al. 2009). This maturing ability to process pitch intervals enables them to proficiently appreciate musical melodies and emotional and linguistic prosodies once they are born.

As foreigners residing in Korea, Dave (American) and his international friends (French, German and Spanish) compare each other’s native languages for certain words. They demonstrate the obvious differences in linguistic pronunciation. Although it is not as apparent in this video, languages in general can be differentiated through trends in melodic contours. As mentioned previously, the perception of prosody is learnt as early as the prenatal period. This contributes to infants’ production of the sounds of their native language when they start to learn and speak their mother tongue.

Production of the Voice

Newborns may have learned prosodic traits of their native language by listening to them inside their mother’s womb. A study analyzed crying patterns of French and German newborn babies in terms of their melody and intensity contours (Mampe et al. 2009). The French group had a tendency to produce cries with a rising contour, but the German group produced falling contours (Figure 1). These patterns are consistent with the intonation trends heard in both languages – French intonations typically have a pitch rise towards the end of sentences, whereas German intonations usually display a falling pitch. Not only have they acquired the main intonation patterns of their surrounding language, but they also have the ability to reproduce these patterns in their own vocalizations.

Figure 1 Frequency and amplitude of cries measured from French and German newborns
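
To make the idea of a ‘rising’ or ‘falling’ contour a little more concrete, here is a minimal sketch in Python. It is not the analysis used by Mampe et al. (2009). It simply assumes that a cry is given as a list of pitch values sampled at equal time steps, and labels the contour by where the highest pitch falls. The function name `classify_contour` and the sample values are made up for illustration.

```python
def classify_contour(pitches):
    """Label a pitch contour as 'rising' or 'falling'.

    Assumption for illustration only: the contour counts as 'rising' when
    its highest pitch occurs in the second half of the cry, and as
    'falling' otherwise.
    """
    if len(pitches) < 2:
        raise ValueError("need at least two pitch samples")
    peak_index = pitches.index(max(pitches))
    # Relative position of the peak: 0.0 = start of the cry, 1.0 = end.
    peak_position = peak_index / (len(pitches) - 1)
    return "rising" if peak_position > 0.5 else "falling"


# Invented pitch values in Hz, not measurements from the study.
late_peak_cry = [380, 400, 430, 470, 500, 460]
early_peak_cry = [390, 480, 500, 450, 410, 380]

print(classify_contour(late_peak_cry))   # rising
print(classify_contour(early_peak_cry))  # falling
```

A real analysis would first have to extract pitch from the recording and deal with noise and pauses; the sketch only shows what the two labels mean.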

Nurturing Nature

Learning language is usually discussed through cultural or genetic aspects of sociality, cognition and vocal articulation. Evidence of how a newborn’s cry is shaped by hearing their native language from established speakers in the external environment reveals the direct interaction between culture and biology. Since the established speakers also went through the same prenatal conditioning, we can say that it is an open and cyclical system of nature nurturing nature.


The Developing Voice

“Actually, there probably never is a raw – or naked – condition of the voice. Even seemingly spontaneous vocal actions like crying, sobbing, laughing, groaning and gasping have their characteristic tonalities and rhythms…Even before language, using the voice to create sound is an intensely styled thing.”

– (Connor 2011)


At the age of 10, Jackie Evancho surprised the audience on America’s Got Talent with her operatic voice. As the body matures and grows, so does the anatomy of the voice. Infants usually have a higher larynx, allowing them to drink their mother’s milk and breathe at the same time. New to the world, they have limited control over their own bodies, and so this anatomical fixture prevents them from choking. A high larynx combined with shorter vocal cords makes a child’s voice higher in pitch than that of an adult. As they grow up, their larynx drops and their voice deepens. For children, having smaller lungs and weaker abdominal muscles restricts their ability to perform the same type of energetic vocal tasks that adults can (Fisher et al. 2016). In other words, it is rare to see young children perform a heavy rock song or an operatic aria (with Jackie Evancho as one of the exceptions).


Some singers such as Gallant are praised for having the ability to sing outside of the average vocal range of their own sex. Although girls and boys roughly have similar vocal developments in their early years, their patterns drastically diverge during puberty. Males naturally have higher levels of testosterone than females. This hormone stimulates the growth of the male vocal tract, making the larynx a lot larger, and with longer and thicker vocal cords (Fisher et al. 2016). As a result, the average pitch of a normal speaking voice of a grown man is much lower than that of a grown woman – typically by a fifth of an octave (Figure 1). Ignoring normal speech, Gallant happens to be able to sing in the female Soprano range.

Figure 1 Speaking and singing ranges (Fisher et al. 2016)


In the Blind Auditions, the coaches in The Voice UK were confused about whether the owner of the voice was male or female. Whilst James Byron identifies himself as male, he expresses his style as androgynous. Genetic and hormonal constraints are not the only factors that affect our perception of the voice’s gender.

In transgender therapy, hormonal treatment is a common method for people who biologically transition to the other sex. So, a male who transitions into a female would take regular doses of estrogen. However, for a naturally male voice that has been through prolonged testosterone exposure, the vocal folds cannot instantly thin to the extent of those of a normal female voice. Additionally, pitch is not the only concern. Speech patterns are gendered as well. Males tend to speak in a monotone voice, and so dynamic intonations should be developed for them to achieve more of a female voice (Sarabia 2016).

Nature and Nurture

Our voices are usually good indicators of identity, especially with age and gender. On the one hand, the voice is an exemplar of how genetic constraints make distinctions between the young and old, and the male and female. On the other, it can also sometimes be a red herring. In the case of transgender speech therapy, we find that adjusting the behavioural tendencies of the voice is just as important as its pre-existing anatomy – this demonstrates the interplay between nature and nurture.


The Accented Voice

Acting it out


Pronunciation in the English language can vary depending on where you were brought up. In the Mad Libs performance, Benedict Cumberbatch was in the character of a cop and attempted to adopt a heavy Boston accent (perhaps a stereotypical NYPD officer often portrayed in crime TV dramas). At the same time, Jimmy Fallon started the performance talking in his usual American accent, until he decided to change into a ‘Ukraine accent’ when he was forced to announce his character’s nationality, pre-chosen courtesy of the Mad Libs game.

Geographical background

Apart from pronunciation, intonations also have an important role in making accents unique to their associated places. For example, speakers of American English have a tendency to change their volume for emphasis, whilst those of standard British English are more likely to change their pitch (Fisher et al. 2016). Whilst our ability to perceive and distinguish between accents is impressive, it has some negative consequences in socio-political relations as well. An experiment involved people using standard and non-standard dialects of American-English to make several phone calls to the same landlord for housing requests. The results revealed that the landlord seemed to have discriminated against prospective tenants based on the sound of their voice, indicative of non-standard dialects. Based on this study alone, Hispanic Americans experienced more discrimination than African Americans (Purnell et al. 2016).

Multilingualism and Cultural Acquisition

Here, I present an example from my personal experience. I was born and raised in Hong Kong, an international and multilingual city. I attended preschool with American, Canadian and Chinese teachers, and spent all of secondary education in a UK boarding school. Having spent 7 years immersed in the British social life, my Cantonese has unfortunately suffered the consequences. In one of my trips back home, I once got into a local HK taxi and uttered my destination in Cantonese. With just 5 syllables, the driver asked me if I had studied abroad. He said my Cantonese was ‘slacking in bite’ and that my pronunciation was ‘lazy’. More recently, I have graduated 6th Form in Britain with a somewhat hybridised accent. One friend stated that my English pronunciation was American-Canadian but my intonation was British.

Being multilingual means being fluent in multiple languages. However, this does not automatically mean being able to speak those languages with a standard accent, even if all those languages were simultaneously learnt as first languages. Although we acknowledge that a variety of accents exist due to cultural difference, we still seem to subconsciously assess the speaker’s linguistic competence. In turn, we let those assumptions affect how we socially interact with people possessing unfamiliar accents (Berthele 2011).

There is a fascinating power in how speech patterns can be very telling of where a person grew up. Furthermore, with the rising integration of globalisation into our private lives, normal speech is influenced by the speaker’s life history and by how they travel through different social spheres. This demonstrates how the voice is plastic in its capability to be shaped through culture – an exemplar of the embodiment of social life.


The Swearing Voice

Pain of the Tongue – Pain of the Mind

“It always amazes me…that in response to sudden or unexpected pain, people will often actually say “Ow!” or something like it rather than just crying out. It is as though the pain is not quite there until it can be given expression and thereby made actual enough to be dispelled.”

– Connor 2016

The Fire Noodle Challenge is a recent YouTube trend that became quite popular after tourists started discovering one of the spiciest instant noodles found in South Korean convenience stores. American, British and other foreigners not accustomed to spicy food challenge themselves with this rare, extreme level of spiciness. The testers mostly end up recording themselves swearing (or making some other sort of exclamation) from the burning pain.

Even though we often think of a cry of pain as a ‘raw’ reaction, we still find the natural reflex in the form of whole words – especially in swearing (except for children…hopefully). As a common linguistic feature in many societies, there are cultural similarities and specificities in swearing. Together, they hint at the embeddedness of social life in expressing the state of mind of the individual.

A Definition of Swearing

Ljung (2011) provides a linguistic definition of swearing with certain criteria:

  • Swearing takes the form of utterances containing taboo words that are non-literal.
  • Swearing is a formulaic and emotive language.

Non-literal Taboo Words

Taboos are essentially behaviours that ought to be avoided, and are referenced using taboo words. Taboos and their corresponding terms emerge from and are ascribed by a history of social ‘rules’, which often possess hierarchical elements from religious and non-religious origins (Shakiba 2014).

For example, taboo words stemming from Christianity use the names of ‘higher’ and ‘lower’ divinities: “Bloody hell!” (abode of lower divinity) and “Oh my God!” (higher divinity). Vulgarity usually determines taboo words from a non-religious context, such as those in association with excrement and sexual intercourse (Ljung 2011). In the Buzzfeed video, a lot of people who were trying the spicy fire noodles said “oh my God!” or “oh shit!”

A historical emphasis on social class created an association between swearing, as a ‘lower’ language, and speakers from ‘lower’ classes. Whilst this is widely recognised in various societies, there are also cross-cultural differences that indicate specific associations to class. A common phrase in English is to ‘swear like a soldier.’ In Swedish, they say ‘svära som en borstbindare,’ which translates as ‘swear like a brushmaker’ (Ljung 2011).

Swearing is symptomatic. This means that the speaker uses swear words to indicate his or her own mental state, where the words are not to be interpreted as their literal meaning (Ljung 2011). So when one of the Buzzfeed staff said “yeah, that would fuck you up,” he did not literally mean that those noodles would have sex with you.

Formulaic and Emotive Language

People find that swear words lose their nuance or ‘punch’ when we replace them with their literal synonyms. If we substitute “are you fucking with me?” with “are you shagging with me?” it does not have the same offensive power (or even the same intended meaning) as the former form. This makes swearing a type of formulaic language as they have specific synonymies that are particular to them (Shakiba 2014).

As an emotive language, swearing is ubiquitous and often has the main function of expressing anger, joy, surprise, or any other extreme level of emotion. Degrees of offensiveness can be conveyed in linguistic and non-linguistic ways (Shakiba 2014). Linguistically, “cunt” is considered to be more offensive than “bitch.” Non-linguistically, the same taboo term can sound more aggressive by projecting it louder in volume, or more sarcastic in tone. We can hear that each of the Buzzfeed staff exclaimed in pain in their own unique way – from as mild as “ooohhhh boy…” to the explicitness of “tasty in the front, and just like BAM! BAM! BAM! Punch you in your dick on the way out.”

People who speak more than one language tend to view swearing in their first language as more satisfying and as delivering a greater emotional punch. At the same time, a recent study shows that people who swear appear more trustworthy because of its honest nature (Edmonds 2017). It seems that swearing is productive in emotional release, both for relieving pain and for creating social relationships.

The Swearing Voice

The voice in swearing occupies a very interesting position. There is no restriction to the degree of intentionality underlying each time taboo words are used. From a sociolinguistic perspective, even the ‘raw’ voice is inevitably a construction of culture. From an anthropological perspective, swearing is an example of the embodiment (i.e. the voice) of social etiquette (i.e. taboos).

Whether you are a lover of spicy food who’s bored of your usual local curries, or just looking for an excuse to swear, you can buy some fire noodles here.


The Attractive Voice

Dating Profiles and Pick-up Lines

On a recent episode of the Graham Norton Show, Bryan Cranston explains how he directed videos of people’s dating profiles back in the 1980s. He pointed out that they had to do retakes whenever the subject displayed signs of nervousness or uptightness: tense monotone voice implying lack of confidence. More attractive recordings were made when the subjects did not know that the tape had been rolling. They were laughing naturally and seemed more relaxed.

In making parodic versions with Eddie Redmayne and Benedict Cumberbatch, you may have noticed that their intonations varied quite a bit (at 2:35). Whilst this may have been exaggerated in the art of performance, there is a basic attempt to produce an ‘attractive’ voice. Such an attempt can generally be found in the daily social lives of humans.

The Aesthetic Voice

“Susan Buck-Morss argues that ‘Aesthetics…is a form of cognition, achieved through taste, touch, hearing, seeing, smell – the whole corporeal sensorium. The terminae of all of these – nose, eyes, ears, mouth, some of the most sensitive areas of skin – are located at the surface of the body, the mediating boundary between inner and outer.’”

– Neumark et al. 2010: 17

Vocal production, as an auditory mode of communication and self-expression, has an inherently aesthetic quality. We have tendencies to subconsciously form quick opinions on a person’s identity based on hearing just a snippet of his or her voice. In cases where we come across a voice that is distinctively ‘nice’ and attention-grabbing, we often find it difficult to pin-point why it is ‘nice’. We describe it using vague words such as clear, sexy, deep, rich, husky, and so on. The attractive voice is a mysterious voice. Even though the voice remains subjective in nature, we can explore some of the behavioural explanations of this mysterious phenomenon.

Sexual Attraction: Dating and Mating

In the parodies of the dating videos, the 3 celebrity guests seemed to have taken their time in speaking their sentences and highlighted the deeper, darker tones of their voice. Studies show that women actually do tend to find deep voices more attractive in men, as they signal confidence. When trying to sound sexy, men and women both slowed their speech, which can create a more intense auditory experience for the listener. However, the level of conscious intentionality in such mechanical adjustments was minimal (Hughes et al. 2014).

The voice can invite a profound understanding of a person’s mental state with minimal conscious effort. Even when you’re not deliberately trying to display signs of romantic interest, others can interpret romantic interest from your voice. A study revealed that observers accurately guessed whether a sample of men and women were talking to a friend or a romantic partner, after hearing as little as the short phrase “how are you?” These observers also noted that the speakers’ voices sounded sexier, more pleasant and displayed more romantic interest when these heterosexual individuals were talking to their romantic partners rather than their same-sex friends (Farley et al. 2013).

From an anthropological perspective, all these aspects of vocal attraction demonstrate a sex-oriented perception of the aesthetic voice. Whilst such aspects can be explained by beneficial, behavioural and mechanical factors, most people can unconsciously manipulate their own vocal attractiveness, even without any specialist knowledge of such effects. As film actors who need to ensure good vocal delivery and appropriate portrayal of their characters, Eddie Redmayne and Benedict Cumberbatch are considered to be skilled in manipulating the level of attractiveness of their voices – especially given the large audiences they manage to attract.


