Telugu films and gender inequality

Actor-actress age gaps, career and marriage, female-lead films, genres, birthplaces of heroes and heroines.

Introduction

The lead male actor (the “hero”) and the lead female actor (the “heroine”) play important roles in any film, and even more so in Indian films. (I’ll use “lead male actor” and “hero” interchangeably; the context should make the meaning clear.) In mainstream Indian films, it is often observed that the actor who plays the hero is much older than the actor who plays the heroine, even though the characters they play are of comparable ages.

Does this age gap exist across decades and in various regions of Indian cinema? What are the differences between the careers of heroes and heroines? How does marriage affect them? In this project, I’ll delve into such questions and try to understand some aspects of gender inequality in Indian cinema.

Pilot: age gaps for major heroes

As a starting point, I scraped Wikipedia and other public sources and built a database of films for a few heroes in mainstream Indian cinema, along with the ages of the hero and the heroine in each film. I also show the genres extracted from each film’s Wikipedia page. Here are some of those.


For all the major heroes shown here, the age gap between the hero and the heroine increases over time, sometimes reaching 30 years. So how has this changed over the years? How does marriage affect the careers of heroes and heroines? I wanted to look at this more systematically and more widely, across time and space in India.

A deeper dive into Telugu cinema

I chose to take a closer look at Telugu cinema and investigate age gaps, careers and related aspects. First, I extracted information from Wikipedia about Telugu films released between 1940 and June 2023, along with their cast. From the yearly film list pages on Wikipedia, I made an initial list of each film’s title, release year and main cast. From this main cast list, I identified the “hero” and the “heroine” of each film. This requires inferring the gender of each cast member. I did this by looking at each cast member’s Wikipedia page if available (or by running an automated search on DuckDuckGo if no Wikipedia page was available), and comparing the number of occurrences of the words actor, he, him, his with the number of occurrences of actress, she, her.
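The word-counting heuristic can be sketched in a few lines of Python. This is a minimal illustration, assuming the biography text has already been fetched as a plain string; the function name `infer_gender` and the rule of returning `"unknown"` on a tie are my own illustrative choices, not necessarily what the actual pipeline does.

```python
import re
from collections import Counter

# Word lists from the description above ("actor" counts as male, "actress" as female).
MALE_WORDS = {"actor", "he", "him", "his"}
FEMALE_WORDS = {"actress", "she", "her"}


def infer_gender(page_text: str) -> str:
    """Guess 'male', 'female' or 'unknown' from word counts in a biography."""
    # Lowercase and split into alphabetic tokens, so "He," and "he" match the same
    # word, while "this" or "here" are not mistaken for "his" or "her".
    tokens = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(tokens)
    male = sum(counts[w] for w in MALE_WORDS)
    female = sum(counts[w] for w in FEMALE_WORDS)
    if male > female:
        return "male"
    if female > male:
        return "female"
    return "unknown"  # a tie, including pages containing none of the words
```

A tie is left as `"unknown"` rather than forced into either category, so such cast members can be checked by hand.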

For a similar analysis of Bollywood films, go here.

Identifying the hero and heroine of a film from its main cast involves some assumptions and has some issues. By default, I assume that the first male cast member is the hero and the first female cast member is the heroine. However,

1.) Many films have multiple heroes and heroines.

In such cases, I only assign one hero and one heroine. Further, the heroes and heroines can be “paired” in any combination, and the cast order doesn’t tell us which hero is paired with which heroine.

For example, the cast of సీతమ్మ వాకిట్లో సిరిమల్లె చెట్టు (Seethamma Vakitlo Sirimalle Chettu) from 2013 is listed as Mahesh Babu, Venkatesh, Anjali, Samantha, Prakash Raj, Jayasudha. Our method would identify Mahesh Babu as the hero and Anjali as the heroine, even though Mahesh Babu is paired with Samantha.

2.) Some films don’t have a hero or a heroine.

The assumption that every film has both a hero and a heroine doesn’t hold for a few films. For example, ఈనాడు (Eenadu) from 2009 has Kamal Haasan and Venkatesh as the main cast, with no heroine credited.

3.) What should the relationship be between the hero and the heroine?

Does the relationship necessarily have to be a romantic one? Or should importance to the film’s plot take precedence over a romantic pairing of the potential hero and the potential heroine?

Take రుద్రమదేవి (Rudhramadevi) from 2015, for example. Its cast is listed as Anushka Shetty, Allu Arjun, Rana Daggubati. Our method would correctly identify Anushka Shetty as the heroine, but it would identify Allu Arjun as the hero, while it is Rana Daggubati who plays the character romancing Anushka Shetty.

Another recent example is Godfather (2022), whose main cast is Chiranjeevi, Nayanthara, Salman Khan and Satya Dev. Chiranjeevi is the hero, but the heroine, Nayanthara, is not the hero’s romantic interest. In fact, the hero doesn’t romance anyone.

My approach

The approach I took is the following. The first cast member is always the “lead”: mostly the hero, but sometimes the heroine. The other one (the heroine if the lead is the hero, otherwise the hero) is the next cast member of the opposite gender. In most cases, this covers both (a) the romantic hero-heroine pair and (b) the non-romantic hero and heroine. Sometimes, the lead has a romantic interest in a cast member other than the identified heroine/hero. I identified a few of these cases individually and switched the heroine/hero so that, when there is a romantic interest, the heroine/hero is that person rather than the first cast member of the opposite gender. In other words, romantic interest takes priority over the listed cast order. However, I didn’t cover all such films; these are edge cases and do not make up the bulk of Telugu films.

In any case, I created two datasets: (a) a High+ confidence dataset, in which the hero and the heroine are unambiguously the first and second cast members, and (b) a Low+ confidence dataset, in which there is some ambiguity about the hero and the heroine because of the cast order. The High+ confidence dataset excludes multistarrer films, since identifying the hero (or the heroine) in them is difficult. While the overall statistics change very little between the datasets, you can still switch between them here.
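The lead-plus-opposite-gender rule, with manual overrides, can be sketched as below. This is a hypothetical helper (`identify_pair` and its arguments are illustrative names), assuming genders have already been inferred for every cast member and that manual corrections are supplied as a `(hero, heroine)` tuple. The confidence label is a simplification of the High+/Low+ split: a pair counts as high confidence only when it occupies the first two credited positions and no override was needed.

```python
def identify_pair(cast, gender, override=None):
    """Return (hero, heroine, confidence) for one film's main cast list.

    cast:     list of names in credited order
    gender:   dict mapping name -> 'male' / 'female' / 'unknown'
    override: optional manually curated (hero, heroine) tuple, used when the
              lead's romantic interest is not the first opposite-gender member
    """
    if override is not None:
        hero, heroine = override
        return hero, heroine, "low"  # a manual switch means the cast order was ambiguous
    if not cast:
        return None, None, "low"
    lead = cast[0]
    lead_gender = gender.get(lead, "unknown")
    if lead_gender not in ("male", "female"):
        return None, None, "low"
    other_gender = "female" if lead_gender == "male" else "male"
    # The partner is the next credited cast member of the opposite gender, if any.
    partner_index = next(
        (i for i, name in enumerate(cast[1:], start=1) if gender.get(name) == other_gender),
        None,
    )
    partner = cast[partner_index] if partner_index is not None else None
    # Simplified rule: high confidence only if the pair are the first two credited names.
    confidence = "high" if partner_index == 1 else "low"
    if lead_gender == "male":
        return lead, partner, confidence
    return partner, lead, confidence
```

With this rule, the Seethamma Vakitlo Sirimalle Chettu cast list yields Mahesh Babu and Anjali at low confidence, and passing `override=("Mahesh Babu", "Samantha")` switches the heroine. A film like Eenadu, with no female cast member, returns `None` for the heroine.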

Ages and age gaps over the years

Once the hero and heroine were identified, I extracted their birth years (along with their wedding years and places of birth, if available) from their Wikipedia pages and from searches on DuckDuckGo and Google. An automated process that relies on DuckDuckGo search to extract this information is riddled with inaccuracies because (a) many people often share similar names, (b) cast members are sometimes credited only by an ambiguous short name or nickname, or (c) the birth year found on the internet is unreliable. Hence, I mainly relied on Wikipedia, but fell back on DuckDuckGo plus manual curation for a handful of heroes and heroines who have starred in multiple films.

Using the birth years, I calculated the ages of the hero and the heroine for each movie.
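Since only birth years (not full birth dates) are available, the ages are approximations. A minimal sketch of the calculation, with `film_ages` as a hypothetical helper name:

```python
def film_ages(release_year, hero_birth, heroine_birth):
    """Approximate ages at release and the hero-minus-heroine age gap.

    Only birth *years* are used, so each age may be off by one depending on
    birthdays relative to the release date. A missing birth year (None)
    makes the corresponding age, and the gap, None as well.
    """
    hero_age = release_year - hero_birth if hero_birth is not None else None
    heroine_age = release_year - heroine_birth if heroine_birth is not None else None
    gap = hero_age - heroine_age if hero_age is not None and heroine_age is not None else None
    return hero_age, heroine_age, gap
```

Note that the gap itself reduces to the difference in birth years, so it does not depend on the release year; the release year only matters for the individual ages.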


Because of the issues mentioned earlier, not all films have both the hero’s age and the heroine’s age available. Here’s how their availability varies over the years.
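One way to compute this availability per release year is sketched below, assuming each film is a record with hypothetical `year`, `hero_age` and `heroine_age` fields (`None` when unknown):

```python
from collections import defaultdict


def availability_by_year(films):
    """Return {year: (frac with hero age, frac with heroine age, frac with both)}.

    films: iterable of dicts with 'year', 'hero_age' and 'heroine_age' keys,
    where an unknown age is None (field names are illustrative).
    """
    totals = defaultdict(int)
    counts = defaultdict(lambda: [0, 0, 0])  # [hero known, heroine known, both known]
    for film in films:
        year = film["year"]
        totals[year] += 1
        has_hero = film["hero_age"] is not None
        has_heroine = film["heroine_age"] is not None
        counts[year][0] += has_hero  # bools add as 0 or 1
        counts[year][1] += has_heroine
        counts[year][2] += has_hero and has_heroine
    return {year: tuple(c / totals[year] for c in counts[year]) for year in sorted(totals)}
```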

Hero grid: age-gaps for each hero

To make this age-gap data easier to visualize, I made individual plots for each major hero and heroine showing the age gap over their careers. Click on a hero to view his career age-gap trajectory:

Heroine grid: age-gaps for each heroine

Click on a heroine to view her career age-gap trajectory:

Debuts and exits

Careers and marriages

If an actor married more than once, I only considered the year of the first marriage. Also, a missing marriage year doesn’t necessarily mean that the actor is unmarried, so it is difficult to analyze the careers of unmarried actors.
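The first-marriage rule and the treatment of missing wedding years can be made explicit with a small helper. This is an illustrative sketch (`marriage_status` is a hypothetical name); counting a film released in the wedding year itself as "after" is an arbitrary choice for the example, not necessarily the one used in the analysis.

```python
def marriage_status(release_year, marriage_years):
    """Label a film as 'before', 'after' or 'unknown' relative to the first marriage.

    marriage_years: list of known wedding years, possibly empty. Only the
    earliest one is used. An empty list means 'unknown', not 'unmarried'.
    """
    if not marriage_years:
        return "unknown"
    first = min(marriage_years)
    # Films released in the wedding year itself are counted as "after" (arbitrary choice).
    return "before" if release_year < first else "after"
```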

Female-lead films and genres

For each year, I also checked what fraction of films have a female first cast member, meaning that the heroine is credited before the hero. I’ve taken this as a sign that the film is female-led. I also extracted the genres of each film from Wikipedia and IMDb whenever available.
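The female-led fraction amounts to a per-year count. A minimal sketch, assuming films are given as hypothetical `(year, cast_list)` pairs and that genders come from the inference step described earlier:

```python
from collections import Counter


def female_lead_fraction(films, gender):
    """Return {year: fraction of films whose first credited cast member is female}.

    films:  iterable of (year, cast_list) pairs, cast_list in credited order
    gender: dict mapping name -> 'male' / 'female' / 'unknown'
    """
    total = Counter()
    female_first = Counter()
    for year, cast in films:
        if not cast:
            continue  # no cast information, so the film is left out of that year's total
        total[year] += 1
        if gender.get(cast[0]) == "female":
            female_first[year] += 1
    return {year: female_first[year] / total[year] for year in sorted(total)}
```

Cast members whose gender is unknown are treated as not female here, which slightly undercounts female-led films if gender inference fails for some heroines.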

Where have heroes and heroines come from?

Actors as heroes/heroines vs as cast members