Introduction
With India’s 2019 General Elections around the corner, I thought it’d be a good idea to analyse the election manifestos of its 2 biggest political parties, BJP and Congress. Let’s use text mining to understand what each party promises and prioritizes.
In this fifth part, I’ll explore the Inclusion and Diversity sections of both manifestos.
Analysis
Load libraries
rm(list = ls())
library(tidyverse)
library(pdftools)
library(tidylog)
library(hunspell)
library(tidytext)
library(ggplot2)
library(gridExtra)
library(scales)
library(reticulate)
library(widyr)
library(igraph)
library(ggraph)
theme_set(theme_light())
use_condaenv("stanford-nlp")
Read cleaned data
bjp_content <- read_csv("../data/indian_election_2019/bjp_manifesto_clean.csv")
congress_content <- read_csv("../data/indian_election_2019/congress_manifesto_clean.csv")
Inclusion and Diversity
This topic is covered on pages 20 to 23 of the Congress manifesto PDF and on pages 31 to 35 of the BJP manifesto PDF.
bjp_content %>%
filter(between(page, 31, 35)) -> bjp_content
congress_content %>%
filter(between(page, 20, 23)) -> congress_content
Most Popular Words
plot_most_popular_words <- function(df,
min_count = 15,
stop_words_list = stop_words) {
df %>%
unnest_tokens(word, text) %>%
anti_join(stop_words_list) %>%
mutate(word = str_extract(word, "[a-z']+")) %>%
filter(!is.na(word)) %>%
count(word, sort = TRUE) %>%
filter(str_length(word) > 1,
n > min_count) %>%
mutate(word = reorder(word, n)) %>%
ggplot( aes(x=word, y=n)) +
geom_segment( aes(x=word, xend=word, y=0, yend=n), color="skyblue", size=1) +
geom_point( color="blue", size=4, alpha=0.6) +
coord_flip() +
theme(panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none") -> p
return(p)
}
custom_stop_words <- bind_rows(tibble(word = c("india", "country", "bjp", "congress", "government"),
lexicon = rep("custom", 5)),
stop_words)
bjp_content %>%
plot_most_popular_words(min_count = 6,
stop_words_list = custom_stop_words) +
labs(x = "",
y = "Number of Occurrences",
title = "Most popular words related to Inclusion & Diversity in BJP's Manifesto",
subtitle = "Words occurring more than 6 times",
caption = "Based on election 2019 manifesto from bjp.org") -> p_bjp
congress_content %>%
plot_most_popular_words(min_count = 10,
stop_words_list = custom_stop_words) +
labs(x = "",
y = "Number of Occurrences",
title = "Most popular words related to Inclusion & Diversity in Congress' Manifesto",
subtitle = "Words occurring more than 10 times",
caption = "Based on election 2019 manifesto from inc.in") -> p_congress
grid.arrange(p_bjp, p_congress, ncol = 2, widths = c(10,10))
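For readers who follow Python more easily than dplyr pipelines, here is a minimal sketch of the counting step behind these plots: tokenize, drop stop words, count, and keep words that occur more than a threshold. The sample lines, the tiny stop-word set and the `popular_words` helper are all hypothetical stand-ins for illustration, not the manifesto data or the tidytext lexicon.

```python
import re
from collections import Counter

# Hypothetical stand-ins: a few manifesto-like lines and a tiny stop-word set.
SAMPLE_LINES = [
    "Women empowerment and women's security will be given priority.",
    "We will ensure the welfare of women and poor families.",
    "The government will support families across the country.",
]
STOP_WORDS = {"and", "the", "will", "be", "of", "we", "across", "given",
              "government", "country"}


def popular_words(lines, stop_words, min_count=1):
    """Return (word, count) pairs occurring more than min_count times, most frequent first."""
    counts = Counter()
    for line in lines:
        # Lowercase and keep runs of letters/apostrophes, roughly like unnest_tokens().
        for word in re.findall(r"[a-z']+", line.lower()):
            if len(word) > 1 and word not in stop_words:
                counts[word] += 1
    # Strictly greater than min_count, matching the n > min_count filter in the R function.
    return [(w, n) for w, n in counts.most_common() if n > min_count]


print(popular_words(SAMPLE_LINES, STOP_WORDS, min_count=1))
```

Adding words such as "government" and "country" to the stop-word set plays the same role as `custom_stop_words` in the R code: it keeps generic terms from crowding out the topic-specific ones.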
Basic Search Engine
Let’s build a simple search engine based on cosine similarity (instead of the basic keyword search that comes with a PDF viewer), to make these documents easier to search and to surface the lines in each manifesto most related to a given query. I’m using Python’s scikit-learn for this.
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics.pairwise import linear_kernel
stopwords = list(ENGLISH_STOP_WORDS)  # recent scikit-learn releases expect a list here, not a frozenset
vectorizer_bjp = TfidfVectorizer(analyzer='word', stop_words=stopwords, max_df=0.3, min_df=2)
vector_train_bjp = vectorizer_bjp.fit_transform(r["bjp_content$text"])
vectorizer_congress = TfidfVectorizer(analyzer='word', stop_words=stopwords, max_df=0.3, min_df=2)
vector_train_congress = vectorizer_congress.fit_transform(r["congress_content$text"])
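def get_related_lines(query, party="bjp"):
    # Pick the vectorizer and TF-IDF matrix fitted on the requested party's manifesto
    if party == "bjp":
        vectorizer = vectorizer_bjp
        vector_train = vector_train_bjp
    else:
        vectorizer = vectorizer_congress
        vector_train = vector_train_congress
    vector_query = vectorizer.transform([query])
    # TF-IDF rows are L2-normalised by default, so the linear kernel equals cosine similarity
    cosine_sim = linear_kernel(vector_query, vector_train).flatten()
    # 0-based NumPy positions of the 9 most similar lines, best match first
    return cosine_sim.argsort()[:-10:-1]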
get_related_lines <- py_to_r(py$get_related_lines)
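To make the ranking idea concrete, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. It uses the smoothed IDF and L2 normalisation that `TfidfVectorizer` applies by default, but it is a simplification: there is no stop-word removal and no `max_df`/`min_df` filtering, and the sample lines and helper names (`fit_idf`, `tfidf_vector`, `related_lines`) are hypothetical. It returns 1-based positions, because NumPy's `argsort()` positions are 0-based while `dplyr::slice()` counts rows from 1.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """L2-normalised TF-IDF vector as a dict; terms not seen at fit time are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: n * idf[t] for t, n in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def related_lines(query, docs, top_k=3):
    """1-based positions of the top_k lines most similar to query, best first."""
    idf = fit_idf(docs)
    q = tfidf_vector(query, idf)
    scores = []
    for doc in docs:
        d = tfidf_vector(doc, idf)
        # Dot product of two unit vectors is their cosine similarity.
        scores.append(sum(w * d.get(t, 0.0) for t, w in q.items()))
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i + 1 for i in ranked[:top_k]]


# Hypothetical manifesto-like lines, for illustration only.
LINES = [
    "ensuring welfare of poor families",
    "women empowerment",
    "safe transport facilities for working women",
    "food security cover for poor families",
]
print(related_lines("women", LINES, top_k=2))
```

The shorter line ranks first here because "women" carries a larger share of its normalised weight. That is also why very short lines, such as headings, tend to score highly for single-word queries.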
Common Popular Words with both BJP & Congress
As the plots above show, one of the most popular words in both the BJP and Congress manifestos is “women”. Let’s see what each party has planned for women in our country. First, BJP.
bjp_content %>%
slice(get_related_lines("women", party = "bjp")) %>%
select(text, page, line)
## # A tibble: 8 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 e o to ensure access to credit and other resources, capacity buil… 31 17
## 2 of 'new india' is the 32 22
## 3 strengthen social security mechanism for widows of our ma yrs. 32 16
## 4 women empowerment 31 1
## 5 12 to create a positive atmosphere for women, promote gender just… 32 11
## 6 society and economy. 31 7
## 7 ensure improved health and social suppo system for these frontlin… 32 6
## 8 home ministry, and have made strict provisions for transferring t… 32 8
This is the excerpt from page 32 that the results above point to:
Women’s security will be given more priority. We have constituted the Women’s Security Division in the Home Ministry, and have made strict provisions for transferring the laws in order to prevent crimes against women, in particular in a time-bound investigation and trial for rape. In such cases, forensic facilities and fast track courts will be expanded to bring convicts to justice.
Now, Congress.
congress_content %>%
slice(get_related_lines("women", party = "congress")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 transport facilities to increase the participation ment… 21 24
## 2 06. sufficient night shelters will be built for migrant … 21 28
## 3 07. congress promises a comprehensive review of the 21 34
## 4 women workers. adequate number of safe and as a… 21 29
## 5 and empowerment. india’s only woman prime minister, smt. indira g… 21 6
## 6 38 … 20 50
## 7 between the forest rights act and the compen- 21 4
## 8 assemblies in the first session of the 17th lok sabha and in the … 20 47
## 9 05. we will repeal any provision of law that prohibits … 21 26
One of the excerpts from page 21 related to the results above:
We will stipulate that every Special Economic Zone shall have working women’s hostels and safe transport facilities to increase the participation of women in the labour force.
Unique popular words with BJP & Congress
One of the popular words that seems curious from BJP’s manifesto is “families”.
bjp_content %>%
slice(get_related_lines("families", party = "bjp")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 and the tax bene ts to ensure more cash and greater purchasing po… 34 3
## 2 09 we have been successful in extending the food security cover t… 33 28
## 3 digit in the next ve years. 33 26
## 4 ensuring welfare of poor 33 24
## 5 subsidized prices. we will fu her widen this cover to provide sub… 33 30
## 6 against women, in pa icular in a time-bound investigation and tri… 32 9
## 7 will formulate a dedicated programme for the creation of work opp… 32 15
## 8 13 we are commi ed to ensure the welfare of widows of defence per… 32 14
## 9 institution’s curriculum and training modules of public o ces. 32 13
An excerpt from BJP’s manifesto about poor families, as identified from the results above:
We have been successful in extending the food security cover to over 80 crore people from poor and lower-middle-income families who are receiving food grains (wheat/rice/coarse grains) at highly subsidized prices. We will further widen this cover to provide subsidized sugar (at Rs. 13 per kg per family per month) to these families in line with our motto ‘Sabka Saath-Sabka Vikas’.
Now, one of the popular words that seems curious from Congress’ manifesto is “tribes”.
congress_content %>%
slice(get_related_lines("tribes", party = "congress")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 42 … 22 66
## 2 tion into the nes through consultation with the touri… 22 32
## 3 enumeration of denotified and semi-nomadic noma… 23 10
## 4 compartmental reservation for denotified and 23 14
## 5 the constitution of india. nothing will be done or allowed to cha… 21 43
## 6 01. congress promises a special census and the … 23 9
## 7 06. sufficient night shelters will be built for migrant … 21 28
## 8 the constitution of india provides for reservation in employment … 21 51
## 9 social audit of policies and programmes for 23 4
An excerpt from page 23 of Congress’ manifesto, as identified from the results above:
Congress promises a Special Census and the enumeration of Denotified and Semi-Nomadic Tribes and the integration of the data in the decennial census.
With the analysis above, we now have some idea of the two parties’ Inclusion & Diversity plans. In the next post, I’ll do a similar analysis of their National Security proposals.
Stay Tuned!
References
- Part 4 - Anti-Corruption and Good Governance
- Part 6 - National Security
- For all the parts go to Project Summary Page - India General Elections 2019 Analysis