Introduction
With India’s 2019 General Elections around the corner, I thought it’d be a good idea to analyse the election manifestos of its 2 biggest political parties, BJP and Congress. Let’s use text mining to understand what each party promises and prioritizes.
In this post, part 6 of the series, I’ll explore how both manifestos discuss National Security.
Analysis
Load libraries
rm(list = ls())
library(tidyverse)
library(pdftools)
library(tidylog)
library(hunspell)
library(tidytext)
library(ggplot2)
library(gridExtra)
library(scales)
library(reticulate)
library(widyr)
library(igraph)
library(ggraph)
theme_set(theme_light())
use_condaenv("stanford-nlp")
Read cleaned data
bjp_content <- read_csv("../data/indian_election_2019/bjp_manifesto_clean.csv")
congress_content <- read_csv("../data/indian_election_2019/congress_manifesto_clean.csv")
National Security
Congress’ manifesto covers this topic on pages 13 to 16 of the PDF, and BJP’s covers it on pages 11 to 12 and 38 to 39.
bjp_content %>%
filter(between(page, 11, 12) | between(page, 38, 39)) -> bjp_content
congress_content %>%
filter(between(page, 13, 16)) -> congress_content
Most Popular Words
# Lollipop chart of the words that occur more than `min_count` times
plot_most_popular_words <- function(df,
                                    min_count = 15,
                                    stop_words_list = stop_words) {
  df %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words_list, by = "word") %>%
    # keep only the first run of letters/apostrophes in each token
    mutate(word = str_extract(word, "[a-z']+")) %>%
    filter(!is.na(word)) %>%
    count(word, sort = TRUE) %>%
    filter(str_length(word) > 1,
           n > min_count) %>%
    mutate(word = reorder(word, n)) %>%
    ggplot(aes(x = word, y = n)) +
    geom_segment(aes(x = word, xend = word, y = 0, yend = n), color = "skyblue", size = 1) +
    geom_point(color = "blue", size = 4, alpha = 0.6) +
    coord_flip() +
    theme(panel.grid.minor.y = element_blank(),
          panel.grid.major.y = element_blank(),
          legend.position = "none") -> p
  return(p)
}
custom_stop_words <- bind_rows(tibble(word = c("india", "country", "bjp", "congress", "government"),
lexicon = rep("custom", 5)),
stop_words)
bjp_content %>%
plot_most_popular_words(min_count = 6,
stop_words_list = custom_stop_words) +
labs(x = "",
y = "Number of Occurrences",
title = "Most popular words related to National Security in BJP's Manifesto",
subtitle = "Words occurring more than 6 times",
caption = "Based on election 2019 manifesto from bjp.org") -> p_bjp
congress_content %>%
plot_most_popular_words(min_count = 10,
stop_words_list = custom_stop_words) +
labs(x = "",
y = "Number of Occurrences",
title = "Most popular words related to National Security in Congress' Manifesto",
subtitle = "Words occurring more than 10 times",
caption = "Based on election 2019 manifesto from inc.in") -> p_congress
grid.arrange(p_bjp, p_congress, ncol = 2, widths = c(10,10))
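The counting step behind these plots can be sketched in a few lines of Python. This is only a rough stand-in for the tidytext pipeline above, not a port of it: the tiny stop-word set, the `popular_words` helper and the sample lines are all made up for illustration, and the tokenisation is simpler than what `unnest_tokens` does.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption for this sketch);
# the post itself uses tidytext's stop_words plus a few custom words.
STOP_WORDS = {"the", "and", "to", "of", "we", "will", "a", "in", "india", "government"}

def popular_words(lines, min_count=1, stop_words=STOP_WORDS):
    """Return (word, count) pairs seen more than min_count times, most frequent first."""
    counts = Counter()
    for line in lines:
        # Lower-case, then keep runs of letters/apostrophes, similar to "[a-z']+" above
        for word in re.findall(r"[a-z']+", line.lower()):
            if len(word) > 1 and word not in stop_words:
                counts[word] += 1
    # Strict inequality: a word seen exactly min_count times is dropped
    return [(word, n) for word, n in counts.most_common() if n > min_count]

# Made-up sample lines, not taken from either manifesto
sample_lines = [
    "We will strengthen border security.",
    "Border roads will improve security of the border areas.",
    "Police reforms and police modernization.",
]
top_words = popular_words(sample_lines, min_count=1)
print(top_words)
```

Note that the threshold is strict, just like `n > min_count` in the R function, which is why the plot subtitles say "more than 6 times" and "more than 10 times".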
Basic Search Engine
Let’s build a simple search engine based on cosine similarity (instead of the basic keyword search that comes with a PDF viewer) to make these documents easier to search and to gain context from the manifesto lines most related to a given query. I’ll use Python’s scikit-learn for this.
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics.pairwise import linear_kernel
stopwords = list(ENGLISH_STOP_WORDS)  # pass the stop words as a plain list
vectorizer_bjp = TfidfVectorizer(analyzer='word', stop_words=stopwords, max_df=0.3, min_df=2)
vector_train_bjp = vectorizer_bjp.fit_transform(r["bjp_content$text"])
vectorizer_congress = TfidfVectorizer(analyzer='word', stop_words=stopwords, max_df=0.3, min_df=2)
vector_train_congress = vectorizer_congress.fit_transform(r["congress_content$text"])
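def get_related_lines(query, party="bjp"):
    # Pick the vectorizer and TF-IDF matrix fitted on the requested party's manifesto
    if party == "bjp":
        vectorizer = vectorizer_bjp
        vector_train = vector_train_bjp
    else:
        vectorizer = vectorizer_congress
        vector_train = vector_train_congress
    vector_query = vectorizer.transform([query])
    # TF-IDF rows are L2-normalised by default, so the linear kernel equals cosine similarity
    cosine_sim = linear_kernel(vector_query, vector_train).flatten()
    # 0-based positions of the 9 most similar lines, best match first
    return cosine_sim.argsort()[:-10:-1]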
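# Python returns 0-based positions; add 1 because dplyr::slice() is 1-based
get_related_lines <- function(query, party = "bjp") {
  py$get_related_lines(query, party) + 1
}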
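To make the ranking less of a black box, here is a minimal standard-library sketch of the same idea: weight each term by TF-IDF, scale every line to unit length, and score lines by their dot product with the query, which is then the cosine similarity. It is a simplified stand-in and not scikit-learn's implementation. It does use the smoothed IDF formula and L2 normalisation that `TfidfVectorizer` applies by default, but it skips stop words, `max_df` and `min_df`, and tokenises more crudely. The helper names (`tokenize`, `vectorize`, `fit_tfidf`, `search`) and the sample lines are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def vectorize(term_counts, idf):
    """Turn term counts into a unit-length TF-IDF vector (dict of term -> weight)."""
    weights = {t: c * idf[t] for t, c in term_counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}

def fit_tfidf(lines):
    """Learn smoothed IDF weights and build one TF-IDF vector per line."""
    docs = [Counter(tokenize(line)) for line in lines]
    n_docs = len(docs)
    # Document frequency: in how many lines each term appears
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in doc_freq.items()}
    return idf, [vectorize(doc, idf) for doc in docs]

def search(query, lines, top_k=3):
    """Return (0-based position, score) for the best-matching lines, best first."""
    idf, vectors = fit_tfidf(lines)
    query_vec = vectorize(Counter(tokenize(query)), idf)
    # Dot product of unit vectors = cosine similarity
    scores = [sum(w * vec.get(t, 0.0) for t, w in query_vec.items()) for vec in vectors]
    ranked = sorted(range(len(lines)), key=lambda i: scores[i], reverse=True)
    # Drop lines with zero similarity instead of padding the result with them
    return [(i, round(scores[i], 3)) for i in ranked[:top_k] if scores[i] > 0]

# Made-up sample lines, not taken verbatim from either manifesto
sample_lines = [
    "zero tolerance against terrorism and extremism",
    "modernization of police forces",
    "international forum against terrorism",
    "information for indians living abroad",
]
results = search("terrorism", sample_lines)
print(results)
```

Two details are worth keeping in mind when reading the result tables. The slice `[:-10:-1]` in `get_related_lines` always returns nine positions, so when fewer than nine lines share a word with the query, the remaining rows have a similarity of zero and are effectively arbitrary; the sketch drops them instead. Also, the positions coming back from Python are 0-based, so anything that indexes from 1, such as dplyr's `slice()`, needs them shifted by one.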
Common Popular Words with both BJP & Congress
As the plots above show, one of the most popular words in both BJP’s and Congress’ manifestos is “terrorism”. Let’s see what each of them plans to do to combat terrorism. First, BJP.
bjp_content %>%
slice(get_related_lines("terrorism", party = "bjp")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 of nations against international terrorism’ as a voluntary multi-… 38 26
## 2 tolerance’ against terrorism and extremism and will continue to f… 11 7
## 3 security paradigm of india in the last ve years. looking ahead, w… 11 3
## 4 information and services for indians living abroad. 38 21
## 5 surgical strikes and the air strikes carried out recently. we wil… 11 6
## 6 deeper multilateral co-operation 38 28
## 7 and organizations on the global stage. to ensure the same, we wil… 38 25
## 8 04 we are commi ed to taking concrete steps on international foru… 38 23
## 9 - shri narendra modi 39 13
This is the excerpt from page 38 that the results above point to -
We are committed to taking concrete steps on international forums against countries and organizations supporting terrorism, and we will take all necessary measures to isolate such countries and organizations on the global stage. To ensure the same, we will work towards establishing a ‘Comity of Nations Against International Terrorism’ as a voluntary multi-lateral forum based on the principles of the draft Comprehensive Convention on International Terrorism.
Now, Congress.
congress_content %>%
slice(get_related_lines("terrorism", party = "congress")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 to detect and punish violators and instigators. 14 53
## 2 02. the most serious threats to internal security 14 50
## 3 streets with a sense of impunity. 14 68
## 4 a. we promise an uncompromising approach and 14 65
## 5 hubs. congress had also proposed to establish f… 14 39
## 6 citizen and for every visitor to india. … 14 34
## 7 sis and quick response. congress had put in place c. in… 14 37
## 8 of india and ensure the safety of our people. statu… 14 13
## 9 03. the concept of national security in the 21st century d… 14 14
One of the excerpts from page 14 related to the above results -
The concept of national security in the 21st century has expanded beyond defence of the territory to include data security, cyber security, financial security, communication security and security of trade routes. Congress promises to evolve suitable policies to address each of these subjects.
Unique popular words with BJP & Congress
One popular word in BJP’s manifesto that stands out is “police”.
bjp_content %>%
slice(get_related_lines("police", party = "bjp")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 06 we will provide assistance to the states to upgrade their poli… 11 30
## 2 nancial suppo for higher education, for housing and for sta ing a… 11 25
## 3 modernization of police forces’. we will encourage expedited poli… 11 31
## 4 modernization of police forces 11 26
## 5 challenges. 11 29
## 6 11 a er e ectively strengthening coastal security through impleme… 12 11
## 7 - shri narendra modi 39 13
## 8 10 we have completed building six integrated check-posts with ano… 12 5
## 9 border areas in the country’s development and progress. 12 4
An excerpt from BJP’s manifesto about police forces as identified from above -
We will provide assistance to the states to upgrade their police forces through the ‘Scheme for Modernization of Police Forces’. We will encourage expedited police reforms in the states so as to enable the State police forces to deal with new types of crimes like cyber crime and help them to be more sensitive to the citizens, especially the weak and vulnerable sections of the society.
Now, one popular word in Congress’ manifesto that stands out is “border”.
congress_content %>%
slice(get_related_lines("border", party = "congress")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 26 … 14 75
## 2 bilities of the force and to improve the welfare of our jawans. 15 6
## 3 forces—bsf, ssb, itbp and assam rifles—and roads … 15 8
## 4 streets with a sense of impunity. 14 68
## 5 and bsf. every effort will be made to induct more 06. we… 15 17
## 6 post them on or close to the border to prevent india-… 15 9
## 7 between 2 border outposts will be reduced. the … 15 14
## 8 and living conditions for the forces. the distance … 15 13
## 9 02. we will construct modern, well-equipped inte- di… 15 11
An excerpt from Congress’ manifesto about borders, as identified from above -
We will accelerate the construction of border roads along all borders of India, especially, the India-China border. We will enhance the capacity of the Border Roads Organisation and create separate divisions to build roads along the India-China and the India-Myanmar borders.
With the analysis above, we have developed some idea of the National Security goals of the 2 parties. In the next post, I’ll do a similar analysis of their proposals on Education, Healthcare and other miscellaneous topics.
Stay Tuned!
References
- Part 5 - Inclusion and Diversity
- Part 7 - Education, Healthcare and Miscellaneous
- For all the parts go to Project Summary Page - India General Elections 2019 Analysis