Introduction
With India’s 2019 General Elections around the corner, I thought it’d be a good idea to analyse the election manifestos of its 2 biggest political parties, BJP and Congress. Let’s use text mining to understand what each party promises and prioritizes.
In this third part, I’ll explore the Employment and Opportunities sections of both manifestos.
Analysis
Load libraries
rm(list = ls())
library(tidyverse)
library(pdftools)
library(tidylog)
library(hunspell)
library(tidytext)
library(ggplot2)
library(gridExtra)
library(scales)
library(reticulate)
library(widyr)
library(igraph)
library(ggraph)
theme_set(theme_light())
use_condaenv("stanford-nlp")
Read cleaned data
bjp_content <- read_csv("../data/indian_election_2019/bjp_manifesto_clean.csv")
congress_content <- read_csv("../data/indian_election_2019/congress_manifesto_clean.csv")
Employment and Opportunities
This topic is covered in Congress’ manifesto on pages 6 to 8 of the PDF, and in BJP’s on pages 20 to 22 and 27 to 28.
bjp_content %>%
filter(between(page, 20, 22) | between(page, 27, 28)) -> bjp_content
congress_content %>%
filter(between(page, 6, 8)) -> congress_content
Most Popular Words
plot_most_popular_words <- function(df,
min_count = 15,
stop_words_list = stop_words) {
df %>%
unnest_tokens(word, text) %>%
anti_join(stop_words_list, by = "word") %>%
mutate(word = str_extract(word, "[a-z']+")) %>%
filter(!is.na(word)) %>%
count(word, sort = TRUE) %>%
filter(str_length(word) > 1,
n > min_count) %>%
mutate(word = reorder(word, n)) %>%
ggplot( aes(x=word, y=n)) +
geom_segment( aes(x=word, xend=word, y=0, yend=n), color="skyblue", size=1) +
geom_point( color="blue", size=4, alpha=0.6) +
coord_flip() +
theme(panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none") -> p
return(p)
}
custom_stop_words <- bind_rows(tibble(word = c("india", "country", "bjp", "congress", "government"),
lexicon = rep("custom", 5)),
stop_words)
bjp_content %>%
plot_most_popular_words(min_count = 6,
stop_words_list = custom_stop_words) +
labs(x = "",
y = "Number of Occurrences",
title = "Most popular words related to Employment and Opportunities in BJP's Manifesto",
subtitle = "Words occurring more than 6 times",
caption = "Based on election 2019 manifesto from bjp.org") -> p_bjp
congress_content %>%
plot_most_popular_words(min_count = 7,
stop_words_list = custom_stop_words) +
labs(x = "",
y = "Number of Occurrences",
title = "Most popular words related to Employment and Opportunities in Congress' Manifesto",
subtitle = "Words occurring more than 7 times",
caption = "Based on election 2019 manifesto from inc.in") -> p_congress
grid.arrange(p_bjp, p_congress, ncol = 2, widths = c(10,10))
Basic Search Engine
Let’s build a simple search engine based on cosine similarity (instead of the basic keyword search that comes with a PDF viewer), to make these documents easier to search and to get context from the lines in each manifesto that are most related to a given query. I’m using Python’s scikit-learn for this.
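To make the idea concrete, here is a minimal from-scratch sketch of TF-IDF scoring with cosine similarity, using only Python's standard library. It assumes the smoothed idf and L2 normalisation that scikit-learn's `TfidfVectorizer` applies by default, but it uses a plain whitespace tokenizer and skips stop words and the `max_df`/`min_df` filters, so its scores will not match scikit-learn's exactly. The helper names and the sample lines are made up for illustration.

```python
import math
from collections import Counter

def tokenize(line):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return line.lower().split()

def weigh(tokens, idf):
    """Term frequency times idf; terms not seen while fitting are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def normalise(vec):
    """Scale a sparse vector to unit length (empty dict if it is all zeros)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def tfidf_vectors(lines):
    """Return one unit-length TF-IDF dict per line, plus the idf table."""
    n = len(lines)
    tokenized = [tokenize(line) for line in lines]
    # Document frequency: the number of lines that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [normalise(weigh(tokens, idf)) for tokens in tokenized], idf

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Made-up lines, not quotes from either manifesto.
lines = [
    "supplying quality electricity to all consumers",
    "building rural roads in every village",
    "electricity for every village and every home",
]
vectors, idf = tfidf_vectors(lines)
query_vec = normalise(weigh(tokenize("electricity"), idf))
scores = [cosine(query_vec, v) for v in vectors]
best = max(range(len(lines)), key=scores.__getitem__)
print(lines[best])
```

Because every vector has unit length, the plain dot product is already the cosine similarity. Lines that do not contain the query word score exactly zero.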
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics.pairwise import linear_kernel
# ENGLISH_STOP_WORDS is a frozenset; recent scikit-learn releases expect a list here.
stopwords = list(ENGLISH_STOP_WORDS)
vectorizer_bjp = TfidfVectorizer(analyzer='word', stop_words=stopwords, max_df=0.3, min_df=2)
vector_train_bjp = vectorizer_bjp.fit_transform(r["bjp_content$text"])
vectorizer_congress = TfidfVectorizer(analyzer='word', stop_words=stopwords, max_df=0.3, min_df=2)
vector_train_congress = vectorizer_congress.fit_transform(r["congress_content$text"])
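def get_related_lines(query, party="bjp"):
    # Pick the vectorizer and TF-IDF matrix fitted on the requested party's lines.
    if party == "bjp":
        vectorizer = vectorizer_bjp
        vector_train = vector_train_bjp
    else:
        vectorizer = vectorizer_congress
        vector_train = vector_train_congress
    vector_query = vectorizer.transform([query])
    # TF-IDF rows are L2-normalised by default, so the linear kernel (dot product)
    # equals cosine similarity.
    cosine_sim = linear_kernel(vector_query, vector_train).flatten()
    # Positions of the 9 most similar lines, best first. These are 0-based,
    # whereas dplyr::slice() on the R side counts rows from 1.
    return cosine_sim.argsort()[:-10:-1]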
get_related_lines <- py_to_r(py$get_related_lines)
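The slice `[:-10:-1]` on the sorted positions is compact but easy to misread, so here is a small standard-library sketch of what it selects. It assumes hypothetical similarity scores and two made-up helpers, `top_matches` and `to_r_rows`. NumPy positions are 0-based while dplyr's `slice()` counts rows from 1, so positions handed across the reticulate boundary may need a shift by one to land on the matching row.

```python
def top_matches(scores, k=9):
    """0-based positions of the k highest scores, best first.

    Mirrors numpy's `argsort()[:-10:-1]` when k == 9: sort ascending,
    then walk backwards from the end and stop before position -(k + 1).
    """
    ascending = sorted(range(len(scores)), key=scores.__getitem__)
    return ascending[:-(k + 1):-1]

def to_r_rows(positions):
    """Shift 0-based Python positions to the 1-based rows slice() expects."""
    return [p + 1 for p in positions]

# Hypothetical similarity scores for twelve lines.
scores = [0.01, 0.1, 0.9, 0.02, 0.4, 0.2, 0.03, 0.7, 0.3, 0.05, 0.6, 0.8]
positions = top_matches(scores)
print(positions)             # 0-based, as Python sees them
print(to_r_rows(positions))  # 1-based, as dplyr::slice() would need them
```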
Common Popular Words with both BJP & Congress
As we see from the plot above, one of the most popular words in both BJP’s and Congress’ manifestos for Employment and Opportunities is “electricity”. Let’s see what each of them has planned for it. First, BJP.
bjp_content %>%
slice(get_related_lines("electricity", party = "bjp")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 promise in a record time. we will fu her work towards completing … 22 19
## 2 ÿ ensuring a right mix of energy which leads towards a cleaner … 22 24
## 3 a ained the goal. all remaining villages have been electri ed. si… 22 17
## 4 energy 22 15
## 5 transmission lines and in pu ing up the nationwide transmission g… 22 21
## 6 ÿ supplying quality electricity to all consumers 22 25
## 7 achievements, india can now claim that quantum of or access to el… 22 22
## 8 speed of constructing rural roads has doubled and 90% of rural ro… 20 7
## 9 approach the issue of water management holistically and ensure be… 21 4
Based on the results above, this is the relevant excerpt from page 22:
Now, we will work towards:
- Ensuring a right mix of energy which leads towards a cleaner environment.
- Supplying quality electricity to all consumers.
- Making the state electricity entities financially sound and administratively more efficient.
Now, Congress.
congress_content %>%
slice(get_related_lines("electricity", party = "congress")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 implement the policy on spectrum and on ex- ele… 8 29
## 2 congress promises to enhance infrastructure in rural areas and im… 8 14
## 3 council and a separate administrative structure tra… 8 53
## 4 03. road construction and railways can be built using 06. … 8 24
## 5 qualified teachers, doctors, nurses, paramedics, and s… 7 41
## 6 congress will request state governments to fill all and e… 7 30
## 7 vacancies, estimated at 20 lakh, in the 2 sectors ucts,… 7 31
## 8 and in local bodies. 11. we pr… 7 32
## 9 04. we will work with state governments to create a… 7 33
One of the full excerpts from page 8 related to the results above:
Congress promises to enhance availability of, and access to, electricity in rural areas by encouraging investment in off-grid renewable power generation with ownership and revenues vesting in local bodies. Every village and every home will be electrified in the true sense. In the long term, we aim to substitute LPG used in homes by electricity and solar energy.
Unique popular words with BJP & Congress
One of the popular words that seems curious from BJP’s manifesto is “youth”. Let’s see what BJP has planned for the youth employment opportunities.
bjp_content %>%
slice(get_related_lines("youth", party = "bjp")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 a country with such a major 28 13
## 2 youth in governance 27 20
## 3 ₹20,000 crore. 27 19
## 4 hospitals, lakes, public gardens etc. and ensure their maintenanc… 27 24
## 5 the following steps : 27 22
## 6 protect them from the harmful e ects of substance abuse and addic… 27 27
## 7 more fully in building new india. 27 3
## 8 take advantage of the oppo unities available in domestic and fore… 27 8
## 9 greater civic engagement of the youth. 27 25
An excerpt from BJP’s manifesto about youth employment opportunities, identified from the results above:
We will incentivise and reward self-organized groups of youth who adopt social assets like schools, hospitals, lakes, public gardens etc. and ensure their maintenance and cleanliness to encourage greater civic engagement of the youth.
Now, one of the popular words that seems curious from Congress’ manifesto is “tourism”.
congress_content %>%
slice(get_related_lines("tourism", party = "congress")) %>%
select(text, page, line)
## # A tibble: 9 x 3
## text page line
## <chr> <dbl> <dbl>
## 1 and appoint a second asha worker in all villages 16. touri… 7 55
## 2 08. we will trigger rapid growth of the manufac- t… 7 57
## 3 provision of world-class infrastructure in industrial offer… 7 59
## 4 07. para-state workers such as anganwadi workers, t… 7 45
## 5 in addition, we will expand the asha programme and i… 7 54
## 6 body to ensure the effective delivery of govern- of 3 … 7 36
## 7 06. congress pledges to create lakhs of new jobs for 12. c… 7 40
## 8 and government posts will be abolished. until… 7 39
## 9 05. application fees for government examinations b… 7 38
Tourism creates jobs. Congress promises an adequately capitalised Tourism Development Bank to provide low-cost, long-term funds for investment in tourism-related businesses. We will also offer lower rates of corporate and personal income tax on tourism-related business income.
The word “business”/“businesses” is mentioned surprisingly often in this section of Congress’ manifesto.
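One way to put a number on an observation like this is to count every word form that starts with a given stem. The sketch below is a standard-library illustration, not part of the original analysis. The helper `count_word_forms` and the sample lines are made up, so the counts say nothing about the actual manifesto.

```python
import re
from collections import Counter

def count_word_forms(lines, stem):
    """Count whole words that start with `stem`, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    return Counter(
        match.group(0).lower()
        for line in lines
        for match in pattern.finditer(line)
    )

# Made-up lines, not quotes from the manifesto.
lines = [
    "Investment in tourism-related businesses",
    "Lower tax on tourism-related business income",
    "Ease of doing business for small businesses",
]
counts = count_word_forms(lines, "business")
print(counts)
print(sum(counts.values()))
```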
With all the above analysis, we have developed some idea about the Employment and Opportunities plans of the 2 parties. In the next post, I’ll do a similar analysis for Anti-Corruption and Good Governance proposals by them.
Stay Tuned!
References
- Part 2 - Economic Growth
- Part 4 - Anti-Corruption and Good Governance
- For all the parts go to Project Summary Page - India General Elections 2019 Analysis