Anomaly Detection for Grad Admissions
GROUP 2: CANMING CHEN
K VINOI MAHADEVA
▶ Universities invest considerable time and effort in recruiting prospective applicants
▶ Massive amounts of raw data are available for this purpose
▶ This data needs cleaning, which consumes time and money
▶ Removing anomalous and redundant records from the data set lets the university target a selective pool of students
▶ 69.5% of the initial data was removed as anomalous
▶ Our target audience: graduate students
Different Types of Anomalies
▶ Over 100 columns had more than 90% of their values missing
▶ Categorical variables contained a mix of numeric and non-numeric values
▶ Inconsistent and redundant data for the same parameters.
▶ “Academic load” has values such as F, FT, P and PT. Assuming F and FT stand for Full Time and P and PT stand for Part Time, the column should use one consistent value for each category across all rows. This would make it easier for the university to segment the data and would keep the data consistent.
▶ “Country” was populated inconsistently; for example, applicants from Australia had values such as “AU” and “Australia”.
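The column-level and value-level fixes above can be sketched in plain Python. This is a minimal illustration, not the team's actual pipeline: the record layout, the field names `academic_load` and `country`, and the alias tables are hypothetical, and mapping F/FT and P/PT to Full-Time and Part-Time follows the assumption stated on this slide.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per applicant, None for missing values.
Record = Dict[str, Optional[str]]

# Assumed mapping, following the slide: F/FT -> Full-Time, P/PT -> Part-Time.
ACADEMIC_LOAD_MAP = {"F": "Full-Time", "FT": "Full-Time", "P": "Part-Time", "PT": "Part-Time"}
# Hypothetical country aliases; a real table would cover every country.
COUNTRY_MAP = {"AU": "Australia", "AUS": "Australia", "AUSTRALIA": "Australia"}


def sparse_columns(rows: List[Record], threshold: float = 0.9) -> List[str]:
    """Return columns whose share of missing (None or empty) values exceeds threshold."""
    if not rows:
        return []
    columns = {key for row in rows for key in row}
    sparse = []
    for col in sorted(columns):
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        if missing / len(rows) > threshold:
            sparse.append(col)
    return sparse


def normalize(value: Optional[str], mapping: Dict[str, str]) -> Optional[str]:
    """Map a raw value to its canonical form; unknown values are returned stripped."""
    if value is None:
        return None
    return mapping.get(value.strip().upper(), value.strip())


def clean(rows: List[Record], threshold: float = 0.9) -> List[Record]:
    """Drop mostly-empty columns, then canonicalize the categorical fields."""
    drop = set(sparse_columns(rows, threshold))
    cleaned = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in drop}
        if "academic_load" in new_row:
            new_row["academic_load"] = normalize(new_row["academic_load"], ACADEMIC_LOAD_MAP)
        if "country" in new_row:
            new_row["country"] = normalize(new_row["country"], COUNTRY_MAP)
        cleaned.append(new_row)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"academic_load": "FT", "country": "AU", "fax": None},
        {"academic_load": "P", "country": "Australia", "fax": None},
    ]
    print(clean(sample))
```

A hand-written alias table is the simplest design; for country data, a standard code list such as ISO 3166-1 would scale better than ad hoc aliases.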
- 15% of the gender values were null; 58% of the applicants are female and 26% are male.
- 46.6% of all applicants had “First Source” recorded as App, which may indicate the volume of applications and inquiries coming through the university's mobile app.
- 8% of the suspects reached the website via Facebook and Google.
- 72% of the graduate-program suspects were interested in on-campus learning.
- 85% of the suspects were not very engaged on the application platform and were therefore tagged as Lurkers. An email address was present for 99.7% of the Lurkers, so targeting this segment may prove beneficial for the university.
- 59% of the students who have expressed interest in the university's graduate program are international students.
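The figures on this slide are proportions over the inquiry records. The deck does not say how they were computed, so the following is only a sketch with hypothetical field names (`gender`, `tag`, `email`). It counts null values as their own category, which matches the way the gender breakdown above reports nulls alongside the female and male shares.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def share_breakdown(values: List[Optional[str]]) -> Dict[str, float]:
    """Percentage of records per value, counting None/empty as 'null' (1 decimal)."""
    if not values:
        return {}
    counts = Counter("null" if v in (None, "") else v for v in values)
    total = len(values)
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


def conditional_share(rows: List[Record], segment_key: str, segment_value: str, field: str) -> float:
    """Percentage of rows in one segment (e.g. Lurkers) where `field` is present."""
    segment = [r for r in rows if r.get(segment_key) == segment_value]
    if not segment:
        return 0.0
    present = sum(1 for r in segment if r.get(field) not in (None, ""))
    return round(100 * present / len(segment), 1)


if __name__ == "__main__":
    genders = ["F", "F", "M", None]
    print(share_breakdown(genders))  # {'F': 50.0, 'M': 25.0, 'null': 25.0}
```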
❑ Recommendations on the Web-form:
▶ Use a drop-down menu for fields such as country, state and generic data (gender etc.)
▶ To make sure the GPAs being captured are reliable, the first drop-down should be for the GPA scale
▶ An algorithm that normalizes each GPA based on the selected GPA scale will ensure data integrity
❑ Restrict categorical variables to a predefined list of values. For example, Academic load should be set to either Full-Time or Part-Time.
❑ Any value outside the predefined list should be red-flagged, and an alert should be triggered for every variable holding such a value.
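The GPA and allowed-value recommendations could be implemented along these lines. This sketch assumes a simple linear rescaling onto a 4.0 scale (institutions often use conversion tables instead), and the `ALLOWED_VALUES` table and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Assumption: GPAs are rescaled linearly onto a 4.0 scale; real conversions may differ.
TARGET_SCALE = 4.0

# Hypothetical allowed-value lists, mirroring the web-form drop-downs.
ALLOWED_VALUES: Dict[str, Set[str]] = {
    "academic_load": {"Full-Time", "Part-Time"},
}


def normalize_gpa(gpa: float, scale: float, target: float = TARGET_SCALE) -> float:
    """Rescale a GPA from the applicant's selected scale onto the target scale."""
    if scale <= 0:
        raise ValueError("GPA scale must be positive")
    if not 0 <= gpa <= scale:
        raise ValueError(f"GPA {gpa} is outside the 0-{scale} scale")
    return round(gpa / scale * target, 2)


def red_flags(record: Dict[str, Optional[str]]) -> List[str]:
    """Return fields whose value is not in the predefined list (missing values are flagged too)."""
    return [
        field
        for field, allowed in ALLOWED_VALUES.items()
        if record.get(field) not in allowed
    ]


if __name__ == "__main__":
    print(normalize_gpa(8.0, 10.0))
    print(red_flags({"academic_load": "FT"}))
```

Validating at submission time, rather than cleaning afterwards, keeps anomalous values out of the data set in the first place.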
Funnel Analysis for Graduate Applicants
Best Ways to Recruit and Market to Students with Incomplete Data
▶ There may be ways to reach out to students without having all their information
▶ Look for either a phone number or an email address and keep communicating with students through whichever channel is available
▶ Identifying the top-performing traffic sources that bring in most of the suspects
▶ Capturing browser cookies for retargeting
▶ In certain cases, turning to offline marketing in the top cities will also help gain traction among students in those cities
▶ Web Scraping
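Two of the tactics above, ranking traffic sources and reaching suspects through whatever contact detail exists, reduce to small computations. In this sketch the field names `source`, `email` and `phone` are hypothetical, and preferring email over phone is an assumption rather than something the deck specifies.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical suspect record: one dict per suspect, None for missing values.
Suspect = Dict[str, Optional[str]]


def top_traffic_sources(suspects: List[Suspect], k: int = 3) -> List[Tuple[str, int]]:
    """Rank traffic sources by how many suspects they brought in; missing sources are ignored."""
    counts = Counter(s.get("source") for s in suspects if s.get("source"))
    return counts.most_common(k)


def contact_channel(suspect: Suspect) -> Optional[str]:
    """Pick a channel from whatever contact data exists: email first, then phone."""
    if suspect.get("email"):
        return "email"
    if suspect.get("phone"):
        return "phone"
    # No direct channel: a candidate for cookie retargeting or offline marketing.
    return None


if __name__ == "__main__":
    suspects = [
        {"source": "Google", "email": "a@x.edu"},
        {"source": "Facebook", "phone": "555-0100"},
        {"source": "Google"},
    ]
    print(top_traffic_sources(suspects, 2))
    print([contact_channel(s) for s in suspects])
```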