Sherry Duong


Curiously exploring the world with Python and SQL.

Email: sherryduong@gmail.com
LinkedIn: sherry-duong
Github: sherryduong93

Welcome to your anime match maker.


Motivations and Goals

Since shelter-in-place was enacted, more people have been staying home looking for more ways to pass the time. Like many, I found myself wanting to escape to a world of fantasy, and found that anime was the best way to do this.
However, when I finished one anime, I was surprised by how difficult it was to find another similar to the one I had enjoyed. It required searching on Google and going through various forums to find suggestions that seemed aligned with my tastes.
In order to address this problem, my goal will be to create:

To evaluate my efforts:

The Data:


All data is from Kaggle datasets, scraped from MyAnimeList.net. Last updated 2017 & 2018.

  • anime_df: 12,294 animes with name, genre, type, number of episodes, avg_rating, and members
  • rating_df: 7M reviews of 11,200 animes from 73,515 users
  • anime_meta: 14,478 animes with additional features: English title, dates aired, duration of anime, rating (PG,G,R, etc.), producer, studio, opening & ending theme songs
  • users_meta: 302,673 unique users with number of episodes watched, along with gender, birthdate, location, membership_date

Data Cleaning

Combining dataframes to get one large dataframe with all metadata for each anime

Exploding the Genre, Producer, & Studio columns to see trends
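
As a minimal sketch of the exploding step, assuming the genres are stored as one comma-separated string per anime (the column names and toy rows below are illustrative, not taken from the notebooks):

```python
import pandas as pd

# Toy stand-in for the combined anime dataframe; names and values are assumptions.
anime = pd.DataFrame({
    "name": ["Anime A", "Anime B"],
    "genre": ["Action, Fantasy", "Romance, Comedy, School"],
    "avg_rating": [8.0, 7.0],
})

# Split the comma-separated string into a list, then give each genre its own row.
exploded = (
    anime.assign(genre=anime["genre"].str.split(", "))
         .explode("genre")
         .reset_index(drop=True)
)

# With one genre per row, per-genre trends become a simple groupby.
genre_stats = exploded.groupby("genre")["avg_rating"].agg(["mean", "count"])
print(genre_stats)
```

The same pattern would apply to the Producer and Studio columns, one column at a time.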

EDA

Comparing the average rating to the weighted ratings. The code for this can be found in Notebooks/EDA.ipynb.

Ratings across different features:

  • There is not a big difference between different sources.
  • Comparing genres, some do much better or worse than the majority.
  • Comparing the top 20 studios and producers, there are clear studios that are more highly rated by users.

Baseline Model:

Use the average rating of the training data to predict user ratings of the test data
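
A baseline like this can be sketched in a few lines. The error metric is not stated here, so RMSE is an assumption, and the function name and toy ratings are hypothetical:

```python
import numpy as np

def baseline_rmse(train_ratings, test_ratings):
    """Predict the training mean for every test rating; return (mean, RMSE)."""
    train = np.asarray(train_ratings, dtype=float)
    test = np.asarray(test_ratings, dtype=float)
    global_mean = train.mean()
    # Every test rating gets the same prediction: the training mean.
    rmse = np.sqrt(np.mean((test - global_mean) ** 2))
    return global_mean, rmse

# Toy ratings on a 1-10 scale (illustrative only).
mean_rating, rmse = baseline_rmse([8, 6, 7, 9, 10], [7, 9])
print(mean_rating, rmse)
```

Any recommender that cannot beat this constant prediction on held-out ratings is not learning anything useful about individual users.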

Popularity Based Recommender System:

Content Based Recommender System:

Anime_id Keyword

Baseline Content Based Recommender:

Content Based Recommender Iteration 2:

Content Based Recommender Iteration 3

Clusters of Producers:

  • Cluster 0: Bandai Visual, Pink Pineapple, Lantis, Sanrio, Fuji TV
  • Cluster 1: Unknown, Bandai Visual, Aniplex, NHK, TV Tokyo
  • Cluster 2: TV Tokyo, Tokyo Movie Shinsha, Sanrio, Sotsu, Milky Animation Label
  • Cluster 3: NHK, Sanrio, Tokyo Movie Shinsha, Fuji TV, Milky Animation Label
  • Cluster 4: Aniplex, Tokyo Movie Shinsha, Sanrio, Fuji TV, Milky Animation Label

Clusters of Studios:

  • Cluster 0: Sunrise, Madhouse, Production I.G, Studio Pierrot, TMS Entertainment
  • Cluster 1: Unknown, Sunrise, Madhouse, J.C.Staff, Studio Pierrot
  • Cluster 2: Studio Deen, Toei Animation, Sunrise, OLM, Xebec
  • Cluster 3: J.C.Staff, Toei Animation, Sunrise, OLM, Xebec
  • Cluster 4: Toei Animation, Unknown, Nippon Animation, OLM, Tatsunoko Production

Simple Collaborative Filtering

Rating Data Statistics:
On average, each user provides 90 ratings; the median number of ratings per user is 45.
On average, each anime has 638 ratings; the median number of ratings per anime is 57.
For a simple collaborative filtering recommender, I want to recommend the most popular anime based on our most active users. I will remove all users with fewer than 300 ratings and all anime with fewer than 2,500 ratings.
This leaves us with 4,326 users, 694 anime, and 1M reviews.
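
The filtering step might look roughly like the following. This is a sketch, not the code in src/Popular_CollabFilt.py: the function name, the column names (user_id, anime_id, rating), and the toy data are assumptions.

```python
import pandas as pd

def filter_active(ratings, min_user_ratings, min_anime_ratings):
    """Keep ratings whose user and anime both meet a minimum rating count."""
    # Counts are computed once, on the unfiltered data.
    user_counts = ratings.groupby("user_id")["rating"].transform("count")
    anime_counts = ratings.groupby("anime_id")["rating"].transform("count")
    keep = (user_counts >= min_user_ratings) & (anime_counts >= min_anime_ratings)
    return ratings[keep]

# Toy data with small thresholds; the write-up uses 300 and 2500.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 1, 2, 2, 3],
    "anime_id": [10, 11, 12, 10, 11, 10],
    "rating":   [9, 8, 7, 6, 9, 5],
})
filtered = filter_active(ratings, min_user_ratings=2, min_anime_ratings=2)
print(filtered)
```

Because both counts come from the unfiltered data, a user or anime can end up slightly under its threshold once the other filter is applied. Repeating the filter until nothing changes would enforce both limits strictly.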
The model functions for the KNN and SVD models below are stored in src/Popular_CollabFilt.py.

KNN Collaborative Filter

I explored simple KNN- and SVD-based collaborative filter models, imputing the NaNs with zeros, the average per user, and the average per anime. The exploration of this process can be viewed in Notebooks/Simple_CF.ipynb.
Example from KNN, filling in NaNs with the average anime rating. Recommendations for 120 [‘Fruits Basket’]:

  • 1: [‘Ouran Koukou Host Club’], with distance of 0.373
  • 2: [‘Vampire Knight’], with distance of 0.462
  • 3: [‘07-Ghost’], with distance of 0.480
  • 4: [‘Lovely★Complex’], with distance of 0.484
  • 5: [‘Special A’], with distance of 0.485
  • 6: [‘Vampire Knight Guilty’], with distance of 0.490
  • 7: [‘Kamisama Hajimemashita’], with distance of 0.496
  • 8: [‘Cardcaptor Sakura’], with distance of 0.498
  • 9: [‘Howl No Ugoku Shiro’], with distance of 0.498
  • 10: [‘D.N.Angel’], with distance of 0.499
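
To illustrate how a list like the one above could be produced, here is a small item-based nearest-neighbour sketch in NumPy. It assumes cosine distance (the metric is not stated here) and uses hypothetical function names and toy ratings, so it should be read as an approximation of the approach rather than the code in Notebooks/Simple_CF.ipynb.

```python
import numpy as np

def impute_with_anime_mean(ratings):
    """Replace NaNs in a users x anime matrix with each anime's mean rating."""
    col_means = np.nanmean(ratings, axis=0)
    return np.where(np.isnan(ratings), col_means, ratings)

def nearest_anime(ratings, anime_idx, k):
    """Return the k nearest anime to anime_idx as (index, cosine distance)."""
    items = impute_with_anime_mean(ratings).T  # one row per anime
    unit = items / np.linalg.norm(items, axis=1, keepdims=True)
    distances = 1.0 - unit @ unit[anime_idx]
    # Sort by distance and skip the query anime itself.
    order = [i for i in np.argsort(distances) if i != anime_idx]
    return [(int(i), float(distances[i])) for i in order[:k]]

# Toy matrix: 4 users x 3 anime, NaN marks a missing rating.
ratings = np.array([
    [10.0, 9.0, np.nan],
    [8.0, np.nan, 2.0],
    [2.0, 3.0, 9.0],
    [np.nan, 2.0, 10.0],
])
neighbours = nearest_anime(ratings, anime_idx=0, k=2)
print(neighbours)
```

Swapping `impute_with_anime_mean` for a zero fill or a per-user mean gives the other two imputation variants mentioned above.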

Simple SVD (imputing the NaNs with the average rating per anime) Latent Features:

  • Feature 0: Action fantasy anime with war themes, Military Genre
  • Feature 1: Action and Sci-fi, supernatural
  • Feature 2: Not clear, some comedies, romance, action, video game military
  • Feature 3: Not clear - Action horror sci-fi, A-1 studio
  • Feature 4: Military, Action adventure, Sunrise and Bones studio
  • Feature 5: Naruto, Bleach and Dragonball like movies
  • Feature 6: High school with some random action
  • Feature 7: Slice of life or romance, comedy, school
  • Feature 8: Unclear - mix of everything.
  • Feature 9: Supernatural and psychological

Result: Upon spot-checking a recommendation, the SVD results were weaker than those from KNN. SVD had trouble recommending the correct genre and recommended action anime on every attempt. This is consistent with the latent feature exploration, where the action genre appears in nearly every latent feature.
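
A latent feature exploration of this kind can be sketched with a truncated SVD in NumPy. Mean-centring before factoring is an assumption of this sketch, as are the function names, titles, and toy ratings; the actual exploration may have used a different routine.

```python
import numpy as np

def latent_features(ratings, n_features):
    """Factor an imputed, mean-centred users x anime matrix with truncated SVD."""
    col_means = np.nanmean(ratings, axis=0)
    filled = np.where(np.isnan(ratings), col_means, ratings)
    centred = filled - col_means  # imputed cells become exactly zero
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_features], s[:n_features], vt[:n_features]

def top_anime_per_feature(vt, titles, n_top):
    """List the titles with the largest absolute loading on each feature."""
    return [[titles[i] for i in np.argsort(-np.abs(row))[:n_top]] for row in vt]

# Toy matrix: 4 users x 3 anime, NaN marks a missing rating.
ratings = np.array([
    [10.0, 9.0, np.nan],
    [8.0, np.nan, 2.0],
    [2.0, 3.0, 9.0],
    [np.nan, 2.0, 10.0],
])
titles = ["Anime A", "Anime B", "Anime C"]  # hypothetical titles
u, s, vt = latent_features(ratings, n_features=2)
top = top_anime_per_feature(vt, titles, n_top=2)
print(top)
```

Reading off the top-loading titles per row of `vt` is one way to arrive at labels like "Action and Sci-fi" for a feature. When one genre dominates the ratings, it tends to load heavily on many features, which matches the behaviour described above.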

Ultimately, I did not proceed with these simple options due to their high computational costs. These models are also biased, as they were built using only the most popular anime and the most active users.

Model Based Collaborative Filtering with Spark ALS

Results

Final tuned ALS model had 15 latent features

Spot-Check Results:
Upon spot-checking a few familiar anime, I found that the recommendations included some anime that are not very well known. When I looked up descriptions of those anime, they seemed to match the searched anime very well. I will definitely be testing out some of these recommendations to find my next anime!

Flask App: Your Anime Match Maker!


App Demo: Link To Your Anime Match Maker


Spot-Checking Some Results - Maid Sama:
MyAnimeList Recommendations: image
Your Anime Match Maker Recommendations: image

Conclusion, Caveats and Next Steps

Next Steps

Want To Add Feedback?

Data Sources:

Anime and user metadata from: https://www.kaggle.com/azathoth42/myanimelist
Anime and rating data from: https://www.kaggle.com/CooperUnion/anime-recommendations-database