Reddit Relevant XKCD

So I embarked on a machine learning quest as a senior project. With all these advances in AI, I wanted to have some fun myself :P. You can view it here.

I was inspired by Dan Zhang and Megan Ruthven’s Relevant XKCD finder. They used ExplainXKCD to populate their dataset. It’s pretty good, but could it be better?

I wondered if Reddit’s comment corpus could help me train a more accurate model. Every once in a while I’d see users helpfully reply to a comment with a “Relevant XKCD” link. With all that data out there, I thought it would be a good way to try out machine learning techniques.
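To make the idea concrete, here is a minimal sketch of how a "Relevant XKCD" reply could be turned into a training label. It is not the code from the project: the regular expression, the function names `extract_xkcd_ids` and `build_training_pairs`, and the `(parent_text, reply_text)` row shape are all assumptions made for illustration.

```python
import re

# Assumption: comic links look like xkcd.com/<number>, with optional scheme,
# "www." or "m." prefix, and trailing slash.
XKCD_LINK = re.compile(r"\bxkcd\.com/(\d+)")


def extract_xkcd_ids(comment_body):
    """Return every XKCD comic id linked in a comment, in order of appearance."""
    return [int(match) for match in XKCD_LINK.findall(comment_body)]


def build_training_pairs(rows):
    """Turn (parent_text, reply_text) rows into (text, xkcd_id) examples.

    The parent comment is the text to classify; the comic linked in the
    reply is its label. Replies without a comic link are skipped.
    """
    pairs = []
    for parent_text, reply_text in rows:
        for xkcd_id in extract_xkcd_ids(reply_text):
            pairs.append((parent_text, xkcd_id))
    return pairs


# Made-up rows, only to show the shape of the data.
example_rows = [
    ("I can't sleep, someone is wrong online", "Relevant XKCD: https://xkcd.com/386/"),
    ("Nice weather today", "Agreed!"),
]
print(build_training_pairs(example_rows))
```

Pairing the parent comment with the linked comic is only one possible labeling choice; the reply text itself or a longer thread context could be used instead.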

I used Python and the scikit-learn library to build my multi-class text-classification model.

Essentially, I wanted my model to take a series of words and output a number: the ID of the matching XKCD comic.
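As a rough sketch of what "words in, comic id out" can look like in scikit-learn, the snippet below chains a `TfidfVectorizer` with a `MultinomialNB` classifier. The four training comments are invented for illustration, the helper `predict_xkcd_id` is a hypothetical name, and the real project's features, model settings, and data are not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (made up): each comment is labeled with an XKCD comic id.
comments = [
    "someone is wrong on the internet and I cannot go to bed",
    "little bobby tables dropped the students table with sql injection",
    "there are fourteen competing standards so we need one more standard",
    "my code is compiling so I am sword fighting in the hallway",
]
xkcd_ids = [386, 327, 927, 303]

# Turn raw text into TF-IDF features, then fit a multinomial naive Bayes
# classifier where every distinct comic id is its own class.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(comments, xkcd_ids)


def predict_xkcd_id(text):
    """Return the comic id the model considers most relevant to the text."""
    return int(model.predict([text])[0])


print(predict_xkcd_id("we should create a new standard to unify all the competing standards"))
```

With one class per comic, the number of classes grows with the number of comics that show up in the data, which is part of why model choice and memory usage matter for this kind of problem.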

Here are my posts (that I may or may not write):

Retrieving Reddit data from Google BigQuery

Finding the Right Model (MultinomialNB and SGDClassifier – SKlearn)

Loading the Model (and Complaining about Memory Usage)

Final Thoughts

Code is on GitHub. :P