Reddit Relevant XKCD

May 5, 2017 (updated May 30, 2019) by Kevin

So I embarked on a machine learning quest as a senior project. With all these advances in AI, I wanted to have some fun myself :P. You can view it here.

I was inspired by Dan Zhang and Megan Ruthven’s Relevant XKCD finder. They used ExplainXKCD to populate their dataset. It’s pretty good, but could it be better?

I wondered if Reddit’s comment corpus could help me get a better model with higher accuracy. Once in a while I’d see users helpfully post “Relevant XKCD” links in response to certain comments. With all that data out there, I thought it would be a good way to try out machine learning techniques.

I used Python and the scikit-learn library to create my multi-class text-classification model.

Essentially, I wanted my model to take a series of words (a Reddit comment) and output a number: the XKCD ID.
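
To make the "words in, XKCD ID out" idea concrete, here is a minimal sketch in scikit-learn. It is not the project's actual code (that lives on GitHub). The training comments are made up for illustration, the `predict_xkcd` helper is a hypothetical name, and pairing TF-IDF features with MultinomialNB is just one plausible setup, assumed here for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this sketch: comment text -> XKCD id.
# The real dataset would come from Reddit comments that got a
# "Relevant XKCD" reply.
comments = [
    "someone is wrong on the internet and I must correct them",
    "I can't go to bed, somebody is wrong on the internet",
    "little bobby tables dropped the students table",
    "always sanitize your database inputs against sql injection",
    "there are now fifteen competing standards",
    "we need one universal standard to cover every use case",
]
xkcd_ids = [386, 386, 327, 327, 927, 927]

# Turn each comment into TF-IDF word weights, then fit a multi-class
# Naive Bayes classifier whose classes are the XKCD ids.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(comments, xkcd_ids)


def predict_xkcd(text):
    """Return the predicted XKCD id for a single comment (hypothetical helper)."""
    return int(model.predict([text])[0])


print(predict_xkcd("another sql injection wiped the database table"))
```

Each distinct XKCD ID is simply one class label, so with the full comic catalogue this becomes a classification problem with a very large number of classes.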

Here are my posts (that I may or may not write):

Retrieving Reddit data from Google BigQuery

Finding the Right Model (MultinomialNB and SGDClassifier – SKlearn)

Loading the Model (and Complaining about Memory Usage)

Final Thoughts

Code is on GitHub. :P
