Final Thoughts About RelevantXKCD project

Part of my RelevantXKCD project writeup

After I wrapped a simple Django server and posted it on Reddit in the subreddit /r/xkcd, I got some great feedback!

For the Model

As /u/drcopus helpfully pointed out, I should’ve done data augmentation on the original training set. I could’ve taken the comments, substituted synonyms, and even split the sentences of each comment into separate training examples.
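To make that suggestion concrete, here is a rough sketch of what the augmentation could have looked like. It is not the code from the project: the `SYNONYMS` table, the function names and the `(comic_id, text)` example format are all assumptions made up for illustration, and a real synonym source would be much bigger than a hand-written dict.

```python
import random
import re

# Hypothetical, hand-written synonym table (an assumption for this sketch);
# a real one would more likely come from a thesaurus.
SYNONYMS = {
    "funny": ["hilarious", "amusing"],
    "comic": ["strip"],
}


def split_sentences(comment):
    """Split a comment into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [part for part in parts if part]


def substitute_synonyms(sentence, rng):
    """Swap every word found in SYNONYMS for a randomly chosen synonym."""
    def swap(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        return rng.choice(options) if options else word

    # Only alphabetic runs are matched, so punctuation is left untouched.
    return re.sub(r"[A-Za-z']+", swap, sentence)


def augment(comic_id, comment, seed=0):
    """Turn one (comic_id, comment) pair into several training examples."""
    rng = random.Random(seed)
    examples = []
    for sentence in split_sentences(comment):
        # Each sentence becomes its own example...
        examples.append((comic_id, sentence))
        # ...plus a synonym-substituted variant, if anything actually changed.
        variant = substitute_synonyms(sentence, rng)
        if variant != sentence:
            examples.append((comic_id, variant))
    return examples


if __name__ == "__main__":
    for example in augment(327, "This comic is funny. Bobby Tables is a classic!"):
        print(example)
```

Every generated example keeps the comic ID of the comment it came from, so one comment turns into several labeled examples for the same comic.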

He also mentioned that I should’ve fine-tuned an existing neural network rather than creating one from scratch. I thought the <1% was fishy, so now I know that it just needs to run longer (which means that training would take a very long time).

Essentially, creating a model from scratch takes a lot of time, and a lot of processing has to happen before progress can be accurately gauged.


I think the model works pretty well, although I’ve noticed that the top result isn’t usually the “best” one, simply because people tended to search with one-word queries…

Also, not all comics were equally represented. Luckily, most of the comics (1773 of them in the training data) had at least 2 training examples, but fewer than 50% had 20 or more. This biased the system, since more training examples = more vocabulary = higher probability of seeing that comic.

Here is a list of all comic IDs that had NO data and therefore would never be seen (they aren’t part of the comic percentages above):

Some of these are really good; however, they obviously aren’t the popular ones.
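For anyone wanting to check this kind of imbalance on their own data, a small sketch like the one below would do the tallying. It assumes the training labels are a flat list of comic IDs (one entry per training example) and that the full set of comic IDs is known; `coverage_report` and its parameters are hypothetical names, not part of the project.

```python
from collections import Counter


def coverage_report(labels, all_comic_ids, thresholds=(2, 20)):
    """Summarize how evenly comics are represented in the training labels.

    Returns (fractions, missing):
      fractions maps each threshold to the share of comics *with data*
                that have at least that many training examples;
      missing   lists the comic IDs with no training examples at all
                (they are left out of the fractions).
    """
    counts = Counter(labels)
    seen = [cid for cid in all_comic_ids if counts[cid] > 0]
    missing = sorted(cid for cid in all_comic_ids if counts[cid] == 0)

    fractions = {}
    for threshold in thresholds:
        enough = sum(1 for cid in seen if counts[cid] >= threshold)
        # Guard against an empty training set.
        fractions[threshold] = enough / len(seen) if seen else 0.0
    return fractions, missing


if __name__ == "__main__":
    # Toy data: comic 4 has no examples, comic 2 has only one.
    fractions, missing = coverage_report(
        labels=[1, 1, 2, 3, 3, 3],
        all_comic_ids=[1, 2, 3, 4],
        thresholds=(2, 3),
    )
    print(fractions)
    print(missing)
```

Comics with zero examples are reported separately and kept out of the denominators, to match how the percentages above were counted.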


Slightly Unrelated Stuff…

So over that week, I compiled the top queries: [chart of top queries]

Everything else was quite narrow, so I decided against showing those queries. The top queries are expected because the top 3 Reddit comments said:

Try “xkcd”

I had a good laugh at the ‘dragon’ result

Try “desolate”

It also shows that Bobby Tables is a classic (sql).



<side> It’s funny that a week later, XKCD’s newest comic had to do with Machine Learning. Pretty much what I did lol. </side>

Published in Machine Learning, Thoughts
