Parsing Shorthand Dictionary with RMagick and RTesseract [Ruby]

Ruby wrappers used: RMagick (wraps ImageMagick, for image manipulation) and RTesseract (wraps the Tesseract OCR engine, for text recognition).

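Both are gems, and installation instructions are easy to find online. RMagick needs ImageMagick installed on the system, and RTesseract needs Tesseract.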

Here is an example page from the dictionary. The shorthand form is pretty much adjacent to each English word.

Example Gregg Dictionary Page

Here is what each numbered step (#) does.

  1. Load the image
  2. Recognize all the words on the page
  3. Filter out unimportant words that aren’t actually dictionary entries
  4. Check all the words against a dictionary. This makes sure that RTesseract didn’t mistake a shorthand word for an English word.
  5. Grab the pixels to the left of each detected English word and save them as a JPEG image
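Steps 3 to 5 can be sketched in plain Ruby without either gem. This is only an illustration, not the original script: the WordBox struct, the method names, and the crop_width and padding parameters are all hypothetical, and the sketch assumes the OCR step yields each word together with its bounding box in pixel coordinates.

```ruby
require 'set'

# Hypothetical stand-in for one OCR result: the recognized text plus its
# bounding box in page pixel coordinates (origin at the top-left corner).
WordBox = Struct.new(:text, :x, :y, :width, :height)

# Steps 3 and 4: drop tokens on the ignore list (page headers and the like)
# and keep only tokens that appear in an English word list.
def dictionary_entries(boxes, english_words, ignored_words)
  boxes.select do |box|
    word = box.text.downcase.gsub(/[^a-z]/, '')
    !word.empty? && !ignored_words.include?(word) && english_words.include?(word)
  end
end

# Step 5: compute the rectangle immediately to the left of a word.
# crop_width and padding are assumed tuning values; the rectangle is
# clamped so it never starts outside the page.
def shorthand_region(box, crop_width, padding = 0)
  left   = [box.x - crop_width, 0].max
  top    = [box.y - padding, 0].max
  bottom = box.y + box.height + padding
  { x: left, y: top, width: box.x - left, height: bottom - top }
end
```

The hash returned by shorthand_region holds the same four values (x, y, width, height) that RMagick's crop method takes, so each region could be cropped out of the page image and written to its own file.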

The problem I encountered then was that some of the cropped images contained overlapping shorthand words, like the image for the word “abandon.”

One naive solution I used was essentially to trim the tops and bottoms of the image. If there was a complete row of white pixels between the edge and the middle of the image, I’d fill everything from that row out to the edge with white. Some images (like the word “abandon”) worked perfectly, but other images still had problems.
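The trimming idea can be sketched in plain Ruby. This is one reading of the description, not the original script: it assumes the cropped image has already been converted to rows of grayscale values where 255 is white, and the names WHITE, white_row? and trim_overlaps are made up for the example.

```ruby
WHITE = 255

# True when every pixel in the row is white.
def white_row?(row)
  row.all? { |pixel| pixel >= WHITE }
end

# Walk outward from the center row toward the top and toward the bottom.
# At the first fully white row found in each direction, blank every row
# beyond it, which removes strokes belonging to neighboring entries.
# Returns a new array of rows; the input is left untouched.
def trim_overlaps(pixels)
  rows = pixels.map(&:dup)
  return rows if rows.empty?

  center = rows.length / 2

  # Search upward: center - 1, center - 2, ..., 0.
  gap_above = (center - 1).downto(0).find { |i| white_row?(rows[i]) }
  (0...gap_above).each { |i| rows[i].fill(WHITE) } if gap_above

  # Search downward: center + 1, ..., last row.
  gap_below = ((center + 1)...rows.length).find { |i| white_row?(rows[i]) }
  ((gap_below + 1)...rows.length).each { |i| rows[i].fill(WHITE) } if gap_below

  rows
end
```

When a neighboring stroke touches the target word there is no fully white row to find, so the image comes back unchanged in that direction. That matches the cases where this approach still had problems.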

Try the site here.
