Webscraping with Selenium [Ruby, Firefox, OSX]

This article requires knowledge of Ruby, the concept of webscraping, and HTML.

Make sure you have Ruby installed (use RVM if you don’t have it yet) — I’m using 2.3.1 right now.

If you want a reference sheet, I’m in the process of writing one :P. The link will be updated when I get around to doing so, since there is so little Ruby help out there.

 

Why Use Selenium?

There are many webscraping tools out there, but the main reason I ended up using Selenium is its ability to run JavaScript. Other webscraping tools don’t actually drive a browser, so they can only scrape HTML elements and follow direct links, not fancy JavaScript actions. But because Selenium drives a real browser, it is slower than Mechanize or Nokogiri, so it might not be the best solution for everything. Use it only when a webpage relies on complicated JavaScript.

One nice thing about having a real browser window is that you can actually see the elements on the webpage, which makes debugging a lot easier.

Basics

In this tutorial we’ll be using Firefox (although I use PhantomJS afterwards because it’s headless and therefore a lot faster; you can google what that means :P).

In order to do so, we need to install the selenium-webdriver gem:

    gem install selenium-webdriver

 

You can test the following by either copying it into a Ruby file or running irb (the interactive Ruby terminal) in your terminal and pasting in the code below.

If you get any errors, make sure your gems, Firefox, and Ruby versions are up to date.

The execute_script method lets you run your own JavaScript code in the page, the click method virtually clicks an element, and the text method returns the text of an element (use puts if you want to print it).
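Here is a minimal sketch of those three calls in one place. It is my own illustration, not necessarily the snippet this post originally embedded: the method names scrape_demo and run_demo, the example.com URL, and the "h1" and "a" selectors are all placeholders. Depending on your Selenium and Firefox versions you may also need geckodriver installed for Firefox to start.

```ruby
# Sketch only: the URL and the CSS selectors are placeholders for illustration.

# Runs the three calls described above against a WebDriver-like object.
def scrape_demo(driver, url)
  driver.navigate.to(url)

  # execute_script runs your own JavaScript in the page and returns its result.
  title = driver.execute_script("return document.title;")

  # text returns the visible text of an element; it does not print anything.
  heading = driver.find_element(:css, "h1").text

  # click simulates a user clicking the element (here, the first link found).
  driver.find_element(:css, "a").click

  { title: title, heading: heading }
end

# Opens a real Firefox window; needs the selenium-webdriver gem and Firefox.
def run_demo(url = "https://example.com")
  require "selenium-webdriver"
  driver = Selenium::WebDriver.for(:firefox)
  scrape_demo(driver, url)
ensure
  # Always close the browser, even if something above raised an error.
  driver.quit if driver
end
```

After pasting this into irb, calling puts run_demo.inspect should open Firefox, load the page, and print the hash. The browser work lives in scrape_demo, which takes the driver as an argument, so the same code can be reused with another driver later.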

 

Helpful reference websites:

https://github.com/SeleniumHQ/selenium/wiki/Ruby-Bindings

http://www.rubydoc.info/gems/selenium-webdriver/0.0.28/Selenium/WebDriver/Driver

