Selenium with Chromium and Java on FreeBSD
10 Feb 2020 - tsp
Last update 22 Mar 2020
9 mins
Update: An implementation in Python has been added as well - a short draft showing
how one can get started with Selenium in Python. It can be found at the end of
the article.
What is this about?
This blog entry is a short description on how to get started using Selenium
with chromedriver on FreeBSD with a Java application. This can be used to
develop automatic test applications for web applications or simple bots that
scrape content from webpages or automate actions on the web using a full
browser capable of running JavaScript, running browser plugins, etc.
Note that this is just a short tutorial on how to set up your IDE and write
a first simple program that accesses webpage content and executes a click
on a single link identified by an XPath expression. It's not a complete
introduction to Selenium or its Java interface. If one wants a detailed
step-by-step tutorial on how to use Selenium for web application testing,
one can for example refer to Test Automation using Selenium WebDriver with
Java: Step by Step Guide by Navneesh Garg (note:
Amazon affiliate link; this page's author profits from qualified purchases).
Install required software
First one needs a working Chromium installation.
This is usually done via packages
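for example via pkg:
pkg install chromium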
or via ports:
cd /usr/ports/www/chromium
make install clean
This automatically installs the chromedriver binary at /usr/local/bin/chromedriver.
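One can verify the installation and check which version has been installed using:
chromedriver --version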
Now one only needs to fetch the Selenium Java libraries. They can be found
at the Selenium webpage. Just fetch the Selenium Java package (ZIP file) and
save it at a convenient location. Unzipping the file yields:
- The source JAR, which will not be required
- The client-combined-*.jar file, which should be added to your project's classpath
- A libs folder containing various JARs. These should also be added to the project's classpath
Adding to the Classpath when using Eclipse IDE
When using the Eclipse IDE simply start a new Java project, right-click your
project and select Properties. Select Java Build Path and use the Add External
JARs function to add both the client-combined-*.jar file (not the -source
version) and all JARs from the libs folder to your project's classpath.
This takes effect during builds and also while launching from the Eclipse IDE.

When distributing your application you have to use the method mentioned later on,
reference the JARs in your own JAR's manifest, install the JARs into a system-wide
known location or (beware of licensing problems!) merge the JARs into a single one.
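For the manifest route, a minimal sketch of the relevant MANIFEST.MF entries - assuming
the dependency JARs are shipped in a libs directory next to your application JAR (the
file names are taken from the Selenium distribution used here):
Main-Class: TestProg
Class-Path: libs/client-combined-3.141.59.jar libs/byte-buddy-1.8.15.jar
Class-Path entries are space separated and resolved relative to the location of the
JAR itself.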
Adding to the Classpath on the CLI
In case you're not running from your IDE you can simply configure your
classpath either by setting the CLASSPATH environment variable in your
shell's init script or by using env CLASSPATH= on each command invocation (or
while launching a subshell). This might be done in a wrapper script if desired.
Do not forget to also add your own classes (JAR or directory tree)
to the classpath though.
For example, one might use the following invocation:
env CLASSPATH=.:~/selenium/client-combined-3.141.59.jar:~/selenium/byte-buddy-1.8.15.jar:... javac MyTestclass.java
Note that one has to list each and every dependency from the libs folder
in this case, so specifying them on the command line is rather inconvenient.
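Launching the compiled program afterwards works the same way, just with java
instead of javac (assuming the same classpath as above):
env CLASSPATH=.:~/selenium/client-combined-3.141.59.jar:~/selenium/byte-buddy-1.8.15.jar:... java MyTestclass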
The first application
Now for a simple application that will fetch the Slashdot
webpage, accept the cookie banner if present and fetch a list of stories
together with their links.
First we create a source file named like our test program (in this example
called TestProg) containing our basic skeleton.
Note that the style applied in this example is not suited for a real
application. One should almost never use catch Exception, for example,
but implement proper exception handling instead.
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class TestProg {
    public static void main(String[] args) {
        try {
            // Set path of chromedriver binary
            System.setProperty("webdriver.chrome.driver", "/usr/local/bin/chromedriver");

            // Create the driver
            WebDriver driver = new ChromeDriver();

            // Let the user see the final state for 10 seconds
            Thread.sleep(10000);
            driver.quit();
        } catch(Exception e) {
            e.printStackTrace();
        }
        return;
    }
}
As one can see we have set the webdriver.chrome.driver system property.
This is not exactly good style either - if possible in any way, this should
be set from an external launcher script. The property has to point to our
chromedriver binary, which has been installed automatically together with
our www/chromium package. Then we create the driver using new ChromeDriver().
This creates the browser instance that is remotely controlled by WebDriver,
which should also be indicated at your standard error output:
Starting ChromeDriver 78.0.3904.108 (4b26898a39ee037623a72fcfb77279fce0e7d648-refs/branch-heads/3904@{#889}) on port 47736
Only local connections are allowed.
Please protect ports used by ChromeDriver and related test frameworks to prevent access by malicious code.
Feb 10, 2020 10:03:26 PM org.openqa.selenium.remote.ProtocolHandshake createSession
INFO: Detected dialect: W3C
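As a side note, the webdriver.chrome.driver property can also be supplied on
the command line when launching the application - this is the launcher-script
approach mentioned above, which avoids hard-coding the path in Java code:
java -Dwebdriver.chrome.driver=/usr/local/bin/chromedriver TestProg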
Now - before we can fetch some data - we have to accept the cookie banner
presented by Slashdot. To do that we first have to determine how we can locate
the button. Luckily that's easy on Slashdot - we use the Inspect feature
of Chromium in an incognito tab (to start without any cookies or other session
information present):

Now we simply copy the XPath to the element

With the known XPath of the link to accept the conditions - in this case it's
luckily a link inside a uniquely identified element, so the path expression is
really unique and simple ("//*[@id=\"cmpwelcomebtnyes\"]/a") - we can
simply locate the required element using findElement with the By.xpath
method and raise a click() event on the webpage:
try {
    // Fetch a webpage. For this example we use Slashdot
    driver.get("https://slashdot.org/");

    // Locate the "Accept" button
    WebElement bannerElem = driver.findElement(By.xpath("//*[@id=\"cmpwelcomebtnyes\"]/a"));

    /*
        Click the element (if it's not present the NoSuchElementException
        would already have been thrown)
    */
    bannerElem.click();

    // Display a message and provide some time for the user to see the action
    System.out.println("Clicked the cookie banner ...");
    Thread.sleep(250);
} catch(NoSuchElementException e) {
    System.out.println("Didn't have to click the cookie banner ...");
}
As one can see the findElement function raises a NoSuchElementException
in case the banner is not present. This already provides a (not so clean)
way to detect the presence of the cookie banner.
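A slightly cleaner sketch of such a presence check: the findElements method
(plural) returns an empty list instead of throwing, so no exception handling
is required:
List<WebElement> banner = driver.findElements(By.xpath("//*[@id=\"cmpwelcomebtnyes\"]/a"));
if(!banner.isEmpty()) {
    // The banner is present - accept it
    banner.get(0).click();
    System.out.println("Clicked the cookie banner ...");
}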
Now to our main task - fetching the titles and links. For this we use the
method findElements and supply a class name that we've also determined
using the inspect feature of Chromium as an interactive user. This method
delivers a list of all elements tagged with the given class name.
After that we can iterate through the elements, locate the link (a) element
contained inside each story-title element, fetch the title - which is
simply the text contained inside the link - as well as the href attribute,
and output them to the command line:
List<WebElement> titles = driver.findElements(By.className("story-title"));
for(WebElement elem : titles) {
    WebElement titleLink = elem.findElement(By.tagName("a"));

    String strTitle = titleLink.getText();
    String strHref = titleLink.getAttribute("href");

    System.out.println(strTitle + " + " + strHref);
}
Now we've created a full scraper for Slashdot headlines and their links.
A word of caution (when writing bots instead of tests)
If you intend to use Selenium to create a bot, beware that there are
bot detection scripts that scan for modifications made by Selenium
to the browser (injected JavaScript, added properties inside the DOM, etc.).
There are ways to prevent this injection and detection by anti-bot scripts,
but as soon as you're blacklisted you might have trouble getting unlisted,
depending on the service. Remember that Selenium is basically created for
testing webpages and supplying input the way a real user would.
You'll encounter such Selenium detection scripts when accessing webpages
like your bank's online presence, payment portals and big merchant portals.
Be sure to check if they block your account before using your main
credentials (at least use some test credentials before getting banned
with your main account, or use an additional set of accounts on a
day-to-day basis). Also beware that using automated bots might violate
terms of service, so webservices have a right to block your accounts and
deny any further business with you.
In any case - please don't write a spambot. There's already enough spam on
the web. No one likes that. There are of course many valid reasons to write
bots to scrape information from webpages that make direct fetching and
processing hard because they build their pages using JavaScript without
any fallback to plain HTML - that's worst webdesign practice in my opinion (and
normally I simply do not use such pages any more).
Full sourcecode of sample application
The full source is available as a GitHub GIST.
Update: How to do the same thing in Python
Because I've been asked by a student how to achieve the same effect
with Python - that's pretty easy. First one again requires the www/chromium
package and the Selenium Python library (installed via pip install selenium).
Now one can use the webdriver package from selenium:
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.get("https://slashdot.org/")
Access to elements works similar to Java, using functions like
- find_element_by_xpath
- find_elements_by_class_name
- find_element_by_tag_name
and so on. Accessing attributes uses get_attribute and access to inner text
content is done using the text property.
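Assembled into a short sketch that mirrors the Java example above (the XPath
and class name are the same Slashdot-specific assumptions as before):

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get("https://slashdot.org/")

# Accept the cookie banner if it is present
try:
    driver.find_element_by_xpath('//*[@id="cmpwelcomebtnyes"]/a').click()
except NoSuchElementException:
    pass

# Fetch all story titles and print each title and link
for story in driver.find_elements_by_class_name("story-title"):
    link = story.find_element_by_tag_name("a")
    print(link.text, link.get_attribute("href"))

driver.quit()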
A full version of this short program is hosted on a GitHub GIST.