Business firms in areas such as FMCG, publications, and travel run most of their business through their web sites. Branding through their portals, and the end user's comfort in using them, is therefore of utmost importance. They need to ensure that visual consistency (alignment, positioning of widgets, font face, and so on) is maintained across different browsers.
In such scenarios, cross-browser UI testing plays a vital role in ensuring that no business is lost to browser incompatibility. Today this is mostly done manually, where it becomes very challenging and time consuming to verify the appearance of web pages at the pixel level, especially for scenarios like image shape calibration, zooming, and dynamic hyperlink checks.
Since it demands a high amount of bandwidth, cross-browser UI testing is usually done by a dedicated team of test professionals. As in other areas, automation is one way to enhance testability; however, there is a dearth of tools capable of verifying inconsistencies at the level required. For example, most existing tools take screenshots of the entire page, making manual effort inevitable to pinpoint the exact error. We have not come across tools that directly help identify inconsistencies in font styles and character attributes, seamless streaming of video files, or the size and position of images. So are these tools really automating cross-browser UI testing?
This paper outlines an approach for automating the validation of how web application widgets are displayed across different browsers. The solution primarily involves leveraging various image-processing libraries for optical character recognition (OCR), image manipulation, pixel attribute identification, mouse actions, and dynamic hyperlink navigation checks, to name a few.
Let us say the requirement is to verify that a company's tagline is displayed in a specific font color, type, and size on the homepage across all supported browser versions. We first identify the location of the text, its font style, character size, and pixel color. To find the location of the text, we take screenshots of the homepage and use OCR libraries like tessnet2 or JOCR to get an array of region coordinates for all text on the page. We can then match the string of the company tagline and obtain the coordinates of its location. Next, to identify the font type, size, and color, we use computer vision libraries like OpenCV or ImageJ, with the tagline's region coordinates as input parameters. The computer vision functions needed would depend on whether we want the attributes at the character or the word level.
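The tagline check above can be sketched as follows. This is a minimal simulation, not the actual tool: in a real pipeline the screenshot would come from the browser and the bounding box from an OCR library such as tessnet2 or Tesseract, and the expected brand color and coordinates used here are hypothetical. The screenshot is modeled as a plain 2D grid of RGB tuples so the sketch stays self-contained.

```python
from collections import Counter

# Hypothetical corporate brand color the tagline must be rendered in.
EXPECTED_BRAND_COLOR = (0, 82, 155)

def dominant_color(pixels, bbox):
    """Return the most frequent RGB value inside bbox = (x, y, w, h)."""
    x, y, w, h = bbox
    region = [pixels[row][col]
              for row in range(y, y + h)
              for col in range(x, x + w)]
    return Counter(region).most_common(1)[0][0]

def tagline_color_ok(pixels, tagline_bbox, tolerance=10):
    """Compare each channel of the dominant color with the brand color,
    allowing a small per-channel tolerance for anti-aliasing."""
    found = dominant_color(pixels, tagline_bbox)
    return all(abs(f - e) <= tolerance
               for f, e in zip(found, EXPECTED_BRAND_COLOR))

# Simulated 100x40 screenshot: white page, with the tagline region
# (x=10, y=10, w=60, h=20) painted in the brand color. In practice this
# bounding box is what the OCR pass would return for the tagline string.
screenshot = [[(255, 255, 255)] * 100 for _ in range(40)]
for row in range(10, 30):
    for col in range(10, 70):
        screenshot[row][col] = (0, 82, 155)

print(tagline_color_ok(screenshot, (10, 10, 60, 20)))  # True
```

The same dominant-color check can then be repeated on screenshots taken from each supported browser, flagging only the regions that differ.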
Let us consider another example, where we want to locate the position of all hyperlinks on a page. This becomes very difficult with functional UI automation tools like QTP or Selenium if the web pages are rendered using technologies they do not support.
We know that a hyperlink will be underlined and that its font color will change on mouse-over. We first identify the region coordinates of all text on the web page using OCR libraries. Then we move the mouse pointer over the coordinates of each word and check for a change in the mouse pointer image. If we detect a change, we then verify that the expected changes in the text attributes are also taking place. In this way, we can build an array of the location coordinates of all hyperlinks and perform further validation actions on them.
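The hover-based scan can be sketched as below. Everything browser-facing is simulated: the word regions stand in for OCR output, and `hover_cursor` stands in for "move the real mouse and capture the pointer image" via an automation API; the word names and coordinates are invented for illustration.

```python
# Hypothetical OCR output: word text -> (x, y) centre of its region.
word_regions = {
    "Home":    (40, 12),
    "Welcome": (120, 60),
    "Contact": (40, 300),
    "Privacy": (200, 580),
}

# Simulated page model: coordinates the browser treats as hyperlinks.
_link_areas = {(40, 12), (40, 300), (200, 580)}

def hover_cursor(point):
    """Stand-in for 'move mouse over point, capture the pointer image'.
    Returns the cursor shape shown at that coordinate."""
    return "hand" if point in _link_areas else "arrow"

def find_hyperlinks(regions):
    """Return the words whose pointer changes from the baseline arrow
    to a hand cursor on hover, i.e. the detected hyperlinks."""
    baseline = "arrow"
    return [word for word, point in regions.items()
            if hover_cursor(point) != baseline]

print(find_hyperlinks(word_regions))  # ['Home', 'Contact', 'Privacy']
```

Once the hyperlink coordinates are collected this way, the follow-up checks described above (underline, color change on hover) can run against the same regions.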
Apart from UI validation, we can use this approach for functional testing in places where widely used functional test tools fail to identify application widgets and navigate through them. We can plug our own custom computer vision libraries into the frameworks built over these tools, thereby improving the testability of applications built using new or unsupported technologies.
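One way such a plug-in can locate a widget that the functional tool cannot identify is template matching: scan the screenshot for the region that matches a reference image of the widget. A production version would use something like OpenCV's `matchTemplate`; the sketch below is a naive exact-match scan over a grayscale grid, with the screen contents invented for illustration.

```python
def find_widget(screen, template):
    """Return (x, y) of the top-left corner where template exactly
    matches a region of screen, or None if it is not found."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            if all(screen[y + j][x + i] == template[j][i]
                   for j in range(th) for i in range(tw)):
                return (x, y)
    return None

# Simulated 8x8 grayscale screen with a 2x2 "button" (value 9) at (5, 3).
screen = [[0] * 8 for _ in range(8)]
for y in (3, 4):
    for x in (5, 6):
        screen[y][x] = 9

button = [[9, 9], [9, 9]]
print(find_widget(screen, button))  # (5, 3)
```

With the widget's coordinates in hand, the framework can drive a click at that location even though the underlying technology is invisible to the functional tool.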
As we know, there is no single silver bullet for all challenges. Limitations of this approach include a performance hit from pixel-level operations such as raster scanning of page regions, capturing progressive screenshots, and processing video frames. Further research is needed in this area to optimize the entire process.