Webminer 2.0 overview

Webminer is software that implements a web robot. A web robot simulates a user operating a web browser: it follows links, fills in web forms, and copies information from web pages into a spreadsheet or database.

Webminer is a software library for extracting information from targeted semi-structured websites. Aspects to take into account when extracting information from such sites include: navigating through multiple pages and following links; handling common patterns; filling in forms automatically; authenticating users; obtaining a complete index; handling cookies; handling Ajax and client-side JavaScript execution; handling formats other than HTML (e.g. PDF, Flash); tracking data changes over time (e.g. new job postings); handling protected websites that use data obfuscators and crawler detection; exporting the data (e.g. by mapping it into a simple table or into an advanced relational database); and guaranteeing the correctness of the extracted results (e.g. by detecting changes in the web layout).

Webminer handles all these aspects in a fast and robust way. More importantly, the cost of maintaining a Webminer-based solution is a fraction of that of the alternative solutions on the market. This makes Webminer a viable option for durably integrating web pages and web applications in order to build mashups. It can also be used for monitoring competitors and market trends.

The low maintenance cost comes from combining the best of two worlds: data-flow principles from the academic world and object-oriented programming, which is used extensively in industry. We merge these two paradigms in a software library, placing special emphasis on (1) functionality and extensibility, by defining a set of basic operators that support the web automation aspects mentioned above as well as new, unforeseen ones, and (2) robustness, by monitoring the process and ensuring that it executes correctly.
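
To make the combination of data-flow and object-oriented ideas more concrete, here is a minimal Python sketch. It does not show Webminer's actual API: every name in it (Operator, Pipeline, ExtractLinks, Filter, ExpectAtLeast, run_demo) is hypothetical and chosen for illustration only. Each operator is an object that consumes a stream of items and yields a new stream, operators are chained with "|", and one operator acts as a simple run-time check in the spirit of the robustness goal.

```python
from html.parser import HTMLParser


class Operator:
    """Hypothetical base class: turns one stream of items into another."""

    def process(self, items):
        raise NotImplementedError

    def __or__(self, other):
        # "a | b" builds a pipeline that feeds a's output into b.
        return Pipeline([self, other])


class Pipeline(Operator):
    """A chain of operators applied in order."""

    def __init__(self, stages):
        self.stages = list(stages)

    def process(self, items):
        for stage in self.stages:
            items = stage.process(items)
        return iter(items)

    def __or__(self, other):
        return Pipeline(self.stages + [other])


class _LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class ExtractLinks(Operator):
    """Takes HTML pages (strings) and yields the links they contain."""

    def process(self, pages):
        for page in pages:
            parser = _LinkParser()
            parser.feed(page)
            parser.close()
            yield from parser.links


class Filter(Operator):
    """Keeps only the items for which the predicate is true."""

    def __init__(self, predicate):
        self.predicate = predicate

    def process(self, items):
        return (item for item in items if self.predicate(item))


class ExpectAtLeast(Operator):
    """Passes items through and fails if fewer than `minimum` were seen,
    for example because the layout of the target page changed."""

    def __init__(self, minimum):
        self.minimum = minimum

    def process(self, items):
        count = 0
        for item in items:
            count += 1
            yield item
        if count < self.minimum:
            raise RuntimeError(
                f"expected at least {self.minimum} items, got {count}"
            )


def run_demo():
    # A hard-coded page stands in for a downloaded one.
    page = (
        '<a href="/jobs/1">A</a>'
        '<a href="/about">About</a>'
        '<a href="/jobs/2">B</a>'
    )
    pipeline = (
        ExtractLinks()
        | Filter(lambda url: url.startswith("/jobs/"))
        | ExpectAtLeast(1)
    )
    return list(pipeline.process([page]))
```

In this sketch, supporting a new aspect means writing one more Operator subclass, and the check operator fails loudly when a pipeline stops producing the expected amount of data rather than silently returning nothing.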

With Webminer, the extraction and integration of target web pages and web applications can be deployed quickly and robustly. Deployment consists of a one-time implementation for the target website, followed by periodic monitoring and maintenance. This process requires a software developer trained for this specific purpose. Webminer will therefore be used by our team, as a service to customers, and by our system integrators.



