WEB SCRAPER

The application provides a simple web interface where users can schedule jobs that scrape website data at a specified time. Jobs can be created, paused, re-enabled, and removed. Each scheduled job retrieves data from one particular website, and all retrieved data is stored in a database. The scraper takes input parameters (which vary from site to site) and scrapes only the data that matches them. Optionally, TOR and/or proxy chains can be used to protect the scraping server and hide it from the target websites (administrators, sniffers, etc.). We can customize the app to fit specific requirements and to support any site.
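
The job lifecycle described above (create, pause, enable, remove, run at a scheduled time) can be sketched in TypeScript as follows. This is a minimal illustration that assumes an in-memory registry. The names ScrapeJob, JobRegistry, and due() are hypothetical. The real application persists jobs through Sequelize and triggers them from Express.js, and this sketch does not show that part.

```typescript
// Hypothetical job model: the real app persists jobs via Sequelize; here they live in memory.
type JobStatus = "enabled" | "paused";

interface ScrapeJob {
  id: number;
  site: string;                   // target website (illustrative)
  params: Record<string, string>; // site-specific input parameters
  runAt: Date;                    // time at which the job should run
  status: JobStatus;
}

class JobRegistry {
  private jobs = new Map<number, ScrapeJob>();
  private nextId = 1;

  // New jobs start out enabled.
  create(site: string, params: Record<string, string>, runAt: Date): ScrapeJob {
    const job: ScrapeJob = { id: this.nextId++, site, params, runAt, status: "enabled" };
    this.jobs.set(job.id, job);
    return job;
  }

  pause(id: number): void {
    this.require(id).status = "paused";
  }

  enable(id: number): void {
    this.require(id).status = "enabled";
  }

  // Returns true if a job with this id existed and was removed.
  remove(id: number): boolean {
    return this.jobs.delete(id);
  }

  // Jobs that are enabled and whose scheduled time has been reached.
  due(now: Date): ScrapeJob[] {
    return Array.from(this.jobs.values()).filter(
      (j) => j.status === "enabled" && j.runAt.getTime() <= now.getTime()
    );
  }

  private require(id: number): ScrapeJob {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`Unknown job ${id}`);
    return job;
  }
}
```

A scheduler loop would periodically call due() with the current time and hand each returned job to the site-specific scraper. Paused jobs stay in the registry and become eligible again as soon as they are re-enabled.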

DESCRIPTION

The application outputs its results as JSON or XML, and users choose which format they want. The resulting file can be downloaded either per job, containing only that job's data, or for all jobs at once, containing all data in the database. Duplicates are excluded from the JSON/XML output at the stage of retrieving data from the database, using FULLTEXT search with dictionaries.
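
The export step (one job or all jobs, JSON or XML, duplicates removed) can be sketched in TypeScript as below. The record shape (jobId, title, url) and the function names dedupe() and exportRecords() are hypothetical. The duplicate check here is a simplified in-memory stand-in that compares a normalized title and URL. It is not the database-side FULLTEXT/dictionary approach the application actually uses.

```typescript
// Hypothetical scraped record shape; real fields vary per target website.
interface ScrapedRecord {
  jobId: number;
  title: string;
  url: string;
}

type ExportFormat = "json" | "xml";

// Simplified stand-in for database-side duplicate removal:
// records count as duplicates when their trimmed, lower-cased title and URL match.
function dedupe(records: ScrapedRecord[]): ScrapedRecord[] {
  const seen = new Set<string>();
  const out: ScrapedRecord[] = [];
  for (const r of records) {
    const key = `${r.title.trim().toLowerCase()}|${r.url.trim().toLowerCase()}`;
    if (!seen.has(key)) {
      seen.add(key);
      out.push(r);
    }
  }
  return out;
}

// Escape the five XML special characters so text is safe inside elements and attributes.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Export one job's records (jobId given) or all records (jobId omitted).
function exportRecords(records: ScrapedRecord[], format: ExportFormat, jobId?: number): string {
  const selected = dedupe(
    jobId === undefined ? records : records.filter((r) => r.jobId === jobId)
  );
  if (format === "json") {
    return JSON.stringify(selected, null, 2);
  }
  const items = selected
    .map(
      (r) =>
        `  <record jobId="${r.jobId}">\n` +
        `    <title>${escapeXml(r.title)}</title>\n` +
        `    <url>${escapeXml(r.url)}</url>\n` +
        `  </record>`
    )
    .join("\n");
  return `<?xml version="1.0" encoding="UTF-8"?>\n<records>\n${items}\n</records>`;
}
```

Filtering by job before deduplicating keeps the per-job and all-jobs downloads on the same code path. Escaping is done by hand here only to keep the sketch dependency-free.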

TOOLS & TECHNOLOGY

Express.js, Sequelize ORM, AngularJS, JavaScript, jQuery, CSS3, HTML5, Bootstrap, MVC, TOR, Proxy Chains

TAGS

html5, css3, javascript, expressjs, angularjs, mvc, sequelize, tor, proxy chains