#TRIAL VERSION WINRUNNER UPDATE#
This is a brief update on what we have done.

After our initial data set, which we felt was biased towards sites that have a robots.txt, we decided to enlarge the data set. To do so, we obtained the RDF dump from DMOZ and classified its URLs into different domains. In each domain, we pinged every site to check for the existence of a robots.txt, up to a maximum of 50,000 sites per domain.

For the sites that had a robots.txt, we crawled two levels completely using JoBo to collect size and usage statistics. In each domain this covered up to 1,000 randomly chosen websites that had a robots.txt, or all of them where fewer were available.

For all the sites that had a robots.txt, we validated the file using the validation logic provided by a robots.txt validation website. A total of 30,000 robots.txt files were validated using an automated testing tool called iMacros Browser. We would like to mention that we received a 30-day academic trial license that allowed us to use the tool for such a large number of files. Thanks to iOpus for giving us this license, which would otherwise have cost $500.

Table: Warning Types and their percentages (first column: Warning Type)

Table: Warning Types and their percentages (first column: Error Type)

Table: Use of Robots Exclusion Standard in Different Domains (first column: Domain)

Table: Error and Warning Percentages in the robots.txt in different domains (first column: Domain)

The most interesting observation is the number of errors present in robots.txt files. About 20% of the robots.txt files we crawled have errors in them. Although the de facto standard has been around for about a decade, there still seems to be no proper agreement on what counts as a correct file.
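The robots.txt existence check from the data-collection step could be sketched roughly as follows. This is a minimal Python illustration, not the scripts used for the study: the names `robots_url`, `http_status`, and `count_robots` are hypothetical, and treating an HTTP 200 response as "has a robots.txt" is an assumption (some servers answer 200 with an HTML error page, which this sketch would miscount).

```python
from typing import Callable, Iterable, Tuple
from urllib.parse import urlsplit, urlunsplit
import urllib.error
import urllib.request


def robots_url(site: str) -> str:
    """Return the robots.txt URL for a site given as a URL or bare host name."""
    if "://" not in site:
        site = "http://" + site  # assumption: default to plain HTTP
    parts = urlsplit(site)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch a URL and return its HTTP status code, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, e.g. with 404
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, ...


def count_robots(
    sites: Iterable[str],
    limit: int = 50000,
    status_of: Callable[[str], int] = http_status,
) -> Tuple[int, int]:
    """Check at most `limit` sites; return (sites checked, sites with robots.txt)."""
    checked = found = 0
    for site in sites:
        if checked >= limit:
            break
        checked += 1
        if status_of(robots_url(site)) == 200:
            found += 1
    return checked, found
```

Passing the status lookup in as `status_of` keeps the counting logic separate from the network access, so the cap of 50,000 sites per domain can be exercised with a stub instead of live requests.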
So, in summary, about 22% of the web uses the Robots Exclusion Standard, while 14% of the content is hidden.
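One way to estimate how much of a site is hidden is to count the crawled URLs that its robots.txt disallows. The sketch below uses Python's standard `urllib.robotparser` for this. It is only an illustration under assumptions: the function name `hidden_fraction` is hypothetical, it counts URLs rather than content size, and it may differ from how the 14% figure above was actually computed.

```python
from typing import Sequence
from urllib.robotparser import RobotFileParser


def hidden_fraction(robots_txt: str, urls: Sequence[str], agent: str = "*") -> float:
    """Return the share of `urls` that `robots_txt` disallows for `agent`."""
    if not urls:
        return 0.0
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    hidden = sum(1 for url in urls if not parser.can_fetch(agent, url))
    return hidden / len(urls)


# Example with made-up data: one of four crawled URLs is disallowed.
example_rules = "User-agent: *\nDisallow: /private/\n"
example_urls = [
    "http://example.com/index.html",
    "http://example.com/about.html",
    "http://example.com/private/report.html",
    "http://example.com/news/today.html",
]
example_share = hidden_fraction(example_rules, example_urls)
```

Averaging this share over many sites would give a URL-based estimate. Weighting by page size instead would need the crawl's size statistics.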
These are the results we have obtained from our crawling.