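Previously, in the Python version of Grabber, I used a BFS crawler. It is good for scanning all the code (as long as the parsers are not too dumb). The problem with this kind of crawler is that it is totally inefficient: vulnerabilities are not spread evenly across a site, so visiting every page in discovery order wastes time on pages that are unlikely to matter.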
Starting from this assumption, I tried to rate what is actually important and what evidence suggests that a page matters from a security testing point of view. So the architecture of the crawler is simply based on a priority queue, and the priority is for now based on obvious reasoning which may be wrong: the script I prefer to test first is the one reached by a POST form whose action is served over HTTPS (and so on for the rest...). A lower value means the page is tested earlier, which gives something like this:
    priority <- 30
    If Form Then
        priority <- 10
        If Method = Post Then
            priority <- 5
    Else If Anchor Then
        If Get Variables Then // To understand: index.php?foo=plop, compared to index.php
            priority <- 20
    If HTTPS Communication for {Method action or Anchor URL} Then
        priority /= 2
This is fairly incomplete work and kind of dumb, but at least the same rule is applied without bias to every URL in a set.
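To make the rule above concrete, here is a minimal Python sketch of how it could feed a priority queue. It is not Grabber's actual code: the names `priority`, `Frontier`, `push`, `pop`, and the `kind` argument are hypothetical, the example URLs are made up, and the duplicate check is an extra assumption that the pseudocode does not mention.

```python
import heapq
from itertools import count
from urllib.parse import urlsplit


def priority(kind, url, method="GET"):
    """Score a discovered item; a lower score is tested earlier.

    kind is "form" or "anchor" (hypothetical labels); for a form,
    url is the form action.
    """
    parts = urlsplit(url)
    score = 30.0  # default: plain link without parameters, or anything else
    if kind == "form":
        score = 10.0
        if method.upper() == "POST":
            score = 5.0
    elif kind == "anchor":
        if parts.query:  # index.php?foo=plop rather than index.php
            score = 20.0
    if parts.scheme.lower() == "https":
        score /= 2
    return score


class Frontier:
    """Priority queue of items to test, built on heapq (a min-heap)."""

    def __init__(self):
        self._heap = []
        self._seen = set()     # assumption: skip exact duplicates
        self._order = count()  # tie-breaker: equal scores come out FIFO

    def push(self, kind, url, method="GET"):
        key = (kind, method.upper(), url)
        if key in self._seen:
            return
        self._seen.add(key)
        entry = (priority(kind, url, method), next(self._order), key)
        heapq.heappush(self._heap, entry)

    def pop(self):
        score, _, (kind, method, url) = heapq.heappop(self._heap)
        return score, kind, method, url

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = Frontier()
    frontier.push("anchor", "http://example.com/index.php")
    frontier.push("anchor", "http://example.com/index.php?foo=plop")
    frontier.push("form", "https://example.com/login", "POST")
    frontier.push("form", "http://example.com/search", "GET")
    while frontier:
        print(frontier.pop())
```

With these four items, the HTTPS POST form comes out first and the parameterless link last. The counter in each heap entry only keeps the ordering stable when two items share the same score.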

