A few weeks ago, I spent some time creating a specialized visualization of the HTML5 tokenizer spec. As someone who tends to spend too much time with program analysis folks, the one thing that bothers me about HTML5 is that there is no grammar. Indeed, the only specification of the syntax is this tokenizer.
Not only is this tokenizer described in an f*-long page, it's also hard to navigate. So I decided to generate a simple grammar-like visualization of it. You can find the viz here: HTML 5 Grammar Visualization, and the blog post about it here: Blog at SRL.
Right now, I'm thinking of improving the script that extracts the information (available here) so that it better captures the tokenizer transitions, especially those of the HTML entity parser.
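For readers wondering what "capturing the transitions" means in practice: the tokenizer is a state machine, so each rule in the spec roughly boils down to a (state, input, next state) triple. Below is a minimal Python sketch, assuming the triples have already been extracted. The `TRANSITIONS` list is a small, hand-picked, hypothetical excerpt (state names follow the WHATWG tokenizer section; side effects such as emitting tokens or setting the return state are ignored), and `to_dot` and `successors` are illustrative helpers, not part of the actual extraction script.

```python
# Hypothetical, hand-picked excerpt of tokenizer transitions as
# (current state, input class, next state) triples. Side effects
# (emitting tokens, setting the return state) are deliberately ignored.
TRANSITIONS = [
    ("Data", "U+0026 &", "Character reference"),
    ("Data", "U+003C <", "Tag open"),
    ("Tag open", "U+0021 !", "Markup declaration open"),
    ("Tag open", "U+002F /", "End tag open"),
    ("Tag open", "ASCII alpha", "Tag name"),
    ("Tag open", "U+003F ?", "Bogus comment"),
]


def to_dot(transitions):
    """Render the triples as a Graphviz DOT digraph (one labeled edge per triple)."""
    lines = ["digraph tokenizer {"]
    for src, label, dst in transitions:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


def successors(transitions, state):
    """Return the set of states reachable in exactly one step from `state`."""
    return {dst for src, _, dst in transitions if src == state}


if __name__ == "__main__":
    print(to_dot(TRANSITIONS))
```

Piping the printed output through Graphviz (for example `dot -Tsvg`) gives a rough picture of the automaton; a real extraction would also need to record the actions attached to each transition.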
What do you think? Anything you'd like to see in this visualization?