Parsing HTML using Python

I’m looking for an HTML parser module for Python that can help me get the tags in the form of Python lists/dictionaries/objects. If I have a document of the form:

    <html>
      <head>Heading</head>
      <body attr1='val1'>
        <div class="container">
          <div id='class'>Something here</div>
          <div>Something else</div>
        </div>
      </body>
    </html>

then it should give me a way to access the nested …
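The question asks for the document as nested Python dictionaries. As a minimal sketch, the standard library's `html.parser.HTMLParser` can build such a tree. The node layout (`tag`, `attrs`, `children`, `text`) and the names `TreeBuilder`, `parse_html` and `find_all` are hypothetical choices for illustration, not an existing library's API. The sketch assumes reasonably well-formed markup: `html.parser` reports tags as it reads them and does not repair documents the way a browser does.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested dict tree; each node is {'tag', 'attrs', 'children', 'text'}.

    Hypothetical helper for illustration, not a library API.
    """

    # Void elements have no end tag in HTML, so they are never pushed
    # onto the open-element stack (otherwise later content would nest inside them).
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        # Synthetic root node so the whole document is one dict.
        self.root = {"tag": None, "attrs": {}, "children": [], "text": ""}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": [], "text": ""}
        self.stack[-1]["children"].append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        # Index 0 (the synthetic root) is never popped.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Skip whitespace-only runs between tags; join real text with spaces.
        text = data.strip()
        if text:
            node = self.stack[-1]
            node["text"] = f"{node['text']} {text}".strip()


def parse_html(source: str) -> dict:
    """Parse `source` and return the synthetic root node of the tree."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def find_all(node: dict, tag: str):
    """Yield every descendant of `node` with the given tag, depth-first."""
    for child in node["children"]:
        if child["tag"] == tag:
            yield child
        yield from find_all(child, tag)


if __name__ == "__main__":
    doc = ("<html> <head>Heading</head> <body attr1='val1'> "
           "<div class=\"container\"> <div id='class'>Something here</div> "
           "<div>Something else</div> </div> </body> </html>")
    tree = parse_html(doc)
    for div in find_all(tree, "div"):
        print(div["attrs"], repr(div["text"]))
```

Because nothing is restructured, the `Heading` text stays under `head` exactly as written; an HTML5-conformant parser would likely move it into an implied `body`, so results on markup like this can differ between parsers.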

Which HTML Parser is the best?

Self-plug: I have just released a new Java HTML parser: jsoup. I mention it here because I think it will do what you are after. Its party trick is a CSS selector syntax to find elements, e.g.:

    String html = "<html><head><title>First parse</title></head>"
      + "<body><p>Parsed HTML into a doc.</p></body></html>";
    Document doc = Jsoup.parse(html);
    Elements links = …
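jsoup itself is a Java library. For readers staying in Python, here is a rough standard-library analogue that parses the same string and pulls out element text by tag name. It is a swapped-in technique, not jsoup's API: it matches plain tag names only (no CSS selectors), and the names `select_text` and `_TagText` are hypothetical.

```python
from html.parser import HTMLParser


class _TagText(HTMLParser):
    """Collects the text of every element with one given tag name (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # > 0 while inside a matching element
        self.found = []   # one text string per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.found.append("")  # start a new result
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text inside nested matches is credited to the outermost match.
        if self.depth > 0:
            self.found[-1] += data


def select_text(html: str, tag: str) -> list[str]:
    """Return the text inside each <tag> element, in document order."""
    collector = _TagText(tag)
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    html = ("<html><head><title>First parse</title></head>"
            "<body><p>Parsed HTML into a doc.</p></body></html>")
    print(select_text(html, "title"))
    print(select_text(html, "p"))
```

Tag-name matching is deliberately the simplest possible "selector"; anything like jsoup's attribute or class selectors would need a real tree plus a selector engine.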