How can I efficiently parse HTML with Java?

IT Nursery

May 31, 2022

I do a lot of HTML parsing in my line of work. Until now, I have been using the HtmlUnit headless browser for both parsing and browser automation.

Now I want to separate the two tasks.

I want to use a lightweight HTML parser, because HtmlUnit takes a long time to first load a page, then get the source, and then parse it.

I want to know which HTML parser can parse HTML efficiently. I need:

Speed
An easy way to locate any HtmlElement by its “id”, “name”, or tag type

It is fine if the parser doesn’t clean up dirty HTML; I don’t need to clean any HTML source. I just need the easiest way to move across HtmlElements and harvest data from them.
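
To make the requirement concrete, here is a rough sketch of the kind of lookup I mean: give it a tag and/or an attribute value and get back the text inside the matching elements. It only uses the parser bundled with the JDK (javax.swing.text.html.parser.ParserDelegator), so it runs without extra dependencies. I am not claiming this parser is fast or the right choice; the class name ElementLocator and the method findText are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration only; not part of any library.
public final class ElementLocator {

    private ElementLocator() {}

    /**
     * Returns the text inside every element that passes the tag and attribute
     * filters. Pass null for tag, or null for attr, to skip that filter.
     */
    public static List<String> findText(String html, HTML.Tag tag,
                                        HTML.Attribute attr, String value) throws IOException {
        List<String> results = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private HTML.Tag captured;  // tag of the element being captured, or null
            private int depth;          // nesting depth of same-named tags inside it
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (captured != null) {
                    // Already inside a match: only track nested tags of the same type.
                    if (t.equals(captured)) {
                        depth++;
                    }
                    return;
                }
                boolean tagOk = tag == null || t.equals(tag);
                Object actual = attr == null ? null : a.getAttribute(attr);
                // Compared case-insensitively, since the parser may normalise case.
                boolean attrOk = attr == null
                        || (actual != null && actual.toString().equalsIgnoreCase(value));
                if (tagOk && attrOk) {
                    captured = t;
                    depth = 1;
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (captured != null) {
                    text.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (captured != null && t.equals(captured) && --depth == 0) {
                    results.add(text.toString());
                    captured = null;
                }
            }
        };
        // The third argument tells the parser to ignore any charset declaration.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return results;
    }

    public static void main(String[] args) throws IOException {
        String html = "<div><p id=\"price\">42</p><ul><li>a</li><li>b</li></ul></div>";
        System.out.println(findText(html, null, HTML.Attribute.ID, "price"));
        System.out.println(findText(html, HTML.Tag.LI, null, null));
    }
}
```

A lookup by “name” would be the same call with HTML.Attribute.NAME. Elements without a closing tag, such as input, go through handleSimpleTag, which this sketch does not handle. Ideally the parser I end up with offers this kind of lookup directly, without a hand-written callback.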

3 Answers