Is there a good way to remove HTML from a Java string? A simple regex like
replaceAll("\\<.*?>", "")
will work, but HTML entities like &amp; won't be converted back to the characters they stand for,
and any non-HTML text that happens to sit between two angle brackets will be removed as well
(i.e. whatever the .*? in the regex matches will disappear).
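
To make the two problems concrete, here is a minimal sketch using the regex above. The class name RegexStripDemo, the helper stripTags, and the sample strings are made up for illustration only:

```java
public class RegexStripDemo {

    // Hypothetical helper wrapping the regex from the question:
    // a lazy match from '<' up to the next '>'.
    static String stripTags(String input) {
        return input.replaceAll("\\<.*?>", "");
    }

    public static void main(String[] args) {
        // The tags are removed, but the entity is left undecoded.
        System.out.println(stripTags("<b>Tom &amp; Jerry</b>"));
        // prints: Tom &amp; Jerry

        // Plain text that merely sits between '<' and '>' is removed too.
        System.out.println(stripTags("if a < b and c > d then"));
        // prints: if a  d then
    }
}
```

In the second call the comparison operators are treated as if they delimited a tag, so everything from < through > is lost.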