This article introduces a new tool in our project: the jsoup library. I want to describe the problem jsoup solves for us and explain why it is the right library for the job.
Motivation
As in many other applications, our users can enter text via an HTML editor. They often copy and paste content from various sources such as Microsoft Office. As a result, a lot of obscure, malformed, and incomplete HTML with proprietary tags is thrown at our application.