I use Poppler's command-line pdftohtml to extract rich text, but if you need clean paragraphs then the PDF has to be a tagged PDF. If you need the (x,y) coordinates of a paragraph, you will have to dig deeper into Poppler. Apache PDFBox, a Java library, can also be used.

If you place an annotation at the beginning of the paragraph, you can pull that annotation out of the PDF as XML, and there you will find the annotation's (x,y) coordinates. Be aware that Adobe apparently puts some clever encryption into the PDF to make this hard to discover, so it may not be easy (what with all the legal hassles etc.) to pull it out without Adobe tools.
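
For the coordinate part, one route that may save some digging is pdftohtml's XML mode, which writes a position for every text fragment. Below is a minimal sketch, not a definitive recipe. It assumes pdftohtml is on your PATH and is run with the -xml, -i and -stdout flags, and that the output uses <page> and <text> elements carrying top, left, width and height attributes. The function names pdftohtml_xml and text_boxes are my own, purely for illustration. Note that these are text fragments (roughly lines), not paragraphs, so you would still have to group them yourself.

```python
import subprocess
import xml.etree.ElementTree as ET


def pdftohtml_xml(pdf_path):
    """Run pdftohtml in XML mode and return the raw XML bytes.

    Assumes Poppler's pdftohtml is installed and on PATH.
    -xml: XML output, -i: skip images, -stdout: write to standard output.
    """
    result = subprocess.run(
        ["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
        capture_output=True,
        check=True,
    )
    return result.stdout


def text_boxes(xml_data):
    """Yield (page, left, top, width, height, text) for each non-empty <text>.

    Positions are returned as floats, exactly as given in the XML attributes.
    """
    root = ET.fromstring(xml_data)
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for node in page.iter("text"):
            # itertext() also collects text nested in <b>/<i> children.
            text = "".join(node.itertext()).strip()
            if not text:
                continue
            yield (
                page_no,
                float(node.get("left")),
                float(node.get("top")),
                float(node.get("width")),
                float(node.get("height")),
                text,
            )


# Usage (hypothetical file name):
#   for box in text_boxes(pdftohtml_xml("document.pdf")):
#       print(box)
```

As far as I can tell, the top and left values are measured from the top-left corner of the page in pdftohtml's own scaled units, so check them against a known page before relying on them.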