Open-source repositories tagged with #page-xml, ranked by health score.
Read and extract text and other content from PDFs in C# (port of PDFBox)