adbar/trafilatura
Python · Apache-2.0 · active · popular
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Health Breakdown
Activity: 25
Community: 25
Maintenance: 13
Popularity: 25
#article-extractor #corpus-builder #corpus-tools #crawler #html-to-markdown #html2text #llm #news-aggregator #news-crawler #nlp #rag #readability #rss-feed #scraping #tei #text-cleaning #text-extraction #text-mining #text-preprocessing #web-scraping
Should you contribute to adbar/trafilatura?
adbar/trafilatura has a FoundDev health score of 88/100, which puts it in the active-and-maintained tier. The maintainers have shipped changes recently, issues are being closed, and a PR you open this week has a realistic chance of being reviewed.
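The 88/100 figure matches the sum of the four Health Breakdown values listed on this page. The short sketch below checks that arithmetic and picks out the weakest component. It assumes the overall score is a plain sum of the components. The page does not state the formula, so treat this as an illustration, not FoundDev's actual scoring method.

```python
# Component scores as listed in the Health Breakdown on this page.
breakdown = {
    "Activity": 25,
    "Community": 25,
    "Maintenance": 13,
    "Popularity": 25,
}

# Assumption: the overall score is the plain sum of the components.
# The page does not document how the score is actually computed.
total = sum(breakdown.values())

# The lowest-scoring component is where the project loses the most points.
weakest = min(breakdown, key=breakdown.get)

print(f"total = {total}/100, weakest component = {weakest}")
```

Under that assumption, Maintenance is the only component below 25, so it accounts for the whole gap between 88 and 100.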
Last push was 1 day ago, which signals an actively maintained project. New issues are likely to get a maintainer response within days. The project is written primarily in Python, so prior Python experience will shorten ramp-up.
Licensed under Apache-2.0, a standard OSI-approved license that is generally safe to contribute to under normal employer IP policies.