adbar

adbar/trafilatura

Python, Apache-2.0, active, popular
Health: 88

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Stars: 6.1k
Forks: 381
Open Issues: 95
Contributors: 381
Last Push: 1d ago

Health Breakdown

Activity: 25
Community: 25
Maintenance: 13
Popularity: 25
#article-extractor #corpus-builder #corpus-tools #crawler #html-to-markdown #html2text #llm #news-aggregator #news-crawler #nlp #rag #readability #rss-feed #scraping #tei #text-cleaning #text-extraction #text-mining #text-preprocessing #web-scraping

Should you contribute to adbar/trafilatura?

adbar/trafilatura has a FoundDev health score of 88/100, which puts it in the active-and-maintained tier. The maintainers have shipped recently, issues are being closed, and a PR you open this week has a realistic chance of being reviewed.

The last push was 1 day ago, which signals an actively maintained project. New issues are likely to get a maintainer response within days. The project is written primarily in Python, so prior Python experience will shorten ramp-up.

The project is licensed under Apache-2.0, a standard OSI-approved license that is safe to contribute to under normal employer IP policies.

More Python repos

earthians
earthians/marley
Open Source, Enterprise and Modern Health Information System
50494
YoungCan-Wang
YoungCan-Wang/WyckoffTradingAgent
Open-source Wyckoff trading agent and AI stock screener for volume-price analysis, A-share screening, CLI workflows, and MCP tools. Inspired by 秋生trader (@Hoyooyoo).
49292
make-all
make-all/tuya-local
Local support for Tuya devices in Home Assistant
3.0k92