requests
gradio==4.44.1
beautifulsoup4
urllib3
trafilatura
huggingface_hub
sentence-transformers
torch
python-dotenv
lxml
lxml_html_clean
tenacity
scrapy
newspaper3k
PyPDF2
html2text
groq
faiss-cpu
mistralai
rank_bm25
spacy
textblob