openai langchain beautifulsoup4 chromadb tiktoken pypdf gradio PyMuPDF gdown docx2txt sentence-transformers ibm-watson-machine-learning ibm-generative-ai "unstructured[all-docs]"