ArticleChatbot

lfoppiano committed on Oct 25, 2023

Commit fde76b0 (1 parent: 5b25803)

add documentation

Files changed (2)

README.md CHANGED

@@ -12,13 +12,15 @@ license: apache-2.0
 # DocumentIQA: Scientific Document Insight QA
 ## Introduction
 Question/Answering on scientific documents using LLMs (OpenAI, Mistral, ~~LLama2,~~ etc..).
 This application is the frontend for testing the RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS.
-Differently to most of the project, we focus on scientific articles and we are using [Grobid](https://github.com/kermitt2/grobid) for text extraction instead of the raw PDF2Text converter (which is comparable with most of other solutions) allow to extract only full-text.
-**Work in progress**
 **Demos**:
  - (on HuggingFace spaces): https://lfoppiano-document-qa.hf.space/

 # DocumentIQA: Scientific Document Insight QA
+**Work in progress** :construction_worker:
 ## Introduction
 Question/Answering on scientific documents using LLMs (OpenAI, Mistral, ~~LLama2,~~ etc..).
 This application is the frontend for testing the RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS.
+Unlike most projects, we focus on scientific articles. We target only the full text, extracted with [Grobid](https://github.com/kermitt2/grobid), which provides cleaner results than the raw PDF2Text converter (the approach used by most other solutions).
+**NER in LLM response**: The responses from the LLMs are post-processed to extract <span style="color:yellow">physical quantities, measurements</span> and <span style="color:blue">materials</span> mentions.
 **Demos**:
  - (on HuggingFace spaces): https://lfoppiano-document-qa.hf.space/

streamlit_app.py CHANGED

@@ -177,6 +177,7 @@ with st.sidebar:
     st.markdown(
         """After entering your API Key (Open AI or Huggingface). Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress. Once the spinner stops, you can proceed to ask your questions.""")
     if st.session_state['git_rev'] != "unknown":
         st.markdown("**Revision number**: [" + st.session_state[
         'git_rev'] + "](https://github.com/lfoppiano/document-qa/commit/" + st.session_state['git_rev'] + ")")
@@ -231,8 +232,8 @@ if st.session_state.loaded_embeddings and question and len(question) > 0 and st.
             # for entity in entities:
             #     entity
             decorated_text = decorate_text_with_annotations(text_response.strip(), entities)
-            decorated_text = decorated_text.replace('class="label material"', 'style="color:blue"')
-            decorated_text = re.sub(r'class="label[^"]+"', 'style="color:yellow"', decorated_text)
             st.markdown(decorated_text, unsafe_allow_html=True)
             text_response = decorated_text
         else:

     st.markdown(
         """After entering your API Key (Open AI or Huggingface). Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress. Once the spinner stops, you can proceed to ask your questions.""")
+    st.markdown('**NER on LLM responses**: The responses from the LLMs are post-processed to extract <span style="color:orange">physical quantities, measurements</span> and <span style="color:green">materials</span> mentions.', unsafe_allow_html=True)
     if st.session_state['git_rev'] != "unknown":
         st.markdown("**Revision number**: [" + st.session_state[
         'git_rev'] + "](https://github.com/lfoppiano/document-qa/commit/" + st.session_state['git_rev'] + ")")
             # for entity in entities:
             #     entity
             decorated_text = decorate_text_with_annotations(text_response.strip(), entities)
+            decorated_text = decorated_text.replace('class="label material"', 'style="color:green"')
+            decorated_text = re.sub(r'class="label[^"]+"', 'style="color:orange"', decorated_text)
             st.markdown(decorated_text, unsafe_allow_html=True)
             text_response = decorated_text
         else:
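
The two recoloring lines changed in `streamlit_app.py` depend on their order: the exact `replace` handles material labels first, and the regex then catches every remaining `class="label ..."` attribute. The sketch below isolates that step so it can be tried without Streamlit. The helper name `colorize` and the sample markup are hypothetical; the actual HTML produced by `decorate_text_with_annotations` is not shown in this commit and is assumed here to be `<span>` elements with a `class="label <type>"` attribute.

```python
import re


def colorize(decorated_text: str) -> str:
    """Swap annotation class attributes for inline colors (hypothetical helper)."""
    # Materials are handled first with an exact match, so the generic
    # regex below no longer sees them.
    decorated_text = decorated_text.replace(
        'class="label material"', 'style="color:green"'
    )
    # Every remaining 'class="label <something>"' attribute becomes orange.
    # Note: [^"]+ needs at least one character after "label", so a bare
    # class="label" attribute would be left untouched.
    decorated_text = re.sub(
        r'class="label[^"]+"', 'style="color:orange"', decorated_text
    )
    return decorated_text


# Assumed markup shape, for illustration only.
sample = (
    'The <span class="label material">MgB2</span> sample has a Tc of '
    '<span class="label quantity">39 K</span>.'
)
print(colorize(sample))
```

If the two statements were swapped, the regex would also match `class="label material"` and materials would be rendered in orange rather than green.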