# Similarity Score
This notebook imports the questions and answers produced by QuestionGeneration.ipynb and scores how similar each company answer is to the corresponding reference (`Meta`) answer.
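The score computed in this notebook is the cosine similarity between two sentence embeddings, which is what `sklearn.metrics.pairwise.cosine_similarity` returns. For a company-answer embedding $\mathbf{u}$ and the matching reference-answer embedding $\mathbf{v}$:

$$
\text{score}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert} \in [-1, 1]
$$

Values close to 1 mean the two embeddings point in nearly the same direction, which is read as the two answers being semantically close.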

Heavily inspired by:

https://github.com/karndeepsingh/sentence_similarity/blob/main/Finding_Similar_Sentence.ipynb

## Import packages

In [64]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from sentence_transformers import SentenceTransformer

## Load the model
Uncomment the lines below to save the model locally and reload it from disk instead of downloading it each time. Delete the saved model before uploading to GitHub, because it is too large.

In [65]:
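# Pretrained sentence-embedding model from the sentence-transformers library
model = SentenceTransformer('nli-distilroberta-base-v2')
# Save a local copy of the model (run once):
# model.save("./Model/model")
# Load the saved copy instead of downloading it again:
# model = SentenceTransformer("./Model/model")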
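The commented-out lines cache the model by hand. A minimal sketch that automates the same idea: load from the local folder if it exists, otherwise download by name and save a copy there. `load_or_download` and its `factory` parameter are hypothetical helpers, not part of sentence-transformers; the sketch relies only on the `SentenceTransformer(name_or_path)` constructor and `model.save(path)`, both of which the notebook already uses.

```python
import os


def load_or_download(model_name, local_dir, factory=None):
    """Hypothetical helper: load from local_dir if present, else download and save."""
    if factory is None:
        # Imported lazily so the helper can be tested with a stand-in factory.
        from sentence_transformers import SentenceTransformer
        factory = SentenceTransformer
    if os.path.isdir(local_dir):
        # A saved copy exists: load it from disk instead of downloading.
        return factory(local_dir)
    # First run: download by name, then save so later runs can reuse it.
    model = factory(model_name)
    model.save(local_dir)
    return model


# Usage (assumes network access on the first run):
# model = load_or_download('nli-distilroberta-base-v2', './Model/model')
```

One option for the size problem is to list `Model/` in `.gitignore`, so the saved model never gets committed and does not need deleting before each upload.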

## Load the questions and responses

In [66]:
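# Questions form the index; 'Company' holds the company's answers, 'Meta' the reference answers
Data = pd.read_csv("./Results/Compare.csv", index_col=0)
questions = Data.index.values
company = Data['Company'].values
gold = Data['Meta'].values
# Company answers first, then reference answers: company[i] pairs with sentences[len(company) + i]
sentences = np.concatenate((company, gold))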
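The scoring step depends on the order in which `sentences` is built: all company answers come first, followed by all reference answers, so the answer at position `i` is compared with the one at `len(company) + i`. A small sketch of that layout, using a hypothetical two-row stand-in for `./Results/Compare.csv` (the answers are made up; only the column names `Company` and `Meta` come from the notebook):

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for ./Results/Compare.csv: questions as the index,
# the company's answer and the reference ("Meta") answer as columns.
csv_text = """,Company,Meta
What is the file name of the document?,report.pdf,The file is report.pdf
When was the document last modified?,Yesterday,2021-03-04
"""
data = pd.read_csv(io.StringIO(csv_text), index_col=0)
company = data['Company'].values
gold = data['Meta'].values

# Company answers first, then reference answers.
sentences = np.concatenate((company, gold))
n = len(company)
# Pair each company answer with the reference answer n positions later.
pairs = [(sentences[i], sentences[n + i]) for i in range(n)]
print(pairs)
```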

## Create the sentence embeddings and obtain scores

In [67]:
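# Encode all answers in one batch; returns one embedding row per sentence
sentence_embeddings = model.encode(sentences)
similarity_score = []
for i in range(len(company)):
    # Cosine similarity between company answer i and its reference answer
    similarity_score.append(cosine_similarity(
        [sentence_embeddings[i]],
        [sentence_embeddings[len(company) + i]]
    ).flatten()[0])
# One score per question, indexed by the question text
Similarity = pd.DataFrame({'Score': similarity_score}, index=questions)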
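The loop calls `cosine_similarity` once per question. A vectorised alternative computes all row-wise scores at once with NumPy; this is a sketch, not the notebook's method. `paired_cosine` is a hypothetical helper, and random vectors stand in for `model.encode` output (768 dimensions is an assumption; only the shapes matter here). The check compares it with the diagonal of sklearn's full pairwise matrix, which holds the same per-pair values the loop collects.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def paired_cosine(a, b):
    """Hypothetical helper: cosine similarity of row i of a with row i of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dots = np.einsum('ij,ij->i', a, b)  # row-wise dot products
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms  # assumes no all-zero embedding rows


# Random vectors stand in for the company and reference embeddings.
rng = np.random.default_rng(0)
company_emb = rng.normal(size=(5, 768))
gold_emb = rng.normal(size=(5, 768))

fast = paired_cosine(company_emb, gold_emb)
# Diagonal of the full matrix = score of company answer i vs reference answer i.
loop = np.diag(cosine_similarity(company_emb, gold_emb))
print(np.allclose(fast, loop))
```

With the real model, `paired_cosine(sentence_embeddings[:len(company)], sentence_embeddings[len(company):])` should reproduce `similarity_score` up to floating-point noise, without building the full pairwise matrix.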

## Print a few of the scores

In [70]:
print(Similarity.head())

                                               Score
What is the file name of the document?      0.858884
When was the document last modified?        0.565336
What is the file path of the document?      0.755018
When was the document last accessed?        0.553015
What is the creation date of the document?  0.849555
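In the printed sample, the two date-related questions score noticeably lower than the others. A minimal sketch for surfacing such cases across all questions, using the five scores printed above; the 0.6 cut-off is a hypothetical choice, not something the notebook defines.

```python
import pandas as pd

# The five scores printed above, copied into a DataFrame.
similarity = pd.DataFrame(
    {'Score': [0.858884, 0.565336, 0.755018, 0.553015, 0.849555]},
    index=[
        'What is the file name of the document?',
        'When was the document last modified?',
        'What is the file path of the document?',
        'When was the document last accessed?',
        'What is the creation date of the document?',
    ],
)

# Hypothetical cut-off: answers scoring below it are flagged for manual review.
THRESHOLD = 0.6
flagged = similarity[similarity['Score'] < THRESHOLD].sort_values('Score')
print(flagged)
print(similarity['Score'].describe())
```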


## Save the scores to a .csv file

In [71]:
Similarity.to_csv('./Results/Similarity_Scores.csv')
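To reuse the saved scores elsewhere, read them back with `index_col=0`, the same way Compare.csv is loaded earlier, so the questions become the index again. A small round-trip sketch with an illustrative two-row DataFrame written to a temporary directory instead of `./Results`:

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the Similarity DataFrame (values are illustrative).
similarity = pd.DataFrame({'Score': [0.9, 0.5]}, index=['Q1?', 'Q2?'])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Similarity_Scores.csv')
    similarity.to_csv(path)
    # index_col=0 restores the questions as the index.
    reloaded = pd.read_csv(path, index_col=0)

print(reloaded)
```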