Keane Moraes committed
Commit 10296ed
1 Parent(s): 1981c78

fixed and built generation from topics

Files changed (6)
  1. .gitignore +1 -0
  2. app.py +7 -4
  3. generation.py +0 -3
  4. insights.prompt +11 -1
  5. prompter/insights_33.prompt +32 -12
  6. utils.py +18 -11
.gitignore CHANGED
@@ -1,3 +1,4 @@
 /__pycache__*
 recursive-exclude * *.py[co]
 /.vscode*
+/prompter*
app.py CHANGED
@@ -30,15 +30,18 @@ if file1 is not None and file2 is not None:
     topics['insight1'] = [keywords1, concepts1]
     keywords2, concepts2 = insight2.generate_topics()
     topics['insight2'] = [keywords2, concepts2]
-    st.success('Done!')

     with st.spinner("Flux capacitor is fluxing..."):
         embedder = utils.load_model()
-        clutered = utils.cluster_based_on_topics(embedder, cleaned_text1, cleaned_text2)
+        clutered = utils.cluster_based_on_topics(embedder, cleaned_text1, cleaned_text2, num_clusters=5)
         print(clutered)
-        st.success("Done!")

     with st.spinner("Polishing up"):
         results = utils.generate_insights(topics, file1.name, file2.name, cleaned_text1, cleaned_text2, clutered)
-        st.write(results)
         st.success("Done!")
+
+    st.title("Insights generated")
+
+    for result in results:
+        with st.expander(result["name"]):
+            st.write(result["description"])
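Note on the app.py change: per the utils.py diff further down, generate_insights() now returns a list of dicts with "name" and "description" keys rather than raw completion text, and the new loop renders one collapsible panel per insight. A minimal standalone sketch of that rendering; the sample results list here is illustrative only, not data from the repo:

import streamlit as st

# Illustrative stand-in for the list of dicts generate_insights() now returns
results = [
    {"name": "Group Dynamics", "description": "How people interact in groups."},
    {"name": "Shared Attention", "description": "How conversations allocate the spotlight."},
]

st.title("Insights generated")

for result in results:
    # One collapsible panel per insight, titled with the insight name
    with st.expander(result["name"]):
        st.write(result["description"])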
generation.py DELETED
@@ -1,3 +0,0 @@
-import openai
-
-def
insights.prompt CHANGED
@@ -14,4 +14,14 @@ The more complex concepts in document 2 is : {{complex2}}

 The sentences in one of the clusters is : {{sentences}}

-From the sentences and topics above, explain the common idea between the documents and write a paragraph about it and give me 3 new concepts that are linked to this idea.
+From the sentences and topics above, explain the common idea between the documents and write a paragraph about it and give me 3 new concepts that are linked to this idea.
+You output format should be:
+
+"""
+name: <FILL-CONCEPT-NAME-HERE>
+description: <FILL-CONCEPT-DESCRIPTION-HERE>
+related:
+- <FILL-RELATED-CONCEPT-1>
+- <FILL-RELATED-CONCEPT-2>
+- <FILL-RELATED-CONCEPT-3>
+"""
prompter/insights_33.prompt CHANGED
@@ -1,21 +1,41 @@
 You are a highly intelligent bot that is tasked with common ideas between documents. The following are two documents that have been topic modelled and have been clustered based on concepts.

-The name for document 1 is : AI tutors will be held back by culture - by Henrik Karlsson.md
+The name for document 1 is : Good conversations have lots of doorknobs.md

-The name for document 2 is : The Stability of Beliefs.md
+The name for document 2 is : First we shape our social graph; then it shapes us.md

-The topics for document 1 is : bull,picasso,education,ai,chilean,bull 1945,the bull,of bull,prize bull,bull to
+The topics for document 1 is : spiderman,singing,musical,chorus,song,singing something,about spiderman,spiderman spiderman,spiderman,spiderman and

-The topics for document 2 is : belief,beliefs,philosophy,epistemological,philosophic,science belief,scientific beliefs,beliefs ensconced,beliefs of,certain beliefs
+The topics for document 2 is : chimpanzees,genetically,womb,consciously,upbringings,from chimpanzees,chimpanzees as,chimpanzees in,chimpanzees and,chimpanzees

-The more complex concepts in document 1 is : picasso lithographs bull,story bull bruce,bull culture necessary,lithographs bull 1945,bull didn know
+The more complex concepts in document 1 is : singing like spiderman,spiderman sudden pianist,songs spiderman scientific,just songs spiderman,excerpt spiderman boyfriend

-The more complex concepts in document 2 is : beliefs michael polanyi,beliefs held scientists,belief science declared,1951 scientific beliefs,michael polanyi essay
+The more complex concepts in document 2 is : chimpanzees born habitat,die chimpanzees born,sets apart chimpanzees,fast die chimpanzees,old children chimpanzees

-The sentences in one of the clusters is : # key takeaways --- # transcript ## excerpt gpt-4, khan academy, wolfram alpha - we're seeing progress ai tools learning.
-demo state art ai tutoring capabilities, watch video march 14 salman khan khan academy demonstrates system built top gpt-4. video, khan uses ai model socratic tutor.
-gpt-4 occasionally hallucinates answers true.
-models improving faster anticipated, gpt-4 already scores top 10 percent university exams.
-march 23, nine days khan demo:ed tutoring system, openai partnered wolfram released plugin gives gpt-4 ability things like: way fluidly interacting information, shaping dialogue, immensely powerful.
+The sentences in one of the clusters is : ask remove mask spot really hard, trick kept us afloat called “take-and-take focus,” meaning whoever singing keep going someone jumped take spotlight them, happen quickly often.
+it’s easy remember lonely feels taker refuses cede spotlight you, easy forget lovely feels don’t want spotlight taker lets recline mezzanine fill stage.
+it’s often unclear, stand around waiting someone else take turn invite us take ours.
+we’re standing perimeter empty dance circle, takers martyrs launch middle .

-From the sentences and topics above, explain the common idea between the documents and write a paragraph about it and give me 3 new concepts that are linked to this idea.
+From the sentences and topics above, explain the common idea between the documents and write a paragraph about it and give me 3 new concepts that are linked to this idea.
+You output format should be:
+
+"""
+name: <FILL-CONCEPT-NAME-HERE>
+description: <FILL-CONCEPT-DESCRIPTION-HERE>
+related:
+- <FILL-RELATED-CONCEPT-1>
+- <FILL-RELATED-CONCEPT-2>
+- <FILL-RELATED-CONCEPT-3>
+"""
+
+The common idea between the documents is the importance of collaboration and teamwork. In the first document, the idea of collaboration is explored in the context of music, with the chorus singing together to create a beautiful song. In the second document, the idea of collaboration is explored in the context of chimpanzees, with the idea that they work together to survive and thrive in their environment.
+
+The concept of collaboration is an important one, and it is essential for any group of individuals to work together to achieve a common goal.
+
+name: Group Dynamics
+description: Group dynamics is the study of how people interact in groups and how their behavior affects the group as a whole.
+related:
+- Interpersonal Relationships
+- Social Interaction
+- Conflict Resolution
utils.py CHANGED
@@ -66,7 +66,7 @@ def generate_keywords(kw_model, document: str) -> list:
         final_topics.append(extraction[0])
     return final_topics

-def cluster_based_on_topics(embedder, text1:str, text2:str):
+def cluster_based_on_topics(embedder, text1:str, text2:str, num_clusters=3):
     nlp = spacy.load("en_core_web_sm")

     # Preprocess and tokenize the texts
@@ -87,7 +87,7 @@ def cluster_based_on_topics(embedder, text1:str, text2:str):
     all_embeddings = all_embeddings / np.linalg.norm(all_embeddings, axis=1, keepdims=True)

     # Perform agglomerative clustering
-    clustering_model = AgglomerativeClustering(n_clusters=None, distance_threshold=1.5)
+    clustering_model = AgglomerativeClustering(n_clusters=num_clusters)
     clustering_model.fit(all_embeddings)
     cluster_assignment = clustering_model.labels_

@@ -121,27 +121,34 @@ def generate_insights(topics:dict, name1:str, name2:str, text1:str, text2:str, c

     for cluster_id, sentences in clusters.items():

-        PROMPT = PROMPT.replace("{{sentences}}", "\n".join(sentences))
+        print(cluster_id, " ", sentences)
+        final_prompt = PROMPT.replace("{{sentences}}", "\n".join(sentences))

-        with open(f"prompter/insights_{cluster_id}.prompt", "w") as f:
-            f.write(PROMPT)
+        # with open(f"prompter/insights_{cluster_id}.prompt", "w") as f:
+        #     f.write(final_prompt)

         # Generate insights for each cluster
         response = openai.Completion.create(
             model="text-davinci-003",
-            prompt=PROMPT,
-            temperature=0.5,
+            prompt=final_prompt,
+            max_tokens=200,
+            temperature=0.4,
             top_p=1,
-            max_tokens=1000,
             frequency_penalty=0.0,
             presence_penalty=0.0,
         )

         text = response['choices'][0]['text']
-        with open(f"prompter/insights_{cluster_id}.txt", "a") as f:
-            f.write(text)
+        name_location = text.find("Name:")
+        description_location = text.find("Description:")
+        name_of_insight = text[name_location+6:name_location+6+text[name_location+6:].find("\n")]
+        description = text[:name_location] + text[description_location+13:description_location+13+text[description_location+13:].find("\n")]
+        final_insights.append({"name": name_of_insight, "description": description})
+
+        # with open(f"prompter/insights_{cluster_id}.prompt", "a") as f:
+        #     f.write(text)

-        final_insights.append(text)
+        # final_insights.append(text)

     return final_insights
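Note on the new parsing in generate_insights(): the completion text is split into a name and a description by index arithmetic around the "Name:" and "Description:" labels ("Name: " is 6 characters, "Description: " is 13). A small sketch of that slicing run against a sample completion; the sample text is illustrative, not a real model output:

# Sample completion in the label format the slicing below assumes
text = (
    "The documents share an idea about collaboration.\n"
    "Name: Group Dynamics\n"
    "Description: How people interact in groups.\n"
)

name_location = text.find("Name:")
description_location = text.find("Description:")

# Take the rest of each labelled line; skip past "Name: " (6 chars) and "Description: " (13 chars)
name_of_insight = text[name_location + 6:name_location + 6 + text[name_location + 6:].find("\n")]
description = text[:name_location] + text[description_location + 13:description_location + 13 + text[description_location + 13:].find("\n")]

print(name_of_insight)  # Group Dynamics
print(description)      # the preamble paragraph followed by the description line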