Alexander Watson committed
Commit 2106945
1 Parent(s): 06594f2
doc updates
app.py
CHANGED
@@ -32,19 +32,25 @@ logger.addHandler(handler)
 
 SAMPLE_DATASET_URL = "https://gretel-public-website.s3.us-west-2.amazonaws.com/datasets/llm-training-data/dolly-examples-qa-with-context.csv"
 WELCOME_MARKDOWN = """
-Gretel Navigator is
+Gretel Navigator is a compound AI system designed to help you create high-quality, diverse training data examples through synthetic data generation techniques. It aims to assist in scenarios where you have limited training data or want to enhance the quality and diversity of your existing dataset.
 
-
+Key Use Cases
 
-1. **
+1. **Create Diverse Training or Evaluation Data from a seed**: Generate diverse training or evaluation data from plain text or seed examples. This ensures your AI models are exposed to a wide range of scenarios and edge cases during training.
+2. **Enhance Limited Training Data**: Expand your existing training data with additional synthetic examples generated by Gretel Navigator. This can help improve the robustness and generalization of your AI models.
+3. **Mitigate Bias and Toxicity**: Generate training examples that are unbiased and non-toxic by incorporating diverse perspectives and adhering to ethical guidelines. This promotes fairness and responsible AI development.
+4. **Enhance Model Performance**: Improve the performance of your AI models across various tasks by training them on domain specific synthetic data generated by Gretel Navigator.
 
-
+## Synthetic Data Generation
 
-
+Gretel Navigator utilizes an agent-based system to generate high-quality synthetic data:
 
-
+- Diverse Instruction and Response Generation
+- Quality Evaluation and Ranking
+- AI-Aligning-AI Methodology (AAA) for iterative data quality enhancement
+- Co-teach, suggestions, and self-teaching for iterative improvement.
 
-
+Leveraging these techniques, Gretel Navigator helps you create training data that leads to more robust, unbiased, and high-performing AI models.
 
 ## Getting Started
 
@@ -57,30 +63,15 @@ To start using Gretel Navigator, you'll need:
 
 Gretel Navigator supports the following formats for input data:
 
--
-- Input/Output
-- Plain text data
-- File formats:
-- Hugging Face dataset
-- CSV
-- JSON
-- JSONL
+- Seed data
+- Input/Output pairs (or instruction/response) with any number of ground truth or "context fields".
+- Plain text (ground truth data)
+- File formats: Hugging Face dataset, CSV, JSON, JSONL
 
 ## Output
 
 Gretel Navigator generates one additional training example per row in the input/output pair format. You can specify requirements for the input and output pairs in the configuration. Run the process multiple times to scale your data to any desired level.
 
-## AI Alignment Techniques
-
-Gretel Navigator incorporates AI alignment techniques to generate high-quality synthetic data:
-
-- Diverse Instruction and Response Generation
-- AI-Aligning-AI Methodology (AAA) for iterative data quality enhancement
-- Quality Evaluation
-- Bias and Toxicity Detection
-
-By leveraging these techniques, Gretel Navigator helps you create training data that leads to more robust, unbiased, and high-performing AI models.
-
 ---
 
 Ready to enhance your AI training data and unlock the full potential of your models? Let's get started with Gretel Navigator!
@@ -89,9 +80,9 @@ Ready to enhance your AI training data and unlock the full potential of your mod
 
 def main():
     st.set_page_config(page_title="Gretel", layout="wide")
-    st.title("Gretel Navigator:
+    st.title("Gretel Navigator: Create Synthetic Data from a Prompt")
     st.write(
-        "Generate diverse synthetic training data from text or existing datasets to improve
+        "Generate diverse synthetic training data from text or existing datasets to improve or evaluate AI models."
     )
 
     with st.expander("Introduction", expanded=False):
@@ -347,9 +338,6 @@ def main():
     st.markdown("---")
     st.markdown("### Format Prompts")
 
-    st.markdown("---")
-    st.markdown("### Format Prompts")
-
     system_prompt = st.text_area(
         "System Prompt",
         value=st.session_state.get(
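
The updated welcome text lists CSV, JSON, and JSONL among the accepted input file formats. As a rough illustration only, here is a minimal sketch of how text in those three formats could be normalized into a list of row dicts using the standard library. The `load_records` helper and the `instruction`/`context`/`response` column names are assumptions made for this example; they are not taken from `app.py`, and this is not how Gretel Navigator itself ingests data.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse CSV, JSON, or JSONL text into a list of row dicts.

    Hypothetical helper for illustration; not part of app.py.
    """
    fmt = fmt.lower()
    if fmt == "csv":
        # The first CSV line is treated as the header row.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    if fmt == "jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


# Illustrative input/output pair with one context field (made-up data).
csv_text = "instruction,context,response\nWhat is 2+2?,arithmetic,4\n"
jsonl_text = (
    '{"instruction": "What is 2+2?", "context": "arithmetic", "response": "4"}\n'
)

csv_rows = load_records(csv_text, "csv")
jsonl_rows = load_records(jsonl_text, "jsonl")
```

Both calls yield the same single-row structure, so downstream code can treat seed data uniformly regardless of the file format it arrived in.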