enhance-ai-training-data

Alexander Watson committed on Jul 3

Commit 2106945 • 1 Parent(s): 06594f2

doc updates

Files changed (1)

app.py +19 -31

@@ -32,19 +32,25 @@ logger.addHandler(handler)
 SAMPLE_DATASET_URL = "https://gretel-public-website.s3.us-west-2.amazonaws.com/datasets/llm-training-data/dolly-examples-qa-with-context.csv"
 WELCOME_MARKDOWN = """
-Gretel Navigator is an interface designed to help you create high-quality, diverse training data examples through synthetic data generation techniques. It aims to assist in scenarios where you have limited training data or want to enhance the quality and diversity of your existing dataset.
-## 🎯 Key Use Cases
-1. **Augment Existing Training Data**: Expand your existing training data with additional synthetic examples generated by Gretel Navigator. This can help improve the robustness and generalization of your AI models.
-2. **Create Diverse Training or Evaluation Data**: Generate diverse training or evaluation data from plain text or seed examples. This ensures your AI models are exposed to a wide range of scenarios and edge cases during training.
-3. **Address Data Limitations**: Generate additional examples to fill gaps in your dataset, particularly for underrepresented classes, rare events, or challenging scenarios. This helps improve your model's ability to handle diverse real-world situations.
-4. **Mitigate Bias and Toxicity**: Generate training examples that are unbiased and non-toxic by incorporating diverse perspectives and adhering to ethical guidelines. This promotes fairness and responsible AI development.
-5. **Enhance Model Performance**: Improve the performance of your AI models across various tasks by training them on diverse synthetic data generated by Gretel Navigator.
 ## 🔧 Getting Started
@@ -57,30 +63,15 @@ To start using Gretel Navigator, you'll need:
 Gretel Navigator supports the following formats for input data:
-- Existing AI training or evaluation data formats:
-  - Input/Output pair format (or instruction/response) with any number of ground truth or "context fields".
-  - Plain text data.
-- File formats:
-  - Hugging Face dataset
-  - CSV
-  - JSON
-  - JSONL
 ## 📤 Output
 Gretel Navigator generates one additional training example per row in the input/output pair format. You can specify requirements for the input and output pairs in the configuration. Run the process multiple times to scale your data to any desired level.
-## 🌟 AI Alignment Techniques
-Gretel Navigator incorporates AI alignment techniques to generate high-quality synthetic data:
-- Diverse Instruction and Response Generation
-- AI-Aligning-AI Methodology (AAA) for iterative data quality enhancement
-- Quality Evaluation
-- Bias and Toxicity Detection
-By leveraging these techniques, Gretel Navigator helps you create training data that leads to more robust, unbiased, and high-performing AI models.
 ---
 Ready to enhance your AI training data and unlock the full potential of your models? Let's get started with Gretel Navigator! 🚀
@@ -89,9 +80,9 @@ Ready to enhance your AI training data and unlock the full potential of your mod
 def main():
     st.set_page_config(page_title="Gretel", layout="wide")
-    st.title("🎨 Gretel Navigator: Enhance Your AI Training Data")
     st.write(
-        "Generate diverse synthetic training data from text or existing datasets to improve the performance and robustness of your AI models."
     )
     with st.expander("Introduction", expanded=False):
@@ -347,9 +338,6 @@ def main():
             st.markdown("---")
             st.markdown("### Format Prompts")
-            st.markdown("---")
-            st.markdown("### Format Prompts")
             system_prompt = st.text_area(
                 "System Prompt",
                 value=st.session_state.get(

 SAMPLE_DATASET_URL = "https://gretel-public-website.s3.us-west-2.amazonaws.com/datasets/llm-training-data/dolly-examples-qa-with-context.csv"
 WELCOME_MARKDOWN = """
+Gretel Navigator is a compound AI system designed to help you create high-quality, diverse training data examples through synthetic data generation techniques. It aims to assist in scenarios where you have limited training data or want to enhance the quality and diversity of your existing dataset.
+Key Use Cases
+1. **Create Diverse Training or Evaluation Data from a seed**: Generate diverse training or evaluation data from plain text or seed examples. This ensures your AI models are exposed to a wide range of scenarios and edge cases during training.
+2. **Enhance Limited Training Data**: Expand your existing training data with additional synthetic examples generated by Gretel Navigator. This can help improve the robustness and generalization of your AI models.
+3. **Mitigate Bias and Toxicity**: Generate training examples that are unbiased and non-toxic by incorporating diverse perspectives and adhering to ethical guidelines. This promotes fairness and responsible AI development.
+4. **Enhance Model Performance**: Improve the performance of your AI models across various tasks by training them on domain-specific synthetic data generated by Gretel Navigator.
+## 🌟 Synthetic Data Generation
+Gretel Navigator utilizes an agent-based system to generate high-quality synthetic data:
+- Diverse Instruction and Response Generation
+- Quality Evaluation and Ranking
+- AI-Aligning-AI Methodology (AAA) for iterative data quality enhancement
+- Co-teach, suggestions, and self-teaching for iterative improvement.
+Leveraging these techniques, Gretel Navigator helps you create training data that leads to more robust, unbiased, and high-performing AI models.
 ## 🔧 Getting Started
 Gretel Navigator supports the following formats for input data:
+- Seed data
+  - Input/Output pairs (or instruction/response) with any number of ground truth or "context fields".
+  - Plain text (ground truth data)
+- File formats: Hugging Face dataset, CSV, JSON, JSONL
 ## 📤 Output
 Gretel Navigator generates one additional training example per row in the input/output pair format. You can specify requirements for the input and output pairs in the configuration. Run the process multiple times to scale your data to any desired level.
 ---
 Ready to enhance your AI training data and unlock the full potential of your models? Let's get started with Gretel Navigator! 🚀
 def main():
     st.set_page_config(page_title="Gretel", layout="wide")
+    st.title("🎨 Gretel Navigator: Create Synthetic Data from a Prompt")
     st.write(
+        "Generate diverse synthetic training data from text or existing datasets to improve or evaluate AI models."
     )
     with st.expander("Introduction", expanded=False):
             st.markdown("---")
             st.markdown("### Format Prompts")
             system_prompt = st.text_area(
                 "System Prompt",
                 value=st.session_state.get(
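
The updated `WELCOME_MARKDOWN` lists CSV, JSON, and JSONL among the supported seed file formats. As a rough illustration of what reading such seed files into rows could look like, here is a minimal sketch using only the Python standard library. The helper name `load_seed_rows` is hypothetical and does not appear in `app.py`; the diff does not show how the app actually loads its input, and Hugging Face datasets are not handled here.

```python
import csv
import json
from pathlib import Path


def load_seed_rows(path):
    """Hypothetical helper: read seed rows from a CSV, JSON, or JSONL file.

    Returns a list of dicts, one per seed example. This is an illustrative
    sketch, not the loading logic used by app.py.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".csv":
        # Each CSV row becomes a dict keyed by the header row.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))

    if suffix == ".jsonl":
        # One JSON object per non-empty line.
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    if suffix == ".json":
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Accept either a list of records or a single record.
        return data if isinstance(data, list) else [data]

    raise ValueError(f"Unsupported seed file format: {suffix}")
```

Each returned dict would correspond to one input/output pair plus any context fields, for example `{"instruction": ..., "response": ..., "context": ...}`; those column names are assumptions for illustration only.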