victormiller committed on
Commit
dc928e2
1 Parent(s): b7ea9e4

Update main.py

Files changed (1)
  1. main.py +1 -1
main.py CHANGED
@@ -805,7 +805,7 @@ def intro():
  P(
  "In pretraining, it is common to combine web data and curated sources (cite). Web data is included to provide a vast quantity of long tail and diverse data, while curated datasets are often information rich and provide the 'deep-dive' domain information. Combining both datasets plays a critical role for effective LLM pre-training. By integrating the reach of web data with the quality of curated sources, TxT360 meets and surpasses the rigorous standards required for state-of-the-art LLM pre-training. See Results section below."
  ),
- P("** TxT360 does not include code. This decision was made due to the perceived low duplication count of code. TxT360 can easily be combined with leading code dataset."),
+ P("** TxT360 does not include code. This decision was made due to the perceived low duplication code with other sources."),
  #P("Table 2: Basic TxT360 Statistics."),
  #table_div_data,
  id="section2",