Spaces: LLM360/TxT360

victormiller committed on Oct 4
Commit 22b2064 (1 parent: 5f4285e)

Update curated.py

Files changed (1)

curated.py CHANGED

@@ -647,7 +647,7 @@ filtering_process = Div(
     Section(
         Div(
         H3("PubMed Central and PubMed Abstract"),
-            P(B("Download and Extraction: "), "All files were downloaded from", A("ttps://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/",href="ttps://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/"),". PubMed Central (PMC) files are downloaded in an xml.tar format. The tar files are opened and converted to markdown format using pandoc", D_code("pandoc -f jats {nxml} -o {pmcid}.md", language="bash"),". The markdown files are combined to create jsonl files. PubMed Abstract (PMA) files  were downloaded in xml. The BeautifulSoup library was used to extract the abstract, title, and PMID. All files were stored in jsonl format.")
+            P(B("Download and Extraction: "), "All files were downloaded from", A("ttps://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/",href="ttps://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/"),". PubMed Central (PMC) files are downloaded in an xml.tar format. The tar files are opened and converted to markdown format using pandoc", D_code("pandoc -f jats {nxml} -o {pmcid}.md", language="bash"),". The markdown files are combined to create jsonl files. PubMed Abstract (PMA) files  were downloaded in xml. The BeautifulSoup library was used to extract the abstract, title, and PMID. All files were stored in jsonl format."),
         H4("Filtering"),
         P("1. Multiple filters are used here after manually verifying output of all the filters as suggested by peS2o dataset."),
         Ol(
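The extraction steps described in the paragraph changed by this commit (converting PMC JATS files with pandoc, combining the Markdown output into JSONL, and pulling the title, abstract and PMID out of PubMed Abstract XML) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the TxT360 code. The helper names, the id/text/pmid/title/abstract field names and the <pmcid>.tar.gz naming convention are hypothetical. For the PMA step the sketch swaps in the standard library's xml.etree.ElementTree in place of BeautifulSoup so that it has no third-party dependency.

```python
import json
import subprocess
import tarfile
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def pandoc_command(nxml_path, pmcid, out_dir):
    """Build the pandoc call from the text: JATS XML in, <pmcid>.md out."""
    return ["pandoc", "-f", "jats", str(nxml_path), "-o", str(Path(out_dir) / f"{pmcid}.md")]


def convert_pmc_tar(tar_path, out_dir):
    """Unpack one PMC package and convert its .nxml article with pandoc.

    Hypothetical assumptions: pandoc is on PATH, the archive is named
    <pmcid>.tar.gz, and it holds a single .nxml file.
    """
    pmcid = Path(tar_path).name.split(".")[0]
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(tar_path) as tar:  # default mode "r" also reads gzip
            tar.extractall(tmp)
        for nxml in Path(tmp).rglob("*.nxml"):
            subprocess.run(pandoc_command(nxml, pmcid, out_dir), check=True)
    return Path(out_dir) / f"{pmcid}.md"


def write_jsonl(records, jsonl_path):
    """Write one JSON object per line and return the number of records."""
    n = 0
    with open(jsonl_path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            n += 1
    return n


def markdown_to_jsonl(md_dir, jsonl_path):
    """Combine every <pmcid>.md file in md_dir into a single JSONL file."""
    records = [
        {"id": md.stem, "text": md.read_text(encoding="utf-8")}
        for md in sorted(Path(md_dir).glob("*.md"))
    ]
    return write_jsonl(records, jsonl_path)


def parse_pubmed_xml(xml_text):
    """Extract PMID, title and abstract from each PubmedArticle element."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        if citation is None:
            continue
        pmid = citation.findtext("PMID", default="").strip()
        title_el = citation.find("Article/ArticleTitle")
        # itertext() keeps text nested in inline markup such as <i>...</i>
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        # Structured abstracts have several AbstractText sections; join them
        abstract = "\n".join(
            "".join(part.itertext()).strip()
            for part in citation.iterfind("Article/Abstract/AbstractText")
        )
        records.append({"pmid": pmid, "title": title, "abstract": abstract})
    return records
```

For example, write_jsonl(parse_pubmed_xml(Path("batch.xml").read_text(encoding="utf-8")), "pma.jsonl") would turn one (hypothetical) PubMed XML batch into PMA records. ET.fromstring loads the whole tree at once to keep the sketch short; for very large PubMed files, ET.iterparse would let each PubmedArticle be processed and discarded incrementally.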