---
tags:
  - question-answering
datasets:
  - squad_v2
metrics:
  - f1
  - exact
widget:
  - context: >-
      While deep and large pre-trained models are the state-of-the-art for
      various natural language processing tasks, their huge size poses
      significant challenges for practical uses in resource constrained
      settings. Recent works in knowledge distillation propose task-agnostic as
      well as task-specific methods to compress these models, with task-specific
      ones often yielding higher compression rate. In this work, we develop a
      new task-agnostic distillation framework XtremeDistilTransformers that
      leverages the advantage of task-specific methods for learning a small
      universal model that can be applied to arbitrary tasks and languages. To
      this end, we study the transferability of several source tasks,
      augmentation resources and model architecture for distillation. We
      evaluate our model performance on multiple tasks, including the General
      Language Understanding Evaluation (GLUE) benchmark, SQuAD question
      answering dataset and a massive multi-lingual NER dataset with 41
      languages.
    example_title: xtremedistil q1
    text: What is XtremeDistil?
  - context: >-
      While deep and large pre-trained models are the state-of-the-art for
      various natural language processing tasks, their huge size poses
      significant challenges for practical uses in resource constrained
      settings. Recent works in knowledge distillation propose task-agnostic as
      well as task-specific methods to compress these models, with task-specific
      ones often yielding higher compression rate. In this work, we develop a
      new task-agnostic distillation framework XtremeDistilTransformers that
      leverages the advantage of task-specific methods for learning a small
      universal model that can be applied to arbitrary tasks and languages. To
      this end, we study the transferability of several source tasks,
      augmentation resources and model architecture for distillation. We
      evaluate our model performance on multiple tasks, including the General
      Language Understanding Evaluation (GLUE) benchmark, SQuAD question
      answering dataset and a massive multi-lingual NER dataset with 41
      languages.
    example_title: xtremedistil q2
    text: On what is the model validated?
model-index:
  - name: nbroad/xdistil-l12-h384-squad2
    results:
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_v2
          type: squad_v2
          config: squad_v2
          split: validation
        metrics:
          - type: exact_match
            value: 75.4591
            name: Exact Match
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2QzODE0YTE5ZjMyMWY3NzdjNjcwZDJjY2YyMjBkMWJjMTg3ZDAwYmUwNzU3ZTlkODhmM2VhMWFkY2I2ZjgzMyIsInZlcnNpb24iOjF9.IEjMS4U3uuSP6PfRcD87VFHBIdhoDsIfXkAYV7sz_bveSqhTE16VKJzHaDilCkUCBHYGTjoZ7pDqlYDcF6NKCQ
          - type: f1
            value: 79.3321
            name: F1
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjAxMDdkNzcxNjAzNzQ4N2MwN2Y3ZDZhOGM5MmU0MzYyOGFjNDM3NjJkNGUzYTkyYmY3MDk1ZGIxYzQ0ZDllMyIsInZlcnNpb24iOjF9.N0jPenoMpxbTzKeJciDfoXiLronfXx3uM-A9NEJCMQ9tiApF-EyNmh4F-G9GBXdbVsq1IZ3MbPto0mn0P9hADQ
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad
          type: squad
          config: plain_text
          split: validation
        metrics:
          - type: exact_match
            value: 81.8604
            name: Exact Match
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzRiYjBkYTU0MGRjZDZhNzY2MDZhMGYzZDY2NDU2MTMyMjk0M2YwNTcxZjkyMDNkYTE0YTA5ODVlY2EwOWIxYyIsInZlcnNpb24iOjF9.3jco8t0D7YkHtWHWRttV3y3L0ylQZj3y534HtIW7NuUX34nvVSGMzHVJ32BgaFDomOtnJkaSQFXmumO10FL2BA
          - type: f1
            value: 89.6654
            name: F1
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjg5YzNmODRlMTM1ZWQ1MjYwYzVkZmJhMzAwMDMzZGQyYzE1MzFlZGFlYmI4Y2JlMTQyNTBkZDRhMWQxYWQ2MCIsInZlcnNpb24iOjF9.Ld2IHVoqmZ-YFx71FgpuoVDEmAAboxRvhke1DhJYLbdIefM-AH60-58OlZcfZGxgUv6fywGjoPCE9g7CxbSzAQ
---

xtremedistil-l12-h384 fine-tuned on SQuAD 2.0 for extractive question answering.

Evaluation results on the SQuAD 2.0 validation set:

"eval_exact": 75.45691906005221
"eval_f1": 79.32502968532793