DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question-answering model fine-tuned from Meta's CodeLlama-34b model (distributed via Hugging Face), using 4-bit quantization, on datasets assembled by DataLinguistic.

Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama (itself based on Llama), with 34B parameters.

Training Datasets

The model was trained on the following datasets, two derived from open-source data and one proprietary:

  • Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
  • Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
  • Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The data is formatted as:

<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>
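A minimal sketch of how one record could be rendered into the format above. It assumes that the `\n` sequences in the template stand for real newlines, that `<s>` and `</s>` are the tokenizer's begin- and end-of-sequence markers, and that the `{question}` placeholder after `### Response:` holds the answer text. The function name `format_example` is hypothetical and not part of any released code.

```python
# Fixed preamble copied from the template in this card.
PREAMBLE = (
    "please answer my question in datalynn model and Below is an instruction "
    "that describes a task. Write a response that appropriately completes the request."
)


def format_example(instruction: str, response: str,
                   bos: str = "<s>", eos: str = "</s>") -> str:
    """Render one instruction/response pair in the card's data format.

    Assumption: the slot after '### Response:' carries the answer text.
    """
    return (
        f"{bos}{PREAMBLE}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response: {response}{eos}"
    )
```

Many Llama-family tokenizers insert `<s>` on their own, so passing `bos=""` may be appropriate when the string is tokenized afterwards.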

Use Cases

The model can be used for a wide range of Chinese-English question answering and chatbot applications.

Model Advantages

  • Built on CodeLlama-34b, a large base model with 34B parameters
  • Fine-tuned on large-scale Chinese-English QA datasets to improve answer quality

Usage

  1. Download the model from Hugging Face
  2. Import and initialize the model
  3. Input a question and generate an answer
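The three steps above might look like the following sketch using the `transformers` library. Several things here are assumptions rather than documented facts: that the repository id is `DataLinguistic/DataLinguistic-34B-V1.0`, that the checkpoint loads through `AutoModelForCausalLM`, that 4-bit loading via `BitsAndBytesConfig` (which needs the `bitsandbytes` and `accelerate` packages and a CUDA GPU) is appropriate, and that inference prompts should follow the training format up to `### Response:`. The helper names are hypothetical.

```python
MODEL_ID = "DataLinguistic/DataLinguistic-34B-V1.0"  # assumed repository id

PREAMBLE = (
    "please answer my question in datalynn model and Below is an instruction "
    "that describes a task. Write a response that appropriately completes the request."
)


def build_prompt(question: str) -> str:
    """Build an inference prompt that stops where the answer should begin."""
    return f"{PREAMBLE}\n\n### Instruction:\n{question}\n\n### Response:"


def extract_answer(generated_text: str) -> str:
    """Return the text after the last '### Response:' marker."""
    return generated_text.rsplit("### Response:", 1)[-1].strip()


def load_model(model_id: str = MODEL_ID):
    """Steps 1 and 2: download (on first call) and initialize the model."""
    # Imported lazily so the pure-string helpers work without these packages.
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)

    # Assumption: load the weights in 4-bit precision to reduce memory use.
    quant = BitsAndBytesConfig(load_in_4bit=True,
                               bnb_4bit_compute_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map="auto")
    return tokenizer, model


def answer(question: str, tokenizer, model, max_new_tokens: int = 256) -> str:
    """Step 3: generate an answer for a single question."""
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return extract_answer(text)
```

A typical call would be `tokenizer, model = load_model()` followed by `answer("What is the capital of France?", tokenizer, model)`. Loading a 34B-parameter model downloads tens of gigabytes, so the first call can take a long time.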

Version

Current version: DataLinguistic-34B-V1.0

Author

Tang Zhengzheng

Contributors

DataLinguistic team

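DataLinguistic-34B-V1.0 Chinese-English Question Answering Model

Model Overview

DataLinguistic-34B-V1.0 is a Chinese-English question-answering model fine-tuned from the CodeLlama-34b model (distributed via Hugging Face) on datasets built by DataLinguistic.

Model Architecture

DataLinguistic-34B-V1.0 inherits the decoder-only Transformer architecture of CodeLlama, with 34B parameters.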

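Training Datasets

The model was trained on the following datasets, two derived from open-source data and one proprietary:

  • Data_OpenSet: Chinese-English question-answering dataset curated from junelee/wizard_vicuna_70k
  • Data_OpenSet2: Chinese-English question-answering dataset curated from garage-bAInd/Open-Platypus
  • Proprietary Chinese-English question-answering dataset collected internally by DataLinguistic (not open-sourced)

The data is formatted as: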

<s>please answer my question in datalynn model and Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response: {question}</s>

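Use Cases

The model can be used broadly for Chinese-English question answering, chatbots, and similar applications.

Model Advantages

  • Built on the large CodeLlama-34b base model, with 34 billion parameters
  • Fine-tuned on large-scale Chinese-English QA datasets to improve answer quality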

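Usage

  1. Download the model from Hugging Face
  2. Import and initialize the model
  3. Input a question and generate an answer

Version

Current version: DataLinguistic-34B-V1.0

Author

Tang Zhengzheng

Contributors

DataLinguistic team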
