metadata
app_file: demo.py
license: mit
title: AI Text Steganography
sdk: gradio
colorFrom: red
colorTo: yellow
pinned: true

AI Text Steganography

Description

  • This is the baseline implementation of AI Text Steganography, our final project for the Software Designing and Applied Information Security courses at HCMUS-VNU.
  • Our project focuses on hiding data inside a text sequence generated by LLMs (e.g. GPT-2).
  • We took inspiration from Kirchenbauer et al.
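
To make the idea of hiding data in generated text concrete, here is a toy sketch. It is not the algorithm used in this repository: it only illustrates the general principle of partitioning candidate tokens with a keyed hash, loosely in the spirit of Kirchenbauer et al. The vocabulary, the key, and the names token_bit, hide_bits, recover_bits and propose are all hypothetical, and propose merely stands in for a real LLM's ranked candidates.

```python
import hashlib

# Hypothetical stand-in for an LLM's ranked next-token candidates.
VOCAB = [
    "the", "a", "cat", "dog", "sat", "ran", "on", "under",
    "mat", "tree", "quietly", "quickly", "and", "then", "it", "slept",
    "red", "blue", "big", "small", "near", "far", "sun", "moon",
]


def token_bit(prev_token: str, token: str, key: bytes = b"demo-key") -> int:
    """Assign each (previous token, token) pair a pseudo-random bit."""
    digest = hashlib.sha256(
        key + prev_token.encode() + b"\x00" + token.encode()
    ).digest()
    return digest[0] & 1


def propose(tokens):
    """Toy 'model': ignores the context and always offers the same candidates."""
    return VOCAB


def hide_bits(bits, start_token="the"):
    """Emit one token per hidden bit, choosing a candidate whose hash bit matches."""
    tokens = [start_token]
    for bit in bits:
        match = next(
            (t for t in propose(tokens) if token_bit(tokens[-1], t) == bit), None
        )
        if match is None:
            raise ValueError("no candidate token encodes this bit")
        tokens.append(match)
    return tokens


def recover_bits(tokens):
    """Recompute the hash bit of every consecutive token pair."""
    return [token_bit(prev, tok) for prev, tok in zip(tokens, tokens[1:])]


message = [1, 0, 1, 1, 0, 0, 1, 0]
stego_tokens = hide_bits(message)
recovered = recover_bits(stego_tokens)
```

A real system would sample from the model's probability distribution instead of taking the first matching candidate, so that the output stays fluent.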

Members

  • Tran Nam Khanh
  • Phan Le Dac Phu

Installation

  1. Clone this repository:
git clone https://github.com/trnKhanh/ai-text-steganography.git
cd ai-text-steganography
  2. (Optional) Create a new conda environment:
conda create -n ai-text-steganography python=3.10
conda activate ai-text-steganography
  3. Install the requirements:
pip install -r requirements.txt

Usage

  • Gradio demo:
python demo.py
  • REST API:
python api.py
  • Command-line interface (this prints its help message):
python main.py -h
  • Analysis (this prints its help message):
python analysis.py -h

Documentation

Configuration

  • config.ini is the project's configuration file. It uses a modified version of the syntax of Python's configparser module: every key-value pair is written as key = type:value. Currently, type can only be int, float or str.
  • Sections of the config:
    • server: parameters for the REST API.
    • models.names: names of the LLMs that are allowed. These must match the model names defined on Hugging Face.
    • models.params: parameters used to load the LLMs.
    • encrypt.default: default parameters for the encryption algorithm.
    • decrypt.default: default parameters for the decryption algorithm.
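
As an illustration of the key = type:value syntax, here is a minimal sketch of how such a file could be read with the standard configparser module. It is not the project's own parser, and the keys host, port and gamma in the sample are hypothetical; the real keys are the ones in config.ini.

```python
import configparser

# Supported type prefixes, as listed in the README: int, float and str.
CASTERS = {"int": int, "float": float, "str": str}


def parse_typed_value(raw: str):
    """Turn 'type:value' into a Python value of the named type."""
    type_name, sep, value = raw.partition(":")
    type_name = type_name.strip()
    if not sep or type_name not in CASTERS:
        raise ValueError(f"expected 'int:', 'float:' or 'str:' prefix, got {raw!r}")
    return CASTERS[type_name](value.strip())


def load_typed_config(text: str) -> dict:
    """Parse INI text whose values all follow the 'type:value' convention."""
    # Only '=' separates keys from values, so ':' stays part of the value.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {
        section: {key: parse_typed_value(val) for key, val in parser[section].items()}
        for section in parser.sections()
    }


# Hypothetical sample; the section names come from the README, the keys do not.
SAMPLE = """
[server]
host = str:0.0.0.0
port = int:8000

[encrypt.default]
gamma = float:0.5
"""

config = load_typed_config(SAMPLE)
```

With this sketch, config["server"]["port"] is an int rather than a string, which is the point of the type prefix.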

Notes on implementation

  • Because our resources are limited, we host multiple models on the same machine (the implementation is in model_factory.py):
    • Each model is first loaded onto the load_device (e.g. cpu).
    • When a request asks for a specific model, that model is moved to the run_device (e.g. gpu) for inference.
  • Consequently, only one model can run inference at a time. This lets users choose among different LLMs despite the limited resources, but it forces the API to be synchronous.
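
The load_device / run_device scheme described above can be sketched as follows. This is a simplified illustration, not the code in model_factory.py: ModelPool, use and FakeModel are hypothetical names, and FakeModel stands in for any object with a PyTorch-style .to(device) method so that the sketch runs without a GPU.

```python
import threading
from contextlib import contextmanager


class FakeModel:
    """Stand-in for a real model; only records which device it is on."""

    def __init__(self):
        self.device = None

    def to(self, device):
        self.device = device
        return self


class ModelPool:
    """Keeps every model on load_device and at most one on run_device."""

    def __init__(self, models, load_device="cpu", run_device="cuda"):
        self.load_device = load_device
        self.run_device = run_device
        self._models = {name: m.to(load_device) for name, m in models.items()}
        self._active = None
        # One lock for the whole pool: requests are served one at a time.
        self._lock = threading.Lock()

    @contextmanager
    def use(self, name):
        with self._lock:
            model = self._models[name]  # fail early on unknown names
            if self._active != name:
                if self._active is not None:
                    # Move the previously active model back to load_device.
                    previous = self._models[self._active]
                    self._models[self._active] = previous.to(self.load_device)
                model = model.to(self.run_device)
                self._models[name] = model
                self._active = name
            yield model


gpt2, other = FakeModel(), FakeModel()
pool = ModelPool({"gpt2": gpt2, "other": other})

with pool.use("gpt2") as model:
    first_device = model.device   # the requested model is on run_device
with pool.use("other") as model:
    swapped_out = gpt2.device     # the previous model went back to load_device
```

Holding the lock for the whole duration of a request is what makes the service synchronous: a second request waits until the first one releases the model.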

TODO lists

  • Baseline code.
  • CLI.
  • Hashing schemes.
  • Rest API.
  • Basic Demo.
  • Statistical experiments.
  • Attack strategies
    • White-box
    • Black-box