---
app_file: demo.py
license: mit
title: AI Text Steganography
sdk: gradio
colorFrom: red
colorTo: yellow
pinned: true
---
# AI Text Steganography

## Description
- This is the baseline implementation of AI Text Steganography, our final project for the Software Design and Applied Information Security courses at HCMUS-VNU.
- Our project focuses on hiding data inside a text sequence generated by LLMs (e.g. GPT-2).
- We took inspiration from Kirchenbauer et al. ("A Watermark for Large Language Models"); a conceptual sketch of the core idea follows this list.
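The mechanism we build on works roughly as follows: at each decoding step, a pseudorandom function of the recent context splits the vocabulary into two halves, and the hidden bit biases sampling toward one half; a decoder that knows the scheme re-derives the split to read the bit back. Below is a minimal conceptual sketch of that idea, assuming PyTorch (the function name and bias strength `delta` are illustrative, not the project's actual encoder):

```python
import hashlib

import torch

def biased_logits(logits: torch.Tensor, prev_token: int, bit: int,
                  delta: float = 4.0) -> torch.Tensor:
    """Bias next-token logits toward a pseudorandom half of the vocabulary.

    The partition is derived by hashing the previous token, so a decoder
    that knows the scheme can re-derive it and recover the hidden bit by
    checking which half each sampled token fell into.
    """
    vocab_size = logits.shape[-1]
    # Derive a deterministic seed from the previous token.
    seed = int.from_bytes(
        hashlib.sha256(str(prev_token).encode()).digest()[:8], "big"
    )
    gen = torch.Generator().manual_seed(seed)
    # Pseudorandomly split the vocabulary into two halves.
    perm = torch.randperm(vocab_size, generator=gen)
    mask = torch.zeros(vocab_size, dtype=torch.bool)
    mask[perm[: vocab_size // 2]] = True
    if bit == 1:
        mask = ~mask  # the hidden bit selects which half gets the boost
    return logits + delta * mask.float()
```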
## Members
- Tran Nam Khanh
- Phan Le Dac Phu
## Installation
- Clone this repository:
```sh
git clone https://github.com/trnKhanh/ai-text-steganography.git
cd ai-text-steganography
```
- (Optional) Create a new conda environment:
```sh
conda create -n ai-text-steganography python=3.10
conda activate ai-text-steganography
```
- Install requirements:
```sh
pip install -r requirements.txt
```
## Usage
- Gradio demo:
```sh
python demo.py
```
- REST API:
```sh
python api.py
```
- See the help message of the command-line interface:
```sh
python main.py -h
```
- To run analysis, see the help message:
```sh
python analysis.py -h
```
## Documentation
- To access the REST API documentation, launch the API and go to http://localhost:6969/docs (a sample request sketch follows).
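Once the server is running, you can exercise it from Python. The endpoint path and payload fields below are hypothetical placeholders; consult http://localhost:6969/docs for the actual routes and schema:

```python
import requests

# Hypothetical request: "/encrypt" and the payload fields are illustrative
# placeholders only; see http://localhost:6969/docs for the real schema.
resp = requests.post(
    "http://localhost:6969/encrypt",
    json={"prompt": "Once upon a time", "message": "secret"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```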
## Configuration
`config.ini` is the config file of the project. We use a modified syntax of the `configparser` package: every key-value pair follows the form `key = type:value`, where `type` can currently only be `int`, `float`, or `str` (a parsing sketch follows the list below).

Details on the config sections:
- `server`: parameters for the REST API.
- `models.names`: names of the allowed LLMs. Note that these follow the model names defined on Hugging Face.
- `models.params`: parameters used to load the LLMs.
- `encrypt.default`: default parameters for the encryption algorithm.
- `decrypt.default`: default parameters for the decryption algorithm.
## Notes on implementation
- Because of limited resources, we load multiple models on the same machine (the implementation is in `model_factory.py`):
  - Each model is first loaded to the `load_device` (e.g. CPU).
  - If there is a request to use a specific model, it is loaded to the `run_device` (e.g. GPU) for inference.
- Therefore, only one model can run inference at a time. This lets us make the most of our limited resources while still allowing users to choose among different LLMs, but it forces the API to be synchronous (a sketch of this pattern follows below).
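A minimal sketch of this device-swapping pattern, assuming PyTorch and `transformers` (the class name and structure are illustrative; the actual logic lives in `model_factory.py`):

```python
from transformers import AutoModelForCausalLM

class ModelFactory:
    """Keep every model resident on a cheap device and move exactly one
    at a time to the fast device for inference. Illustrative sketch."""

    def __init__(self, names, load_device="cpu", run_device="cuda"):
        self.load_device = load_device
        self.run_device = run_device
        # Every model starts on the load device (e.g. CPU RAM).
        self.models = {
            name: AutoModelForCausalLM.from_pretrained(name).to(load_device)
            for name in names
        }
        self.active = None  # name of the model currently on run_device

    def get(self, name):
        """Return the named model on the run device, evicting the old one."""
        if self.active != name:
            if self.active is not None:
                # Move the previously active model back to the load device.
                self.models[self.active].to(self.load_device)
            self.models[name].to(self.run_device)
            self.active = name
        return self.models[name]
```

Because `get` evicts the previously active model before moving the next one in, at most one model ever occupies the `run_device`; this is why requests have to be served one at a time.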
## TODO list
- Baseline code.
- CLI.
- Hashing schemes.
- REST API.
- Basic Demo.
- Statistical experiments.
- Attack strategies:
  - White-box.
  - Black-box.