Python clone detection
This is a CodeBERT model for detecting clones in Python code, fine-tuned on the dataset shared by PoolC on the Hugging Face Hub. The original source code for using the model can be found at https://github.com/sangHa0411/CloneDetection/blob/main/inference.py.
How to use
To use the model efficiently, refer to https://github.com/RepoAnalysis/PythonCloneDetection, which provides a class that integrates data preprocessing, input tokenization, and model inference.
You can also follow the original inference source code at https://github.com/sangHa0411/CloneDetection/blob/main/inference.py.
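The exact preprocessing rules used by the PythonCloneDetection repository are not described in this card. As a rough illustration only, the sketch below shows one plausible kind of normalization for code-pair inputs: dropping # comments and blank lines with Python's standard tokenize module, so that two snippets differing only in comments become identical text. The function name strip_comments_and_blank_lines is hypothetical, and this is not necessarily what the linked repository or the original inference script does.

```python
import io
import tokenize


def strip_comments_and_blank_lines(source: str) -> str:
    """Remove '#' comments and blank lines from Python source.

    Hypothetical preprocessing step for illustration only; the linked
    repository may normalize code differently (or not at all).
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Keep every token except comments; '#' inside string literals is part of
    # a STRING token, so it is left untouched.
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    # untokenize uses the original token positions, so removed comments leave
    # trailing spaces and comment-only lines leave whitespace-only lines.
    cleaned = tokenize.untokenize(kept)
    # Drop whitespace-only lines and trailing spaces.
    lines = [line.rstrip() for line in cleaned.splitlines() if line.strip()]
    return "\n".join(lines)


snippet = "def f(x):\n    # add one\n    return x + 1  # inline\n"
print(strip_comments_and_blank_lines(snippet))
```

Docstrings are ordinary string tokens, so this sketch keeps them; removing them as well would need an extra rule.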
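More conveniently, a custom pipeline has been implemented for this model, and you can initialize it with just two lines of code (trust_remote_code=True is required because the pipeline code is loaded from the model repository on the Hub):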
from transformers import pipeline
pipe = pipeline(model="Lazyhope/python-clone-detection", trust_remote_code=True)
To run it, pass the two code snippets to compare as a tuple:
code1 = """def token_to_inputs(feature):
    inputs = {}
    for k, v in feature.items():
        inputs[k] = torch.tensor(v).unsqueeze(0)
    return inputs"""
code2 = """def f(feature):
    return {k: torch.tensor(v).unsqueeze(0) for k, v in feature.items()}"""
is_clone = pipe((code1, code2))
is_clone
# {False: 1.3705984201806132e-05, True: 0.9999862909317017}
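The pipeline returns a dictionary that maps each label to a probability; in the example output the two values sum to almost exactly 1, which is consistent with a softmax over two class logits. The sketch below assumes that (the card does not confirm it), shows how such scores could arise from logits ordered as [not clone, clone], and turns a score dictionary into a yes/no decision. The names logits_to_scores, is_clone_pair and find_clones, and the 0.5 threshold, are hypothetical and not part of the model's pipeline.

```python
import math
from typing import Callable, Dict, Iterable, List, Tuple

Scores = Dict[bool, float]


def logits_to_scores(logit_false: float, logit_true: float) -> Scores:
    # Assumption: two logits ordered [not clone, clone] passed through softmax.
    # Subtracting the larger logit keeps math.exp from overflowing.
    m = max(logit_false, logit_true)
    exp_false = math.exp(logit_false - m)
    exp_true = math.exp(logit_true - m)
    total = exp_false + exp_true
    return {False: exp_false / total, True: exp_true / total}


def is_clone_pair(scores: Scores, threshold: float = 0.5) -> bool:
    # Treat the pair as a clone when the True probability reaches the threshold.
    return scores[True] >= threshold


def find_clones(
    pairs: Iterable[Tuple[str, str]],
    classify: Callable[[Tuple[str, str]], Scores],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    # Hypothetical helper: call `classify` (e.g. the pipeline object) once per
    # pair and keep the pairs whose clone probability passes the threshold.
    return [pair for pair in pairs if is_clone_pair(classify(pair), threshold)]


# Scores copied from the example output in this card.
example = {False: 1.3705984201806132e-05, True: 0.9999862909317017}
print(is_clone_pair(example))
```

With the pipeline loaded as pipe, find_clones(candidate_pairs, pipe) would call it once per pair, matching the single-tuple usage shown above. Whether the pipeline also accepts a batched list of pairs is not stated in this card, so the per-pair loop is the conservative choice.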
Credits
We would like to thank the original team and authors of the model and the fine-tuning dataset:
License
This model is released under the MIT license.