Shellcode_IA32

Name: Shellcode_IA32
Published: 2021-04-27
License: Unknown

Dataset Information

Languages: English
Introduced: 2021

Contents

Overview
Associated Benchmarks
Recent Benchmark Submissions
Research Papers

Overview

Shellcode_IA32 is a dataset spanning 20 years of shellcodes from a variety of sources, and it is the largest collection of shellcodes in assembly available to date.

This dataset consists of 3,200 examples of instructions in assembly language for IA-32 (the 32-bit version of the x86 Intel Architecture) taken from publicly available security exploits. We collected assembly programs used to generate shellcode from exploit-db and from shell-storm. We enriched the dataset with examples of assembly programs for the IA-32 architecture from popular tutorials and books. This allowed us to understand how different authors and assembly experts comment their code and, thus, how to deal with the ambiguity of natural language in this specific context. Of the instructions in our dataset, 10% were collected from books and guidelines, and the rest come from real shellcodes.

Our focus is on Linux, the most common OS for security-critical network services. Accordingly, we added assembly instructions written with Netwide Assembler (NASM) for Linux.

Each line of the Shellcode_IA32 dataset represents a snippet-intent pair. The snippet is a single line, or a combination of multiple lines, of assembly code written in NASM syntax. The intent is a comment in English describing the snippet.
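The snippet-intent pairing described above can be sketched as a small data structure. This is a minimal illustration only: the Pair class, the make_pair helper, and both example pairs are hypothetical and not taken from the dataset, and the sketch makes no assumption about how the dataset files are actually laid out on disk.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pair:
    """One entry: an English intent and the NASM snippet it describes."""
    intent: str
    snippet: Tuple[str, ...]  # one or more lines of assembly code


def make_pair(intent: str, snippet_text: str) -> Pair:
    """Build a Pair, splitting a multi-line snippet into its instructions."""
    lines = tuple(
        line.strip() for line in snippet_text.splitlines() if line.strip()
    )
    if not lines:
        raise ValueError("snippet must contain at least one instruction")
    return Pair(intent=intent.strip(), snippet=lines)


# Hypothetical examples written for illustration, not dataset entries.
single = make_pair("move 1 into eax", "mov eax, 1")
multi = make_pair(
    "clear eax and push it onto the stack",
    "xor eax, eax\npush eax",
)
```

Storing the snippet as a tuple of lines keeps the single-line and multi-line cases uniform, which mirrors the statement that one intent may map to several assembly instructions.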

Further statistics on the dataset and a set of preliminary experiments performed with a neural machine translation (NMT) model are described in the following paper: Shellcode_IA32: A Dataset for Automatic Shellcode Generation.

Variants: Shellcode_IA32

Associated Benchmarks

This dataset is used in 1 benchmark:

Code Generation - Metrics: BLEU-4, Exact Match Accuracy
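As a rough illustration of the second metric, exact match accuracy is the fraction of generated snippets that are identical to their reference snippets. The sketch below is an assumption about how such a score could be computed, not the evaluation code used for the benchmark: the function names are hypothetical, and collapsing whitespace before comparison is a choice made here for illustration. BLEU-4 is not covered.

```python
from typing import Sequence


def normalize(snippet: str) -> str:
    """Collapse runs of whitespace (an assumption; the benchmark may differ)."""
    return " ".join(snippet.split())


def exact_match_accuracy(
    predictions: Sequence[str], references: Sequence[str]
) -> float:
    """Return the fraction of predictions equal to their reference snippet."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one example is required")
    matches = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)


# Hypothetical model outputs and references, for illustration only.
score = exact_match_accuracy(
    ["mov eax, 1", "push ebx"],
    ["mov  eax, 1", "push eax"],
)
```

In this example the first prediction matches after whitespace normalization and the second does not, so half of the predictions count as exact matches.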

Recent Benchmark Submissions

Task	Model	Paper	Date
Code Generation	CodeBERT	Can We Generate Shellcodes via …	2022-02-08
Code Generation	Seq2Seq with Attention	Can We Generate Shellcodes via …	2022-02-08
Code Generation	LSTM-based Sequence to Sequence	Shellcode_IA32: A Dataset for Automatic …	2021-04-27

Research Papers

Recent papers with results on this dataset:
