Full Record


Author
Title Authorship Identification and Verification of JavaScript Source Code: An Evaluation of Techniques
URL
Publication Date
Degree Level masters
University/Publisher Delft University of Technology
Abstract The growing number of criminals who exploit the speed and anonymity of the Web has become an increasing concern. Little effort has been spent on tracing the authors of malicious code. To that end, we investigated authorship identification and verification of JavaScript source code. We evaluated three character-based approaches and propose a new domain-specific approach. What is new in the domain-specific approach is that it represents code as a parse tree from which structural features are extracted. The evaluation of the techniques on open source code from GitHub showed that the approaches using character n-gram features achieved the best performance. However, n-gram and domain-specific features turned out to be complementary, and combining them resulted in higher performance. Techniques that used similarity-based classification were especially successful when only a limited amount of training data was available, while feature-vector-based techniques were mainly successful when a large amount of training data was available and in an authorship verification context. By means of code minification, we evaluated how classification accuracy is affected by removing authorship information from the source code. Code minification was shown to significantly deteriorate the performance of the authorship analysis methods. The compression-based technique in particular is robust against code minification.
Subjects/Keywords authorship analysis; authorship identification; authorship verification; source code; n-gram; JavaScript; minification
Contributors Zaidman, A.E.
Language en
Rights (c) 2014 Wilco, W.C.
Country of Publication nl
Record ID oai:tudelft.nl:uuid:f6aa2f88-e657-4fef-b684-188a212c71ad
Other Identifiers uuid:f6aa2f88-e657-4fef-b684-188a212c71ad
Repository delft
Date Indexed 2017-06-19

Sample Search Hits

…2 Preliminaries and Related Work; 2.1 The authorship analysis process; 2.2 Source code authorship identification; 2.3 Conclusion…

…4 Experimental Setup; 4.1 Validation procedure; 4.2 Source code collections; 5 Results; 5.1 Authorship identification

…respective authors. JavaScript is commonly used on webpages. The identification of the developer of a fragment of JavaScript code forms a supplementary source of authorship information that can play a crucial role in a forensic investigation, such as tracing…

…Similarly, source code authorship identification is different from identifying code clones in software. 1.2.2 Authorship verification Authorship verification (also known as authorship discrimination or similarity detection) involves determining…

…and verification problem. Several studies in the literature considered authorship identification of source code, but no studies have been found that addressed authorship verification. This study aims to contribute to the field by embedding the implemented…

…moves on to authorship identification of source code and we investigate how this task has been approached in the literature. Because many concepts discussed in this chapter are not limited to software but also apply to texts, we use the word document to…

…refer to a text in natural language as well as a piece of source code. 2.1 The authorship analysis process In general, authorship identification involves the categorization of documents into a predefined number of classes, and is thus a classification…

…in both authorship identification of texts and of source code [9]. In general, an n-gram is a sequence of n consecutive units. By sliding a window of length n over a document of t units, the document generates t − n + 1 n-grams…
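The sliding-window definition in the snippet above can be illustrated with a minimal Python sketch. It assumes that the units are single characters, in line with the character n-gram features mentioned in the abstract; the function names `char_ngrams` and `ngram_profile` are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def char_ngrams(document: str, n: int) -> list[str]:
    """Return all character n-grams of `document` using a sliding window.

    Assumption: the "units" are single characters. A document of t
    characters yields t - n + 1 n-grams (none if t < n).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [document[i:i + n] for i in range(len(document) - n + 1)]


def ngram_profile(document: str, n: int) -> Counter:
    """Count how often each n-gram occurs (a simple frequency profile)."""
    return Counter(char_ngrams(document, n))


if __name__ == "__main__":
    snippet = "var x = 1;"  # t = 10 characters
    grams = char_ngrams(snippet, 3)
    # The number of n-grams matches t - n + 1.
    assert len(grams) == len(snippet) - 3 + 1
    print(grams)
    print(ngram_profile(snippet, 3).most_common(3))
```

A frequency profile of this kind is one plausible way to turn a document into features. How the thesis weights, selects, or compares n-grams is not shown in the snippet, so none of that is modelled here.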
