GPT-3 and Cybersecurity
The use of deep neural networks has significantly improved the performance of machine learning in fields such as image recognition, machine translation, and malware classification. A key lesson from recent deep learning success is that, as we scale neural networks, they tend to get better in ways that can be game-changing. In this post, we explain how supercomputer-scale neural networks can be used for machine learning security applications in new and powerful ways. We will demonstrate two cybersecurity use cases that would not have been possible with smaller models: detecting spam messages with few training examples and generating human-readable explanations of difficult-to-parse commands.
GPT-3 is a pre-trained, large-scale language model, and its flexibility and accuracy are remarkable. If input and output data can be converted into text, the range of potential GPT-3 applications is vast. For example, it is possible to ask GPT-3 to write working Python code from a function description. Furthermore, it is possible to build a classification application with only a few examples.
It is usually easy to find an unlabelled dataset in the cybersecurity domain; however, it is often time-consuming and difficult to create a labelled dataset for training a traditional machine learning model. Traditional machine learning models trained on few examples commonly exhibit overfitting, meaning they do not generalize well to previously unseen samples. GPT-3's few-shot learning, on the other hand, requires only a few annotated training samples and can outperform traditional models. Because GPT-3 has been trained in a self-supervised way on a large general corpus, it can perform well on many classification problems with just a few examples. This ability to generalize from a handful of labelled examples makes GPT-3 a potentially powerful tool for cybersecurity problems where labelled examples are scarce. Below we explore two such use cases.
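To make the few-shot idea concrete, here is a minimal sketch of how labelled examples can be assembled into a classification prompt. The function name, the example messages, and the exact prompt layout are illustrative assumptions, not the prompts used in the experiments below; the resulting text would be sent to a completion endpoint, and the model's continuation read back as the predicted label.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot classification prompt from labelled examples.

    `examples` is a list of (message, label) pairs; `query` is the
    unlabelled message we want the model to classify.
    (Hypothetical helper for illustration only.)
    """
    lines = []
    for message, label in examples:
        lines.append(f"Message: {message}\nLabel: {label}\n")
    # End with the query and an empty label for the model to complete.
    lines.append(f"Message: {query}\nLabel:")
    return "\n".join(lines)


examples = [
    ("Congratulations, you won a free cruise! Click here.", "spam"),
    ("Can we move tomorrow's stand-up to 10am?", "ham"),
    ("URGENT: verify your account or it will be suspended.", "spam"),
]
prompt = build_few_shot_prompt(examples, "Lunch at noon?")
print(prompt)
```

Note that no gradient updates are involved: the three labelled messages are the entire "training set", supplied at inference time inside the prompt itself.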