Data Profiling Phishing URLs to Find the Most Impactful Attributes
Abstract
Phishing is a common form of social engineering attack in which an attacker crafts a malicious link, disguised as coming from a reputable source, that leads to a fraudulent website. The fake website is designed to trick the user into entering personal or sensitive information, such as account credentials, credit card numbers, or bank details. Phishing websites are becoming increasingly sophisticated and difficult to spot, particularly because most users lack security training. It is therefore important to identify the attributes most commonly associated with the fraudulent URLs used in phishing attacks. A simple way to do this is data profiling: reviewing source data, examining its structure and content, and identifying useful points that summarize the whole. This paper uses data profiling to analyze multiple phishing website datasets and locate the most impactful and easily identifiable features for recognizing whether a URL is legitimate. The features found should help security professionals and nonprofessionals alike identify phishing websites.
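As a rough illustration of what profiling a URL dataset can look like, the following minimal Python sketch extracts a few surface attributes from each URL and summarizes them across a collection. The attribute names (length, dots_in_host, has_at_symbol, uses_https), the helper functions, and the sample URLs are all assumptions chosen for illustration; they are not the features or datasets analyzed in this paper.

```python
from statistics import mean
from urllib.parse import urlsplit


def url_attributes(url):
    """Return a few illustrative surface attributes of a single URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),                    # total characters in the URL
        "dots_in_host": host.count("."),       # rough proxy for subdomain depth
        "has_at_symbol": "@" in url,           # '@' anywhere in the URL
        "uses_https": parts.scheme == "https",
    }


def profile(urls):
    """Average each attribute over a non-empty collection of URLs.

    Boolean attributes are averaged as 0/1, so the result is the
    fraction of URLs for which the attribute is true.
    """
    rows = [url_attributes(u) for u in urls]
    return {name: mean(float(row[name]) for row in rows) for name in rows[0]}


if __name__ == "__main__":
    # Made-up sample URLs on reserved example domains, not real data.
    sample = [
        "https://www.example.com/login",
        "http://login.example.com.verify.example.net/account@update",
    ]
    for name, value in profile(sample).items():
        print(f"{name}: {value}")
```

Comparing such summaries between a set of known phishing URLs and a set of known legitimate URLs is one simple way to see which attributes differ most between the two groups.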