Abstract

Spam emails represent a threat to security and waste both transmission time and the time users spend reading them. These emails consume a lot of bandwidth and storage, leading to financial losses for institutions and annoying individual users. Another type of malicious email is the phishing email, which aims to get sensitive information from users, leading to credential theft. This forms a challenging threat in the cybersecurity domain. Many machine learning (ML) based filters are used to classify emails as ham or spam. However, machine learning classifiers are vulnerable to adversarial attacks, where an attacker aims to deceive an ML model. Hence, there is a pressing need to protect machine learning models against such attacks under different attack scenarios.

The aim of this thesis is to construct a real-time and accurate ML-based spam detector, in both clean and adversarial environments, capable of competing with state-of-the-art techniques. Towards this end, we investigate several machine learning classifiers as spam filters and end up proposing an artificial neural network (ANN) model that shows improvement in performance compared to recent related studies. Four benchmark datasets, SpamBase, Phishing corpus, CSDMC2010 and Enron, are utilized in the study experiments. Several feature selection methods are studied and the effect of these methods on the classifier performance is demonstrated.

Different performance measures are used for model validation and testing. Additionally, the time consumed in both the offline training and online detection stages is reported. The proposed ANN-based classifier considers the validation accuracy along with the training accuracy, achieving fast and competitive performance that promotes its use in practical scenarios. Based on the conducted comparative studies, it becomes apparent that the proposed ANN-based spam filter outperforms other state-of-the-art ML-based filters.
Next, the resilience of several traditional ML-based spam classifiers to adversarial attacks is investigated. Using the defense technique of re-training with adversarial samples, the performance of the ML-based spam filters is significantly improved, achieving an accuracy comparable to the original one in a clean environment.

Extending the experiments to include the proposed ANN model, different attack scenarios are examined, including white-box attacks (which assume that the attacker knows the ML model) and black-box attacks (which assume that the ML model is not known to the attacker). Both attacks during training time (poisoning attacks) and those occurring at testing time (evasion attacks) are considered in the introduced experiments. The effect of varying the strength of the attack on the performance of the proposed ANN-based spam filter is monitored, aided by security evaluation curves. Moreover, the validity of the transferability property of adversarial examples across different models is demonstrated, where the impact of the adversarial examples on the original model (surrogate model) is almost the same for other models (target models).

The experimental results show that the proposed ANN-based spam filter is not only simple and efficient, but also robust against many evasion attacks and a selected poisoning attack.
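
As a rough illustration of what a security evaluation curve records, the following minimal sketch plots accuracy against attack strength for a toy linear spam filter under a simple evasion attack. It is not the thesis's ANN model or its attack implementation: the weights, bias, samples, and the function names `evade`, `accuracy_under_attack`, and `security_curve` are all hypothetical, and the attack is a basic sign-based shift with an L-infinity budget `eps`, applied without clipping features to a valid range.

```python
# Toy security evaluation curve: accuracy of a fixed linear spam filter
# as the evasion attack strength (L-infinity budget eps) grows.
# All weights and samples are made-up illustrative values (assumptions).

WEIGHTS = [1.5, -1.0, 2.0]   # hypothetical feature weights
BIAS = -0.5                  # hypothetical bias term

# (features, label) pairs, with label 1 = spam and 0 = ham
SAMPLES = [
    ([0.9, 0.1, 0.8], 1),
    ([0.6, 0.2, 0.5], 1),
    ([0.4, 0.3, 0.4], 1),
    ([0.1, 0.9, 0.0], 0),
    ([0.2, 0.7, 0.1], 0),
    ([0.0, 0.5, 0.2], 0),
]


def score(x):
    """Linear decision score; positive means 'spam'."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    return 1 if score(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def evade(x, eps):
    """White-box evasion: move each feature by eps against the weight sign,
    which lowers the spam score as much as the budget allows."""
    return [xi - eps * sign(w) for xi, w in zip(x, WEIGHTS)]


def accuracy_under_attack(eps):
    correct = 0
    for x, y in SAMPLES:
        # Only spam emails are manipulated; ham is left untouched.
        x_adv = evade(x, eps) if y == 1 else x
        correct += predict(x_adv) == y
    return correct / len(SAMPLES)


def security_curve(strengths):
    """Return (eps, accuracy) pairs, one per attack strength."""
    return [(eps, accuracy_under_attack(eps)) for eps in strengths]


if __name__ == "__main__":
    for eps, acc in security_curve([0.0, 0.2, 0.4, 0.6]):
        print(f"eps={eps:.1f}  accuracy={acc:.2f}")
```

For a linear model, this attack lowers every spam score by `eps` times the sum of absolute weights, so accuracy can only stay flat or fall as `eps` grows. A robust filter shows a curve that decays slowly. Re-training on the evaded samples, as described above, would be expected to flatten it further.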