Author: Sedky, Safaa Magdy Abdel-Hamid./ Title: Efficient Spam Email Filtering Based on Artificial Intelligence Methods /


Title

Efficient Spam Email Filtering Based on Artificial Intelligence Methods /

Author

Sedky, Safaa Magdy Abdel-Hamid.

Thesis committee

Researcher / Safaa Magdy Abdel-Hamid Sedky

Supervisor / ياسمين أبو السعود صالح متولي

Supervisor / ميرفت ميخائيل راغب

mrvatmekhaeil@yahoo.com

Examiner / محمد عبد الحميد إسماعيل

drmaismail@gmail.com

Examiner / سلوى كمال عبد الحفيظ

Subject

Mathematics.

Publication date

2023.

Number of pages

156 p.

Language

English

Degree

Doctorate (Ph.D.)

Specialization

Mathematics (Miscellaneous)

Approval date

19/3/2023

Place of approval

Alexandria University - Faculty of Engineering - Engineering Mathematics and Physics


Abstract

Spam emails pose a security threat and waste both transmission time and the time users spend reading them. They also consume considerable bandwidth and storage, causing financial losses for institutions and annoyance for individual users. Another type of malicious email is the phishing email, which aims to obtain sensitive information from users and leads to credential theft. This is a challenging threat in the cybersecurity domain. Many machine learning (ML) based filters are used to classify emails as ham or spam. However, ML classifiers are vulnerable to adversarial attacks, in which an attacker aims to deceive an ML model. Hence, there is a pressing need to protect ML models against such attacks under different attack scenarios.

The aim of this thesis is to construct a real-time and accurate ML-based spam detector, in both clean and adversarial environments, capable of competing with state-of-the-art techniques. Towards this end, we investigate several ML classifiers as spam filters and end up proposing an artificial neural network (ANN) model that improves performance compared with recent related studies. Four benchmark datasets (SpamBase, Phishing corpus, CSDMC2010 and Enron) are used in the experiments. Several feature selection methods are studied, and their effect on classifier performance is demonstrated.

Different performance measures are used for model validation and testing. Additionally, the time consumed in both the offline training and online detection stages is reported. The proposed ANN-based classifier considers the validation accuracy along with the training accuracy, achieving fast and competitive performance that supports its use in practical scenarios. The comparative studies conducted show that the proposed ANN-based spam filter outperforms other state-of-the-art ML-based filters.
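To make the split between offline training and online detection, and the idea of selecting a model by validation accuracy as well as training accuracy, more concrete, here is a minimal sketch of a one-hidden-layer ANN spam filter in plain Python. It is not the thesis's model. The toy Gaussian feature vectors (standing in for real email features such as SpamBase's word frequencies), the layer sizes, the learning rate, and the selection rule (keep the epoch with the best validation accuracy, breaking ties by training accuracy) are all assumptions made for illustration.

```python
import copy
import math
import random
import time


def make_toy_emails(n, rng, dim=4):
    """Hypothetical toy data: spam (label 1) centred at +1, ham (label 0) at -1."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        mu = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(mu, 0.7) for _ in range(dim)], y))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyANN:
    """One hidden layer of sigmoid units feeding a single sigmoid output."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p  # p = estimated probability that x is spam

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

    def sgd_step(self, x, y, lr):
        # Backpropagation of the binary cross-entropy loss for one sample.
        h, p = self.forward(x)
        d_out = p - y  # dLoss / d(output pre-activation)
        for j, hj in enumerate(h):
            d_hid = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
            self.w2[j] -= lr * d_out * hj
            self.b1[j] -= lr * d_hid
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
        self.b2 -= lr * d_out


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


def train(model, train_set, val_set, rng, epochs=30, lr=0.3):
    """Offline training that keeps the epoch with the best validation accuracy,
    breaking ties by training accuracy (an assumed selection rule)."""
    best_key, best_state = (-1.0, -1.0), None
    for _ in range(epochs):
        rng.shuffle(train_set)
        for x, y in train_set:
            model.sgd_step(x, y, lr)
        key = (accuracy(model, val_set), accuracy(model, train_set))
        if key > best_key:
            best_key, best_state = key, copy.deepcopy(model.__dict__)
    model.__dict__.update(best_state)
    return best_key  # (validation accuracy, training accuracy)


rng = random.Random(0)
emails = make_toy_emails(400, rng)
train_set, val_set, test_set = emails[:240], emails[240:320], emails[320:]
model = TinyANN(n_in=4, n_hidden=5, rng=rng)

t0 = time.perf_counter()
val_acc, train_acc = train(model, train_set, val_set, rng)
train_seconds = time.perf_counter() - t0  # offline training time

t0 = time.perf_counter()
test_acc = accuracy(model, test_set)
detect_ms = 1000 * (time.perf_counter() - t0) / len(test_set)  # online detection time per email

print(f"train={train_acc:.3f} val={val_acc:.3f} test={test_acc:.3f} "
      f"train_time={train_seconds:.3f}s detect={detect_ms:.4f} ms/email")
```

The two timers mirror the two reported stages. Training time covers the whole epoch loop, including the per-epoch accuracy checks. Detection time is averaged per email over the held-out set.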
Next, the resilience of several traditional ML-based spam classifiers to adversarial attacks is investigated. Using adversarial re-training (re-training with adversarial samples) as a defense, the performance of the ML-based spam filters is significantly improved, reaching an accuracy comparable to the original accuracy in a clean environment.

Extending the experiments to the proposed ANN model, different attack scenarios are examined. These include white-box attacks (which assume that the attacker knows the ML model) and black-box attacks (which assume that the ML model is unknown to the attacker). Both attacks at training time (poisoning attacks) and attacks at test time (evasion attacks) are considered. The effect of varying the attack strength on the performance of the proposed ANN-based spam filter is monitored using security evaluation curves. Moreover, the transferability of adversarial examples across different models is demonstrated: adversarial examples crafted on the original model (the surrogate model) have almost the same impact on other models (the target models) as on the surrogate. The experimental results show that the proposed ANN-based spam filter is not only simple and efficient, but also robust against many evasion attacks and a selected poisoning attack.
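The following is a minimal sketch of a white-box evasion attack, a security evaluation curve (detection rate versus attack strength), and adversarial re-training. It uses a logistic-regression filter rather than the thesis's ANN so that the attack gradient has a closed form. Several choices are assumptions made for illustration: the toy data with unconstrained real-valued features, the FGSM-style attack (fast gradient sign method, chosen here and not necessarily the attack used in the thesis), the attack strengths, and `eps_train`.

```python
import math
import random


def make_toy_emails(n, rng, dim=4):
    """Hypothetical toy data with unconstrained real-valued features:
    spam (label 1) centred at +1, ham (label 0) centred at -1."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        mu = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(mu, 0.7) for _ in range(dim)], y))
    return data


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(data, rng, epochs=50, lr=0.1):
    """Logistic-regression spam filter trained by SGD on the log loss."""
    w, b = [0.0] * len(data[0][0]), 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # dLoss/dscore
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def is_flagged(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0


def fgsm_evade(model, x, eps):
    """White-box FGSM on a spam sample. For y = 1 the input gradient of the
    log loss is (p - 1) * w, whose sign is -sign(w), so each feature moves
    by eps against the sign of its weight."""
    w, _ = model
    return [xi - eps * (1.0 if wi > 0 else -1.0 if wi < 0 else 0.0)
            for xi, wi in zip(x, w)]


def security_curve(model, spam, eps_values):
    """Detection rate on perturbed spam for each attack strength eps."""
    return [sum(is_flagged(model, fgsm_evade(model, x, e)) for x in spam) / len(spam)
            for e in eps_values]


rng = random.Random(1)
train_set, test_set = make_toy_emails(300, rng), make_toy_emails(200, rng)
test_spam = [x for x, y in test_set if y == 1]
eps_values = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 6.0]

clean_model = train_logreg(train_set, rng)

# Defense: re-train on the clean data plus FGSM-perturbed copies of the
# training spam, still labelled spam (eps_train = 1.0 is an assumed value).
eps_train = 1.0
adv_spam = [(fgsm_evade(clean_model, x, eps_train), 1) for x, y in train_set if y == 1]
robust_model = train_logreg(train_set + adv_spam, rng)

curve_clean = security_curve(clean_model, test_spam, eps_values)
curve_robust = security_curve(robust_model, test_spam, eps_values)
print("eps:       ", eps_values)
print("no defense:", [round(v, 3) for v in curve_clean])
print("re-trained:", [round(v, 3) for v in curve_robust])
```

Because the filter is linear, each attack step lowers a spam sample's score by exactly eps times the L1 norm of `w`. As a result, both curves can only fall as eps grows. Only spam is perturbed here, so the ham false-positive rate of the re-trained filter should be checked separately before comparing the two curves.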