Definition
Bayesian method
The Bayesian method is based on the Bayesian principle and uses probability and statistics to classify a sample data set. Because of its solid mathematical foundation, the misclassification rate of the Bayesian classification algorithm is generally low. The Bayesian method combines the prior probability with the posterior probability, which avoids the subjective bias of relying on the prior probability alone and also avoids the over-fitting that comes from relying on the sample information alone. The Bayesian classification algorithm shows high accuracy when the data set is large, and the algorithm itself is relatively simple.
Naive Bayes algorithm
The naive Bayes algorithm is one of the most widely used classification algorithms.
The naive Bayes method is a simplified form of the Bayesian algorithm: it assumes that the attributes are conditionally independent of each other given the target value. In other words, no attribute variable carries a larger weight in the decision than any other, and none carries a smaller weight. Although this simplification reduces the classification quality of the Bayesian classification algorithm to some extent, in practical applications it greatly reduces the complexity of the Bayesian method.
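To give a rough sense of how much the conditional-independence assumption simplifies the model, the following sketch counts the free probability parameters needed per class with and without it. The function names, and the restriction to attributes that all take the same number of values, are illustrative assumptions rather than part of the description above.

```python
def full_joint_params(num_attributes, num_values=2):
    """Free parameters per class for an unrestricted joint distribution
    over all attribute combinations (probabilities must sum to 1)."""
    return num_values ** num_attributes - 1


def naive_bayes_params(num_attributes, num_values=2):
    """Free parameters per class when each attribute is modelled
    separately, as the conditional-independence assumption allows."""
    return num_attributes * (num_values - 1)


# Compare the two counts for a few attribute-set sizes (binary attributes).
for d in (5, 10, 20):
    print(d, full_joint_params(d), naive_bayes_params(d))
```

With binary attributes the unrestricted count grows exponentially in the number of attributes, while the naive Bayes count grows linearly.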
Algorithm principle
Naive Bayes classification (NBC) is a method based on Bayes' theorem and the assumption that the features are conditionally independent of each other. Given a training set, and taking the independence between feature words as the premise, it first learns the joint probability distribution from input to output. Then, based on the learned model, for a given input it finds the output that maximizes the posterior probability.
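Suppose there is a sample data set $D = \{d_1, d_2, \ldots, d_n\}$, and the feature attribute set of the corresponding sample data is $X = \{x_1, x_2, \ldots, x_d\}$. The class variable is $Y = \{y_1, y_2, \ldots, y_m\}$, that is, $D$ can be divided into $m$ categories. Here $x_1, x_2, \ldots, x_d$ are mutually independent and random, the prior probability of $Y$ is $P(Y)$, and the posterior probability of $Y$ is $P(Y \mid X)$. By Bayes' theorem, the posterior probability can be calculated from the prior probability $P(Y)$, the evidence $P(X)$, and the class-conditional probability $P(X \mid Y)$:

$$P(Y \mid X) = \frac{P(Y)\, P(X \mid Y)}{P(X)}$$

Naive Bayes is based on the independence of each feature. Given a category $y$, the class-conditional probability in the above formula can be further expressed as:

$$P(X \mid Y = y) = \prod_{j=1}^{d} P(x_j \mid Y = y)$$

From the above two formulas, the posterior probability can be calculated as:

$$P(Y = y \mid X) = \frac{P(Y = y) \prod_{j=1}^{d} P(x_j \mid Y = y)}{P(X)}$$

Since $P(X)$ is fixed for a given sample, only the numerator of the above formula needs to be compared when comparing posterior probabilities. Therefore, the naive Bayes score for a sample belonging to category $y_i$, and the resulting decision rule, are:

$$P(y_i \mid x_1, x_2, \ldots, x_d) \propto P(y_i) \prod_{j=1}^{d} P(x_j \mid y_i), \qquad \hat{y} = \arg\max_{y_i} \; P(y_i) \prod_{j=1}^{d} P(x_j \mid y_i)$$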
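The procedure described in this section (estimate the prior and the class-conditional probabilities from the training set, then pick the class with the largest product) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `train` and `predict`, the toy weather data, and the choice to estimate probabilities by plain counting without smoothing are all made up for this example.

```python
from collections import Counter


def train(samples, labels):
    """Estimate P(y) and P(x_j = v | y) by counting (no smoothing)."""
    class_counts = Counter(labels)
    priors = {y: c / len(labels) for y, c in class_counts.items()}
    # cond[(j, v, y)]: number of class-y samples whose j-th attribute equals v
    cond = Counter()
    for x, y in zip(samples, labels):
        for j, v in enumerate(x):
            cond[(j, v, y)] += 1
    likelihood = {k: c / class_counts[k[2]] for k, c in cond.items()}
    return priors, likelihood


def predict(x, priors, likelihood):
    """Return the class maximizing P(y) * prod_j P(x_j | y), plus all scores."""
    scores = {}
    for y, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            # Attribute values never seen with class y get probability 0.
            score *= likelihood.get((j, v, y), 0.0)
        scores[y] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy data: (outlook, temperature) -> play outside?
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
           ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "no"]

priors, likelihood = train(samples, labels)
label, scores = predict(("rainy", "mild"), priors, likelihood)
print(label, scores)
```

The scores are the unnormalized numerators only; dividing each by their sum would give the posterior probabilities, but the ranking of the classes is the same either way.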
Advantages and disadvantages
Advantages
The naive Bayes algorithm assumes that the attributes of the data set are independent of each other. As a result, the logic of the algorithm is very simple and the algorithm is relatively stable. When the data exhibit different characteristics, the classification performance of the naive Bayes algorithm does not change much. In other words, the naive Bayes algorithm is fairly robust and does not behave very differently across different types of data sets. When the attributes of the data set are relatively independent of each other, the naive Bayes classification algorithm gives better results.
Disadvantages
The attribute-independence condition is also the weakness of the naive Bayes classifier. Independence between the attributes of a data set is difficult to satisfy in many cases, because the attributes are often related to each other. When this happens during classification, the quality of the classification can be greatly reduced.
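One way to see why related attributes hurt is to feed the same piece of evidence to the classifier twice, as if it were two independent attributes. The sketch below does this for a two-class problem; the function name `posterior` and the likelihood ratio of 3 are illustrative assumptions chosen only to make the arithmetic easy to follow.

```python
def posterior(prior_pos, likelihood_ratios):
    """Two-class naive Bayes posterior P(pos | x).

    likelihood_ratios holds P(x_j | pos) / P(x_j | neg) for each attribute;
    they are multiplied together as if the attributes were independent.
    """
    odds = prior_pos / (1 - prior_pos)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# One attribute that favours the positive class 3:1.
once = posterior(0.5, [3.0])
# The same attribute counted twice (a perfectly correlated duplicate).
twice = posterior(0.5, [3.0, 3.0])
print(once, twice)
```

The duplicated attribute adds no new information, yet the computed posterior moves further toward the positive class, because the model counts the same evidence twice.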
Applications
Text classification
Classification is a basic problem in data analysis and machine learning. Text classification is widely used in areas such as network information filtering, information retrieval, and information recommendation. Data-driven classifier learning has been an active topic in recent years, with many methods available, such as neural networks, decision trees, support vector machines, and naive Bayes. Compared with other carefully designed and more complex classification algorithms, the naive Bayes classification algorithm is among the classifiers with good learning efficiency and classification quality. It is an intuitive text classification algorithm and also the simplest Bayesian classifier, and it has good interpretability. The naive Bayes algorithm assumes that all features occur independently of each other and that each feature is equally important. In practice this assumption does not hold: first, two adjacent words are inevitably connected and cannot be independent; second, the theme of an article is determined by a few representative words, so there is no need to read the entire article and look at every word. It is therefore necessary to adopt a suitable feature selection method so that the naive Bayes classifier can achieve higher classification efficiency.
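As an illustration of naive Bayes applied to text, the sketch below treats each document as a bag of words and classifies by summed log-probabilities. Several details are assumptions added for the example and are not taken from the text above: the function names, the four toy documents, whitespace tokenization, the use of add-one (Laplace) smoothing via the `alpha` parameter, and ignoring words that were never seen in training.

```python
import math
from collections import Counter


def train_text_nb(docs, labels, alpha=1.0):
    """Estimate log P(y) and smoothed log P(word | y) from labelled documents."""
    vocab = {w for d in docs for w in d.split()}
    class_docs = Counter(labels)
    word_counts = {y: Counter() for y in class_docs}
    for d, y in zip(docs, labels):
        word_counts[y].update(d.split())
    log_prior = {y: math.log(c / len(docs)) for y, c in class_docs.items()}
    log_like = {}
    for y, counts in word_counts.items():
        # Smoothed denominator: total words in class y plus alpha per vocabulary word.
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[y] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like, vocab


def classify(text, log_prior, log_like, vocab):
    """Return the class with the highest log-score; unknown words are skipped."""
    scores = {}
    for y in log_prior:
        scores[y] = log_prior[y] + sum(
            log_like[y][w] for w in text.split() if w in vocab)
    return max(scores, key=scores.get)


# Hypothetical toy corpus.
docs = ["cheap pills buy now", "buy cheap watches",
        "meeting schedule for monday", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]

model = train_text_nb(docs, labels)
print(classify("buy cheap pills", *model))
print(classify("monday meeting", *model))
```

Working with logarithms turns the product of word probabilities into a sum, which avoids numerical underflow on long documents. A feature selection step, as suggested above, would amount to restricting `vocab` to the most representative words before training.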
Other
The naive Bayes algorithm also plays an important role in text recognition and image recognition. An unknown text or image can be classified according to existing classification rules, which achieves the purpose of classification.
The naive Bayes algorithm is widely used in real life, for example in text classification, spam classification, credit evaluation, and phishing website detection.