Naive Bayes

Definition

Bayesian method

The Bayesian method is based on Bayes' principle and uses probability and statistics to classify a sample data set. Because of its solid mathematical foundation, the misclassification rate of Bayesian classification algorithms tends to be low. The characteristic feature of the Bayesian method is that it combines the prior probability with the posterior probability, which avoids the subjective bias of relying on the prior probability alone and also helps avoid the over-fitting that comes from relying on the sample information alone. Bayesian classification algorithms show high accuracy when the data set is large, and the algorithm itself is relatively simple.

Naive Bayes algorithm

The naive Bayes algorithm is one of the most widely used classification algorithms.

The naive Bayes method is a simplification of the Bayesian algorithm: it assumes that the attributes are conditionally independent of one another given the target value. In other words, no attribute variable is given a larger or a smaller weight than any other in the decision. Although this simplification reduces the classification quality of the Bayesian algorithm to some extent, in practical applications it greatly reduces the complexity of the Bayesian method.
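To make the reduction in complexity concrete, the following sketch counts the free probability parameters needed with and without the conditional independence assumption. It assumes discrete features that all take the same number of values; the function names and the 10-feature setting are hypothetical, chosen only for illustration.

```python
def full_joint_params(num_classes: int, num_features: int, values_per_feature: int) -> int:
    """Free parameters of an unrestricted table P(x_1, ..., x_d | y) for every class."""
    # Each class needs one probability per feature combination, minus one
    # because the probabilities must sum to 1.
    return num_classes * (values_per_feature ** num_features - 1)


def naive_bayes_params(num_classes: int, num_features: int, values_per_feature: int) -> int:
    """Free parameters when P(x_1, ..., x_d | y) factorizes into per-feature tables."""
    # Each class needs one small table per feature, each with (values - 1) free entries.
    return num_classes * num_features * (values_per_feature - 1)


if __name__ == "__main__":
    # Hypothetical setting: 2 classes, 10 binary features.
    print(full_joint_params(2, 10, 2))   # 2 * (2**10 - 1) = 2046
    print(naive_bayes_params(2, 10, 2))  # 2 * 10 * 1 = 20
```

The unrestricted model grows exponentially with the number of features, while the naive model grows linearly, which is why the naive version can be estimated from far less data.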

Naive Bayes

Algorithm principle

Naive Bayes classification (NBC) is a method based on Bayes' theorem and the assumption that the features are conditionally independent of one another. Taking the independence between feature words as its premise, it first learns the joint probability distribution from input to output on a given training set; then, using the learned model, it finds for a given input the output that maximizes the posterior probability.

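Suppose there is a sample data set $D = \{d_1, d_2, \dots, d_n\}$, and the feature attribute set of the corresponding sample data is $X = \{x_1, x_2, \dots, x_d\}$. The class variable is $Y = \{y_1, y_2, \dots, y_m\}$, that is, $D$ can be divided into $m$ categories. The attributes $x_1, x_2, \dots, x_d$ are random and mutually independent, the prior probability of $Y$ is $P(Y)$, and the posterior probability of $Y$ is $P(Y \mid X)$. By Bayes' theorem, the posterior probability can be calculated from the prior probability $P(Y)$, the evidence $P(X)$, and the class-conditional probability $P(X \mid Y)$:

$$P(Y \mid X) = \frac{P(Y)\,P(X \mid Y)}{P(X)}$$

Naive Bayes is based on the independence of each feature. Given a category $y$, the class-conditional probability in the formula above can be further expressed as:

$$P(X \mid Y = y) = \prod_{i=1}^{d} P(x_i \mid Y = y)$$

From the two formulas above, the posterior probability can be calculated as:

$$P(Y = y \mid X) = \frac{P(Y = y) \prod_{i=1}^{d} P(x_i \mid Y = y)}{P(X)}$$

Since $P(X)$ is the same for every category, only the numerator of the formula above needs to be compared when comparing posterior probabilities. Therefore, the naive Bayes rule for deciding which category $y_k$ a sample belongs to is:

$$\hat{y} = \arg\max_{y_k \in Y} \; P(y_k) \prod_{i=1}^{d} P(x_i \mid y_k)$$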
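The decision rule described above (multiply the class prior by the per-feature class-conditional probabilities and pick the largest product) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes categorical features, estimates probabilities by simple counting without smoothing, and the toy weather-style data and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train(samples, labels):
    """Estimate P(y) and the counts behind P(x_j = v | y) from the training set."""
    class_counts = Counter(labels)
    total = len(labels)
    priors = {y: count / total for y, count in class_counts.items()}
    # value_counts[(y, j)][v] = number of class-y samples whose j-th feature equals v
    value_counts = defaultdict(Counter)
    for x, y in zip(samples, labels):
        for j, v in enumerate(x):
            value_counts[(y, j)][v] += 1
    return priors, value_counts, class_counts


def scores(x, priors, value_counts, class_counts):
    """Unnormalized posterior P(y) * prod_j P(x_j | y) for every class."""
    result = {}
    for y, prior in priors.items():
        score = prior
        for j, v in enumerate(x):
            score *= value_counts[(y, j)][v] / class_counts[y]
        result[y] = score
    return result


def predict(x, priors, value_counts, class_counts):
    """Class with the largest numerator; the evidence P(X) is never needed."""
    s = scores(x, priors, value_counts, class_counts)
    return max(s, key=s.get)


if __name__ == "__main__":
    # Hypothetical toy data: (outlook, temperature) -> play?
    samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
               ("rainy", "cool"), ("sunny", "cool"), ("rainy", "hot")]
    labels = ["no", "no", "yes", "yes", "yes", "no"]
    model = train(samples, labels)
    s = scores(("sunny", "mild"), *model)
    print(predict(("sunny", "mild"), *model))  # "no"
    # Dividing by the sum of the scores recovers the normalized posterior.
    print(s["no"] / sum(s.values()))           # about 0.667
```

Because counting without smoothing assigns probability zero to any feature value never seen with a class, a practical version would usually add a smoothing term; that is left out here to stay close to the formulas.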

Advantages and disadvantages

Advantages

The naive Bayes algorithm assumes that the attributes of the data set are independent of one another. As a result, the logic of the algorithm is very simple and the algorithm is relatively stable: when the data show different characteristics, the classification performance of naive Bayes does not change much. In other words, naive Bayes is fairly robust and does not behave very differently across different types of data sets. When the attributes of the data set are in fact relatively independent, the naive Bayes classifier gives better results.

Disadvantages

The attribute independence condition is also the weakness of the naive Bayes classifier. In many cases the attributes of a data set are related to one another, so the independence condition is difficult to satisfy. When this happens, classification quality can be greatly reduced.
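One way to see why related attributes hurt is to feed the classifier the same evidence more than once. The sketch below is a hypothetical two-class example (the probabilities 0.8 and 0.4 are made up): a single feature is copied, so the copies are perfectly correlated, yet naive Bayes treats each copy as fresh independent evidence.

```python
def naive_posterior(prior_a: float, like_a: float, like_b: float, copies: int) -> float:
    """Posterior of class A under naive Bayes when one feature is counted `copies` times.

    like_a and like_b are P(feature observed | A) and P(feature observed | B).
    """
    score_a = prior_a * like_a ** copies
    score_b = (1.0 - prior_a) * like_b ** copies
    return score_a / (score_a + score_b)


if __name__ == "__main__":
    # With one copy this is the correct posterior; extra copies add no real
    # information, but the computed posterior drifts toward class A anyway.
    for copies in (1, 2, 3):
        print(copies, round(naive_posterior(0.5, 0.8, 0.4, copies), 4))
```

With one copy the posterior of class A is 2/3; with two and three copies it becomes 0.8 and roughly 0.889, even though nothing new was observed. The predicted class happens not to change in this small example, but with several competing correlated features the same double counting can flip the decision.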

Applications

Text classification

Classification is a basic problem in data analysis and machine learning. Text classification is widely used in areas such as network information filtering, information retrieval, and information recommendation. Data-driven classifier learning has been an active topic in recent years, with many methods available, such as neural networks, decision trees, support vector machines, and naive Bayes. Compared with other carefully designed and more complex classification algorithms, the naive Bayes classifier is among those with good learning efficiency and classification quality. It is an intuitive text classification algorithm and also the simplest Bayesian classifier, and it has good interpretability.

The naive Bayes algorithm assumes that all features occur independently of one another and that every feature is equally important. In the real world this assumption does not hold. First, two adjacent words are inevitably connected and cannot be independent. Second, the theme of an article is determined by a few representative words, so there is no need to read the entire article and look at every word. It is therefore necessary to apply a suitable feature selection method so that the naive Bayes classifier can achieve higher classification efficiency.
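As an illustration of naive Bayes applied to text, here is a minimal bag-of-words sketch. Several choices are assumptions not taken from the text above: it uses the multinomial word-count variant, add-alpha (Laplace) smoothing so that unseen words do not zero out a class, and log probabilities to avoid numerical underflow. The tiny spam/ham corpus and the function names are hypothetical.

```python
import math
from collections import Counter


def train_text_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    vocab = {word for doc in docs for word in doc}
    class_doc_counts = Counter(labels)
    word_counts = {y: Counter() for y in class_doc_counts}
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    log_prior = {y: math.log(c / len(labels)) for y, c in class_doc_counts.items()}
    log_like = {}
    for y, counts in word_counts.items():
        # Smoothed estimate of P(word | class) over the whole vocabulary.
        denom = sum(counts.values()) + alpha * len(vocab)
        log_like[y] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return vocab, log_prior, log_like


def classify(doc, vocab, log_prior, log_like):
    """Return the class with the highest log-posterior; unknown words are skipped."""
    best_class, best_score = None, -math.inf
    for y in log_prior:
        score = log_prior[y] + sum(log_like[y][w] for w in doc if w in vocab)
        if score > best_score:
            best_class, best_score = y, score
    return best_class


if __name__ == "__main__":
    docs = [["win", "money", "now"], ["cheap", "money", "offer"],
            ["meeting", "schedule", "today"], ["project", "meeting", "notes"]]
    labels = ["spam", "spam", "ham", "ham"]
    model = train_text_nb(docs, labels)
    print(classify(["cheap", "money"], *model))               # spam
    print(classify(["meeting", "today", "unknown"], *model))  # ham
```

Feature selection, as mentioned above, would amount to restricting `vocab` to the most informative words before the likelihood tables are built; this sketch keeps every word for simplicity.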

Other

The naive Bayes algorithm also plays an important role in text recognition and image recognition. An unknown text or image can be assigned to a class according to classification rules learned from existing examples.

The naive Bayes algorithm is widely used in practice, for example in text classification, spam classification, credit evaluation, and phishing website detection.