Development profile
In 1866, G. Mendel's experiments revealed the basic laws of heredity. This was the first successful example of using mathematical statistics in biological experiments. In 1889, in the book "Natural Inheritance", F. Galton's study of human height pointed out that the height of offspring is not only related to the height of the parents but also tends to "return" toward the average value. From this he proposed the concepts and algorithms of "regression" and "correlation", thus laying the foundation of biostatistics. Galton's student K. Pearson further applied statistics to biological research and proposed the concept and algorithm of an index of deviation between the actually observed counts and the theoretically expected counts, that is, chi-square (χ²). This has played an important role in the statistical analysis of attribute (categorical) data. In 1901, he co-founded the journal Biometrika and also established a school of mathematical statistics.
His student W. S. Gosset did extensive research on the standard deviation of samples and in 1908 published the t-test in Biometrika under the pseudonym "Student". Since then, the t-test has become one of the basic tools of biostatistics. The British mathematician R. A. Fisher pointed out that it is not enough to attend to data analysis after the fact; the experiment must be designed in advance. He made experimental design a branch of biostatistics. G. W. Snedecor later named the ratio of mean squares from different sources of variation the F value, and pointed out that when this value exceeds the theoretical value at the 5% probability level, the systematic effect of a source of variation can be separated from chance variation. This is what is now called "analysis of variance".
The methods above have played a significant role in research in agricultural science and biology. Since the 1920s, various mathematical-statistical methods have been created one after another; they have been widely used in laboratory, field, breeding, and clinical experiments and have increasingly spread to industry as a whole. In the 1970s, with the popularization of computers, statistical methods that had been abandoned because of their excessive computation gained new vitality and came into wider use, and they now occupy a very important position in modern science and technology.
Parameters
The measurement of a certain trait (such as height) of an observed object (such as a 7-year-old boy) is called an individual. The differences between individuals from the same source (such as the heights of individual 7-year-old boys) are called individual variation. The population is the object to be understood through statistics, and the individuals in it may be finite or infinite in number. Observed data can be counted (discrete), such as the number of insects per unit area, or measured (continuous), such as height, weight, blood pressure, and vital capacity. A population has two basic types of parameter: location parameters, or typical values, which represent the level, such as the mean, median, and rate; and dispersion parameters, which reflect the size of individual differences, such as the standard deviation and range. A population parameter is an objectively existing but usually unknown constant. It can only be estimated from samples, and doing so involves error. The sample mean is
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$,
where $x_i$ is the observed value of the $i$-th individual; $n$ is the number of individuals in the sample, called the sample size; and $\sum$ is the summation sign, with the sum usually running from $i = 1$ to $n$. A value calculated from a sample is called a statistic and is an estimate of the corresponding population value; for example, $\bar{x}$ is an estimate of the population mean $\mu$. If the mean of $\bar{x}$ over all possible samples is exactly equal to $\mu$, then $\bar{x}$ is called an unbiased estimate of $\mu$, which means that although individual estimates contain error, the estimate is unbiased on average. In that case $\mu$ is also called the expectation of $\bar{x}$, written $E(\bar{x}) = \mu$.
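As a rough illustration of the location and dispersion statistics named above, the following sketch computes them for a small sample using only the Python standard library. The function name `describe` and the height values are made up for illustration and do not come from the source.

```python
import statistics

def describe(sample):
    """Return basic location and dispersion statistics for a sample."""
    n = len(sample)
    return {
        "n": n,                                # sample size
        "mean": sum(sample) / n,               # location: sample mean
        "median": statistics.median(sample),   # location: median
        "sd": statistics.stdev(sample),        # dispersion: sample SD (n - 1 denominator)
        "range": max(sample) - min(sample),    # dispersion: range
    }

# Hypothetical heights (cm) of five 7-year-old boys.
heights_cm = [118.0, 121.5, 119.0, 124.0, 122.5]
summary = describe(heights_cm)
print(summary)
```

Each value in `summary` is a statistic, that is, an estimate of the corresponding population parameter based on this one sample.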
Sampling
To estimate the parameters of a population (such as the mean, rate, or standard deviation), a part of the individuals is selected to form a sample for analysis; this is called sampling. The sampling method should prevent subjective and objective factors from causing bias (i.e. systematic error) and should ensure that the sample is representative of the population. Simple random sampling forms a sample from the population by drawing lots or an equivalent procedure. The main point is that each individual in the population must have an equal chance of being selected. Systematic sampling divides the population into equal parts in order of time or space and then mechanically takes one individual from the same position in each part, that position being chosen at random only once. For example, to check the scores of one-tenth of the students, randomly select one of the 10 integers from 0 to 9; if it is 3, all students whose student ID number ends in 3 are selected. Stratified sampling divides the population in advance into different strata (such as by region, age, or gender) and then samples from each stratum in appropriate proportions. This method can obtain a representative sample from a population with large differences between strata. Cluster sampling takes the group as the sampling unit, and every individual in each selected unit is investigated. This method is easy to implement, but the sampling error is relatively large, and the ordinary formulas based on simple random sampling generally cannot be used to calculate it. The above methods can also be applied in stages or in combination, as in two-stage sampling, multi-stage sampling, stratified cluster sampling, and multi-stage equal-probability sampling.
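The first three sampling schemes described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the list of 100 student IDs, and the two made-up strata are hypothetical, and proportional allocation is just one way of choosing "appropriate proportions".

```python
import random
from collections import defaultdict

def simple_random_sample(population, k, rng):
    """Draw k individuals, each with an equal chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick one random starting position, then take every step-th individual."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, stratum_of, fraction, rng):
    """Sample the same fraction from every stratum (proportional allocation)."""
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)               # fixed seed so the run is repeatable
student_ids = list(range(100))       # hypothetical student ID numbers

srs = simple_random_sample(student_ids, 10, rng)
systematic = systematic_sample(student_ids, 10, rng)
stratified = stratified_sample(
    student_ids, lambda i: "low" if i < 30 else "high", 0.1, rng
)
print(srs, systematic, stratified, sep="\n")
```

With a step of 10, every ID in `systematic` ends in the same digit, which matches the student-ID example in the text.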
Using sample statistics to estimate population parameters inevitably involves sampling error. Its size is proportional to the size of the individual variation (the standard deviation) and inversely proportional to the square root of the sample size. The statistical indicator of the size of the sampling error is the standard error, $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, or, substituting the sample statistic, $s_{\bar{x}} = s / \sqrt{n}$ (7), where $\sigma$ and $s$ are the population and sample standard deviations and $n$ is the sample size. It is equivalent to the standard deviation obtained when the mean (or rate) of each sample is treated as an individual, assuming that many samples are drawn from the same population; it reflects the differences between different samples taken from the same population. Equation (7) is suitable for simple random sampling and systematic sampling. The formulas for other sampling methods are more complicated.
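A short sketch of the standard error of the mean for a simple random sample follows: the sample standard deviation divided by the square root of the sample size. The function names and the data are illustrative assumptions, not taken from the source.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean estimated from one sample."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def standard_error_from(sd, n):
    """Standard error for a given standard deviation and sample size."""
    return sd / math.sqrt(n)

measurements = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
se = standard_error(measurements)

# Quadrupling the sample size halves the standard error.
se_25 = standard_error_from(10.0, 25)
se_100 = standard_error_from(10.0, 100)
print(se, se_25, se_100)
```

The last two lines show the inverse square-root relationship: with the same standard deviation, a sample four times as large gives a standard error half as large.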
Significance of differences
Between two or more sets of data there will always be differences, large or small. The question is whether a difference merely reflects sampling error or arises because the data come from different populations; that is, is there a substantial difference? In statistical terms, the task is to judge whether the difference between the data is "significant". Using statistical methods to infer the nature of a difference is called a significance test of the difference. There are many methods of significance testing, and the basic steps are as follows. First, assume that the data come from the same population, that is, that the data being compared have no substantial difference; this is called the null hypothesis. Next, calculate from the original data the probability that a difference of this size arises from sampling error alone. If this probability is very small, the null hypothesis is rejected on the principle that "small-probability events are unlikely to occur in practice", and the difference is considered significant, that is, meaningful from a statistical point of view. Conversely, if the probability is not small, the null hypothesis is not rejected and the difference is "not significant", that is, fluctuation within the range of sampling error cannot be ruled out. Correct use of significance tests puts the conclusions of an experiment or survey on a more scientific and reliable basis and avoids oversimplified and absolute statements.
Significance level
Whether a probability is small can only be judged in relative terms. In significance tests of biological data it is customary to use α = 0.05 as the upper limit of a small probability. Sometimes, for the sake of strictness, α = 0.01 is stipulated instead. α is called the significance level; it is the probability of wrongly rejecting the null hypothesis when it is true (a type I error). However, a smaller α is not always better. The smaller α is set, the greater the probability of failing to reject a null hypothesis that is false (a type II error). Increasing the sample size makes it possible to reduce the probabilities of both type I and type II errors.
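The testing steps can be made concrete with an exact permutation test on two small groups. The permutation test is chosen here only because it needs no distribution tables; it is one of many significance tests and is not named in the text above. The function name and the data are hypothetical.

```python
from itertools import combinations

def permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in means.

    Under the null hypothesis both groups come from the same population,
    so every way of splitting the pooled data is equally likely.
    """
    pooled = group_a + group_b
    n_a = len(group_a)
    indices = range(len(pooled))

    def mean_diff(indices_a):
        a = [pooled[i] for i in indices_a]
        b = [pooled[i] for i in indices if i not in indices_a]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(tuple(range(n_a))))
    splits = list(combinations(indices, n_a))
    # Count splits whose difference is at least as large as the observed one.
    extreme = sum(1 for s in splits if abs(mean_diff(s)) >= observed - 1e-12)
    return extreme / len(splits)

ALPHA = 0.05  # customary significance level

p_separated = permutation_test([1, 2, 3, 4], [5, 6, 7, 8])
p_interleaved = permutation_test([1, 3, 5, 7], [2, 4, 6, 8])

for label, p in [("separated", p_separated), ("interleaved", p_interleaved)]:
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: p = {p:.4f}, difference {verdict} at alpha = {ALPHA}")
```

In the first comparison only 2 of the 70 possible splits are as extreme as the observed one, so the null hypothesis is rejected at α = 0.05; in the second, most splits are at least as extreme, so it is not.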
Non-parametric statistics
Most methods of statistical analysis rest on the basic assumption that the data follow a certain distribution (most often the normal distribution), and use statistics to estimate the population parameters. In practice this is often not the case: a great deal of data has no corresponding theoretical distribution. In such cases, statistical methods that do not depend on the form of the distribution are generally used for analysis. Such methods are often more intuitive and simpler to calculate. A common example is statistical inference based on ranks, in which the observations cannot be expressed directly as numerical data but are expressed by a rank or grade reflecting size or degree (i.e. rank conversion). For example, if the observed results are "-", "±", "+", and "++", their ranks after sorting are 1, 2, 3, and 4. Many effective distribution-free methods are based on the ordering of the data or observations by size. Because a distribution-free method usually does not involve estimating or making inferences about the parameters of the data distribution, it is called a non-parametric method. In some of the literature it is also called a "distribution-free" method of statistical analysis.
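Rank conversion can be sketched as follows. The grade order follows the example in the text; the use of average ranks (midranks) for tied observations is a common convention assumed here, not something the text specifies, and the observation list is made up.

```python
GRADE_ORDER = ["-", "±", "+", "++"]   # ordered from weakest to strongest

def grade_codes(order):
    """Map each ordinal grade to its position in the ordering (1, 2, 3, ...)."""
    return {grade: code for code, grade in enumerate(order, start=1)}

def midranks(values):
    """Rank the values from smallest to largest; ties share the average rank."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

codes = grade_codes(GRADE_ORDER)
observations = ["+", "-", "++", "±", "+"]       # hypothetical results
ranks = midranks([codes[g] for g in observations])
print(ranks)
```

The two "+" results occupy positions 3 and 4 in the sorted order, so each receives the average rank 3.5.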
Survival analysis
Dynamic observation of many biological phenomena is more informative than a single cross-sectional observation. For example, the effect of surgical treatment for patients with malignant tumors is judged by their survival rate some time after surgery, or survival curves under different conditions (with time on the horizontal axis and survival rate on the vertical axis) need to be drawn for analysis and comparison; the effect of organ transplantation is judged by how long the allogeneic organ works normally in the body without rejection. Survival analysis has a wide range of uses.
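One widely used way to obtain the points of a survival curve is the Kaplan-Meier (product-limit) estimate. The text does not name a specific estimator, so this choice is an assumption made for illustration; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return (time, survival rate) points of a Kaplan-Meier curve.

    times  -- follow-up time of each patient
    events -- True if death was observed at that time, False if the
              patient was censored (lost to follow-up or still alive)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up times in months after surgery.
months = [2, 3, 3, 5, 8, 8, 9, 12]
died = [True, True, False, True, True, False, False, True]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: survival rate {s:.3f}")
```

Plotting the returned points as a step function, with time on the horizontal axis and survival rate on the vertical axis, gives the kind of curve described above; curves for different treatment conditions can then be compared.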
Multivariate analysis
Also known as multi-index or multi-variable analysis, multivariate analysis examines several observed indicators at the same time, so it is more comprehensive and effective than ordinary univariate statistical analysis. It comprises a series of methods that have appeared since the 1940s. Because they involve advanced mathematics and very complex calculations, their popularization was hindered. With the improvement of computers and statistical software packages, multivariate analysis is expected soon to become a routine tool in biological research.
Multiple regression refers to the regression of one dependent variable on several independent variables, whereas multivariate regression refers to regression with more than one dependent variable; the two terms are often confused. Both can be used for prediction, for combining indicators, or for screening independent variables. Discriminant analysis uses discriminant functions in the form of multiple regression equations to judge or diagnose the type to which an individual belongs. Cluster analysis classifies many individuals or indicators according to their similarity. Clustering individuals is called Q-type clustering; clustering indicators is called R-type clustering. Q-type clustering and discriminant analysis are two basic methods of numerical taxonomy. A trend surface is a higher-order equation with geographical longitude and latitude as independent variables. It can be used to draw a contour map of the geographical distribution density of the object under study, and it can also be used for prediction. The purpose of principal component analysis is to transform many interrelated indicators into a few mutually independent composite indicators that contain almost all the statistical information of the original indicators. The calculation procedure of factor analysis is similar to that of principal component analysis, but instead of transforming the indicators under study it analyzes the internal relationships among them. This method was pioneered by psychologists and can also be used to study complex diseases.
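The idea behind principal component analysis can be illustrated in the smallest possible case, two correlated indicators, where the eigenvalues of the 2 × 2 covariance matrix have a closed form. This is a sketch under that simplifying assumption; real analyses involve many indicators and a numerical eigenvalue routine. The function names and measurements are hypothetical.

```python
import math

def covariance_2d(xs, ys):
    """Sample variances and covariance of two indicators (n - 1 denominator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def principal_variances_2d(xs, ys):
    """Variances of the two principal components (covariance eigenvalues)."""
    sxx, sxy, syy = covariance_2d(xs, ys)
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return half_trace + radius, half_trace - radius

# Hypothetical, perfectly correlated measurements (mm).
length_mm = [1.0, 2.0, 3.0, 4.0]
width_mm = [2.0, 4.0, 6.0, 8.0]

first, second = principal_variances_2d(length_mm, width_mm)
share_first = first / (first + second)
print(f"first component carries {share_first:.1%} of the total variance")
```

Because the two indicators here are perfectly correlated, a single composite indicator carries essentially all of the variance; with real data the share is lower, and one keeps as many components as are needed to retain most of the information.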
Statistical tools
Worldwide, JMP Clinical, MATLAB (Bioinformatics Toolbox), the R language, SPSS, PRIMER, and other software are widely used; the main users are biostatisticians, bioinformaticians, geneticists, students, and others. These packages provide rich and powerful analysis functions and dynamic graphical analysis, offering powerful, convenient, and efficient analytical capabilities for seed compound discovery, preclinical research, clinical trials, epidemiological research, disease control, public health, and biostatistics teaching.
Application
Agricultural science has gradually shifted from the qualitative research of the past toward quantitative research. In this process, mathematical tools are indispensable. Biostatistics, produced by the fusion of biology and mathematics, plays a major role in many aspects of agricultural research.
To improve the quality and yield of agricultural products, many new food-crop varieties have been introduced. However, questions such as which environment suits the growth of a new variety, which types of fertilizer favor crop growth, and how much fertilizer to apply need to be analyzed and studied in advance using biostatistics. In addition, the growth of various pests and weeds in the farmland ecosystem also harms crops. Agricultural workers generally control them simply by spraying pesticides. Deciding which pesticide to use and how much to apply, so as to eliminate crop pests effectively while minimizing damage to the crops and reducing economic losses, also depends on biostatistics for prediction and forecasting.
In addition, some scholars have found that biostatistical knowledge can predict the occurrence of some biological phenomena with high accuracy, but many people are unaware of this. It is therefore imperative to popularize knowledge of biostatistics among the people concerned.