Basic meaning
In statistics, linear regression is a form of regression analysis that uses a least squares function, called the linear regression equation, to model the relationship between one or more independent variables and a dependent variable. This function is a linear combination of one or more model parameters called regression coefficients. The case with only one independent variable is called simple regression, and the case with more than one independent variable is called multiple regression. (This in turn should be distinguished from multivariate linear regression, in which multiple correlated dependent variables are predicted rather than a single scalar variable.)
In linear regression, the data are modeled using a linear predictor function, and the unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of y given the value of X is assumed to be an affine function of X. Less commonly, the median or some other quantile of the conditional distribution of y given X is modeled as a linear function of X. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X, rather than on the joint probability distribution of X and y (which belongs to the field of multivariate analysis).
Linear regression was the first type of regression analysis to be studied rigorously and to be used widely in practical applications. This is because a model that depends linearly on its unknown parameters is easier to fit than a model that depends non-linearly on them, and the statistical properties of the resulting estimates are easier to determine.
Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as in least absolute error regression), or by minimizing a penalized version of the least squares loss function, as in bridge regression. Conversely, the least squares approach can also be used to fit nonlinear models. Therefore, although the "least squares method" and the "linear model" are closely connected, they cannot be equated.
Fitting equation
Least squares method
Generally speaking, the equation of a linear regression can be found by the least squares method, which yields the straight line y = bx + a.
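As a minimal sketch of the least squares calculation for the straight line y = bx + a, the slope and intercept can be computed from the deviations of x and y from their means. The function name `fit_line` and the sample data are illustrative assumptions, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (b, a) for the least squares line y = b*x + a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of cross products and of squared deviations from the means.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = s_xy / s_xx          # slope
    a = y_mean - b * x_mean  # intercept: the line passes through the means
    return b, a


# Illustrative data lying exactly on y = 2x + 1.
b, a = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(b, a)
```

Because the sample points lie exactly on a line, the fitted slope and intercept reproduce that line; with noisy data they would instead minimize the sum of squared vertical distances.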
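In general, more than one factor affects y. Suppose there are k factors x1, x2, ..., xk; the following linear relationship can usually be considered:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Making n independent observations of y and x1, x2, ..., xk at the same time gives n sets of observations (yt, xt1, xt2, ..., xtk), t = 1, 2, ..., n (n > k+1), which satisfy the relation:
$$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t$$
Here $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are mutually uncorrelated random variables with the same distribution as $\varepsilon$. In order to express the above relation in matrix form, let:
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
Then $Y = X\beta + \varepsilon$, and the least squares method gives the solution $\hat{\beta} = (X^T X)^{-1} X^T Y$ for $\beta$. Here $(X^T X)^{-1} X^T$ is called the pseudo-inverse of $X$.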
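The matrix solution can be illustrated with a small sketch that builds the normal equations (X'X)β = X'Y and solves them by Gaussian elimination, which avoids forming the inverse explicitly. The helper names `solve_linear_system` and `fit_multiple`, and the sample data, are assumptions made for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the coefficients are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple(rows, ys):
    """Return [beta_0, beta_1, ..., beta_k] from the normal equations."""
    # Prepend a column of ones so that beta_0 acts as the intercept.
    x = [[1.0] + list(row) for row in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Illustrative data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
beta = fit_multiple(rows, ys)
print([round(v, 6) for v in beta])
```

Here n = 5 observations and k = 2 factors, so the condition n > k+1 holds and the coefficients are uniquely determined.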
Regression coefficient
Generally, this value is required to be greater than 5%. For most behavior researchers, the most important thing is the regression coefficient. When age increases by 1 unit, the rated quality of the document changes by -1020986 units, indicating that older people give a lower evaluation of the quality of the document. The corresponding t value of this variable is -2.10; its absolute value is greater than 2 and the p value is also < 0.05, so the effect is significant. The conclusion is that older people give a lower evaluation of document quality, and this effect is significant. By contrast, people with richer domain knowledge give a higher evaluation of the quality of the document, but this effect is not significant. Interpreting regression coefficients in this way is the process of hypothesis testing using regression analysis.
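To make the t value concrete, the following sketch computes it for the slope of a simple regression as the coefficient divided by its standard error. The function name `slope_t_value` and the age and rating figures are made up for illustration and are not the data behind the example in the text. The comparison with 2 is only the rule of thumb quoted above; it is adequate for reasonably large samples, while small samples need a larger critical value.

```python
import math


def slope_t_value(xs, ys):
    """Return (b, se_b, t) for the slope of the least squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope, with n - 2 degrees of freedom.
    se_b = math.sqrt(sse / (n - 2) / s_xx)
    return b, se_b, b / se_b


# Made-up data: age (x) against a quality rating (y).
ages = [20, 30, 40, 50, 60]
ratings = [8, 9, 6, 7, 4]
b, se_b, t = slope_t_value(ages, ratings)
# Rule-of-thumb check only: with 3 degrees of freedom the exact
# two-sided 5% critical value is about 3.18, not 2.
print(b, se_b, t, abs(t) > 2)
```

A negative slope with a large absolute t value corresponds to the kind of conclusion drawn in the text: the rating falls as age rises, and the effect is unlikely to be due to chance alone.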
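Error of regression equation
Sum of squared deviations
$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $SSR = r^2 S_{yy}$, $SSE = (1 - r^2) S_{yy}$,
where $S_{yy}$ represents the sum of squared deviations of y; r is the correlation coefficient, and $r^2$ represents the proportion of variation explained by the regression line; $(1 - r^2) S_{yy}$ is the variation that cannot be explained by the regression line, namely SSE.
According to the relationship between the correlation coefficient and the slope of the straight line, an equivalent form can be obtained: $SSE = S_{yy} - b^2 S_{xx}$, where b is the slope of the straight line and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
Using the predicted value
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the actual measured value, and $\hat{y}_i$ is the predicted value calculated according to the straight line equation.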
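The different routes to SSE can be checked numerically. The sketch below assumes the standard identities SSE = (1 - r²)·S_yy = S_yy - b²·S_xx, where S_yy and S_xx denote the sums of squared deviations of y and x from their means, and compares them with the sum of squared residuals computed from the predicted values. The names `sums_of_squares`, `s_yy` and `s_xx`, and the data, are illustrative assumptions.

```python
def sums_of_squares(xs, ys):
    """Compute SSE for a simple regression in three equivalent ways."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = s_xy / s_xx              # slope of the fitted line
    a = y_mean - b * x_mean      # intercept of the fitted line
    r2 = s_xy ** 2 / (s_xx * s_yy)  # squared correlation coefficient
    return {
        "s_yy": s_yy,
        "r2": r2,
        "sse_from_r": (1 - r2) * s_yy,
        "sse_from_slope": s_yy - b ** 2 * s_xx,
        "sse_from_residuals": sum(
            (y - (a + b * x)) ** 2 for x, y in zip(xs, ys)
        ),
    }


# Made-up sample used only to compare the three expressions.
result = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in result.items():
    print(name, round(value, 6))
```

All three SSE values agree up to floating point rounding, which is why any of the forms can be used depending on which quantities are already at hand.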
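Uncertainty
Slope b
Method 1: Use $s_b = \sqrt{\dfrac{SSE}{(n-2)\,S_{xx}}}$, where SSE is the variation not explained by the regression line, n is the number of observations, and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
Method 2: Bring the slope b into $s_b = |b| \sqrt{\dfrac{1/r^2 - 1}{n-2}}$, where r is the correlation coefficient.
Intercept a
$s_a = s_b \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} x_i^2}$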
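As a hedged illustration, the sketch below evaluates the standard textbook expressions for the uncertainty (standard error) of the slope and intercept of a simple regression. It assumes s_b = sqrt(SSE / ((n - 2)·S_xx)) for Method 1, s_b = |b|·sqrt((1/r² - 1)/(n - 2)) for Method 2, and s_a = s_b·sqrt(Σx²/n) for the intercept; the function name and data are illustrative, and Method 2 assumes the slope is not zero.

```python
import math


def slope_intercept_uncertainty(xs, ys):
    """Return (s_b by method 1, s_b by method 2, s_a)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    r2 = s_xy ** 2 / (s_xx * s_yy)  # must be non-zero for method 2
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Method 1: from the residual sum of squares.
    s_b_method1 = math.sqrt(sse / ((n - 2) * s_xx))
    # Method 2: from the slope and the correlation coefficient;
    # max() guards against tiny negative values caused by rounding.
    s_b_method2 = abs(b) * math.sqrt(max(0.0, 1 / r2 - 1) / (n - 2))
    # Intercept uncertainty follows from the slope uncertainty.
    s_a = s_b_method1 * math.sqrt(sum(x * x for x in xs) / n)
    return s_b_method1, s_b_method2, s_a


# Made-up sample used only to compare the two methods.
s_b1, s_b2, s_a = slope_intercept_uncertainty([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(s_b1, s_b2, s_a)
```

The two slope methods are algebraically equivalent, so they agree up to rounding; which one is more convenient depends on whether SSE or r has already been computed.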
Application
Mathematics
Linear regression has many practical uses, which fall into the following two categories:
If the goal is prediction or mapping, linear regression can be used to fit a predictive model to an observed data set of y and X values. Once such a model has been fitted, then for a newly added X value without a y paired with it, the fitted model can be used to predict a y value.
Given a variable y and some variables X1, ..., Xp that may be related to y, linear regression analysis can be used to quantify the strength of the relationship between y and each Xj, to assess which Xj are not related to y, and to identify which subsets of the Xj contain redundant information about y.
Trend line
A trend line represents the long-term trend of time series data. It tells us whether a particular set of data (such as GDP, oil prices, or stock prices) has increased or decreased over a period of time. Although a trend line can be drawn roughly by eye from the positions of the data points in the coordinate system, a more appropriate method is to use linear regression to calculate the position and slope of the trend line.
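A trend line can be sketched by regressing the observed values on a time index. The example below assumes equally spaced observations numbered t = 0, 1, ..., n-1; the function name `trend_line` and the series are made up for illustration and are not real economic data.

```python
def trend_line(values):
    """Fit value = slope * t + intercept with t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_tv = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, values))
    slope = s_tv / s_tt
    return slope, v_mean - slope * t_mean


# Made-up series with short-term fluctuations around a rising trend.
series = [100, 102, 101, 105, 107, 106]
slope, intercept = trend_line(series)
if slope > 0:
    direction = "increasing"
elif slope < 0:
    direction = "decreasing"
else:
    direction = "flat"
print(slope, intercept, direction)
```

The sign of the slope indicates whether the series has risen or fallen over the period, and its size gives the average change per time step.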
Epidemiology
Early evidence on the effect of smoking on mortality and morbidity came from observational studies that used regression analysis. To reduce spurious correlations when analyzing observational data, researchers usually include some additional variables in their regression models besides the variable of primary interest. For example, suppose we have a regression model in which smoking behavior is the independent variable of primary interest and the dependent variable is the smoker's lifespan, observed over several years. Researchers may include socioeconomic status as an additional independent variable, to ensure that any observed effect of smoking on lifespan is not caused by differences in education or income. However, it is impossible to include in an empirical analysis every variable that may confound the results. For example, a hypothetical gene might both increase the chance of death and increase the amount a person smokes. Therefore, randomized controlled trials often produce more convincing evidence of causality than conclusions drawn from regression analysis of observational data. When controlled experiments are not feasible, variants of regression analysis, such as instrumental variable regression, can be used to try to estimate causal relationships from observational data.
Finance
The capital asset pricing model uses linear regression and the concept of the beta coefficient to analyze and calculate the systematic risk of an investment. This risk measure comes directly from the beta coefficient of the linear regression model that links the return on the investment to the return on all risky assets.
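As a rough sketch, the beta coefficient can be estimated as the slope obtained when the returns of an investment are regressed on the returns of the market, which equals their covariance divided by the variance of the market returns. The function name `beta_coefficient` and the return figures are made up for illustration; the sketch also assumes a constant risk-free rate, so that using raw rather than excess returns does not change the slope.

```python
def beta_coefficient(asset_returns, market_returns):
    """Slope of the regression of asset returns on market returns."""
    n = len(market_returns)
    if n != len(asset_returns) or n < 2:
        raise ValueError("need two equally long return series")
    m_mean = sum(market_returns) / n
    a_mean = sum(asset_returns) / n
    # The common 1/n (or 1/(n-1)) factor cancels in the ratio.
    cov = sum(
        (m - m_mean) * (a - a_mean)
        for m, a in zip(market_returns, asset_returns)
    )
    var = sum((m - m_mean) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns do not vary; beta is undefined")
    return cov / var


# Made-up periodic returns; the asset moves exactly twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
asset = [0.02, -0.04, 0.06, 0.00, 0.04]
beta = beta_coefficient(asset, market)
print(round(beta, 6))
```

A beta above 1 indicates that the investment tends to amplify market movements, while a beta below 1 indicates that it tends to dampen them.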
Economics
Linear regression is the main empirical tool of economics. For example, it is used to predict consumption expenditure, fixed investment expenditure, inventory investment, purchases of a country's exports, spending on imports, the demand to hold liquid assets, labor demand, and labor supply.