Linear regression

Basic meaning

In statistics, linear regression is a type of regression analysis that uses a least squares function, called the linear regression equation, to model the relationship between one or more independent variables and a dependent variable. This function is a linear combination of one or more model parameters called regression coefficients. The case with only one independent variable is called simple regression, and the case with more than one independent variable is called multiple regression. (This in turn should be distinguished from multivariate linear regression, in which multiple correlated dependent variables are predicted rather than a single scalar variable.)

In linear regression, the data are modeled using a linear predictor function, and the unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of y given the value of X is assumed to be an affine function of X. Less commonly, the median or some other quantile of the conditional distribution of y given X is expressed as a linear function of X. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X, rather than on the joint probability distribution of X and y (which is the domain of multivariate analysis).

Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models that depend linearly on their unknown parameters are easier to fit than models that depend nonlinearly on them, and the statistical properties of the resulting estimators are easier to determine.

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, for example by minimizing the "lack of fit" in some other norm (as in least absolute deviations regression), or by minimizing a penalized version of the least squares loss function, as in bridge regression. Conversely, the least squares approach can be used to fit models that are not linear. Therefore, although "least squares" and "linear model" are closely linked, they are not synonymous.
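To make the idea of a penalized fit concrete: bridge regression adds a penalty of the form $\lambda \sum_j |\beta_j|^q$ to the least squares loss, and the case q = 2 is ridge regression. The minimal Python sketch below fits a single predictor with that q = 2 penalty. The function name `ridge_simple`, the choice to leave the intercept unpenalized, and the sample data are illustrative assumptions, not part of the original text. Setting the penalty weight `lam` to 0 recovers the ordinary least squares line.

```python
def ridge_simple(xs, ys, lam):
    """Fit y ~ a + b*x by minimizing sum((y - a - b*x)^2) + lam * b^2.

    Hypothetical helper for illustration. The intercept a is not penalized
    (an assumption of this sketch); lam = 0 gives ordinary least squares.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Setting the derivative to zero gives b = Sxy / (Sxx + lam):
    # the penalty enlarges the denominator and shrinks the slope toward zero.
    b = sxy / (sxx + lam)
    a = y_bar - b * x_bar
    return a, b


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # made-up data lying exactly on y = 2x
print(ridge_simple(xs, ys, 0.0))  # prints (0.0, 2.0)
print(ridge_simple(xs, ys, 5.0))  # prints (2.5, 1.0)
```

With `lam = 5` the slope on the sample data shrinks from 2.0 to 1.0, which shows how the penalty pulls the coefficient toward zero at the cost of a worse fit to the data.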

Fitting equation

Least squares method

In general, the equation of a linear regression can be found with the least squares method, which yields the straight line y = bx + a.
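For a single predictor, minimizing $\sum_i (y_i - b x_i - a)^2$ gives the standard closed form $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. A minimal Python sketch of that computation follows; the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = b*x + a (hypothetical helper)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


a, b = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])  # made-up data
print(f"y = {b:.2f}x + {a:.2f}")  # prints y = 1.99x + 0.05
```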

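In general, y is often affected by more than one factor. Suppose there are k factors $x_1, x_2, \ldots, x_k$; the following linear relationship can then usually be considered:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon $$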

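Making n independent observations of y and $x_1, x_2, \ldots, x_k$ at the same time gives n sets of observations $(y_t; x_{t1}, x_{t2}, \ldots, x_{tk})$, $t = 1, 2, \ldots, n$ ($n > k + 1$), which satisfy the relation:

$$ y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t, \qquad t = 1, 2, \ldots, n $$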

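where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are mutually uncorrelated random variables, each with the same distribution as $\varepsilon$. To express the above relation in matrix form, let:

$$ Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} $$

Then $Y = X\beta + \varepsilon$, and the least squares method gives the solution

$$ \hat{\beta} = (X^{\mathsf T} X)^{-1} X^{\mathsf T} Y, $$

where $(X^{\mathsf T} X)^{-1} X^{\mathsf T}$ is called the pseudo-inverse of X (this form assumes $X^{\mathsf T} X$ is invertible, that is, X has full column rank).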
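The matrix solution can be sketched in plain Python. Rather than forming the inverse explicitly, the sketch solves the normal equations $(X^{\mathsf T}X)\hat{\beta} = X^{\mathsf T}Y$ by Gaussian elimination, which gives the same $\hat{\beta}$ whenever $X^{\mathsf T}X$ is invertible. The helper names `solve` and `fit_multiple` and the sample data are hypothetical. For real work, a library routine based on QR or SVD (for example NumPy's `numpy.linalg.lstsq`) is numerically safer.

```python
def solve(A, v):
    """Solve A w = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, v)]  # augmented matrix
    for col in range(m):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        w[r] = (M[r][m] - sum(M[r][c] * w[c] for c in range(r + 1, m))) / M[r][r]
    return w


def fit_multiple(rows, ys):
    """Least squares estimate beta_hat from (X^T X) beta_hat = X^T Y.

    rows: one list [x_t1, ..., x_tk] per observation; a leading 1 is
    prepended so that beta_hat[0] is the intercept beta_0.
    """
    X = [[1.0] + list(r) for r in rows]
    n, p = len(X), len(X[0])
    XtX = [[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(p)] for i in range(p)]
    XtY = [sum(X[t][i] * ys[t] for t in range(n)) for i in range(p)]
    return solve(XtX, XtY)


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (n = 5 > k + 1 = 3).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
ys = [1, 3, -2, 0, 2]
beta = fit_multiple(rows, ys)  # approximately [1.0, 2.0, -3.0]
```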

Regression coefficient

Generally, this value is required to be greater than 5%. For most behavioral researchers, the most important output is the regression coefficient. When age increases by 1 unit, the rated quality of the document decreases by 0.1020986 units, indicating that older people give the document a lower quality rating. The corresponding t value of this variable is -2.10; its absolute value is greater than 2 and the p value is below 0.05, so the effect is significant. The conclusion is that older people rate document quality lower, and this effect is significant. Conversely, people with richer domain knowledge rate document quality higher, but this effect is not significant. Interpreting regression coefficients in this way is hypothesis testing with regression analysis.
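The t value quoted above is the coefficient divided by its standard error, and the "|t| > 2" rule of thumb works because, for large samples, the two-sided 5% critical value of the normal distribution is about 1.96. The sketch below computes t and an approximate p value; the function name `coefficient_test` and its inputs are hypothetical (the standard error of the age coefficient is not given in the text), and using the normal distribution instead of the exact t distribution with n - k - 1 degrees of freedom is an assumption that only holds well for large samples.

```python
from statistics import NormalDist


def coefficient_test(coef, std_err):
    """Return (t, p) for the null hypothesis that a coefficient is zero.

    Hypothetical helper. t = coef / std_err. The two-sided p value uses the
    standard normal distribution as a large-sample approximation; an exact
    test would use the t distribution with n - k - 1 degrees of freedom.
    """
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    t = coef / std_err
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical inputs: a coefficient of -0.21 with a standard error of 0.10.
t, p = coefficient_test(-0.21, 0.10)
significant = abs(t) > 2 and p < 0.05
```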

Error of regression equation

Sum of squared deviations

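$$ SST = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSR = r^2 \cdot SST, \qquad SSE = (1 - r^2) \cdot SST $$

where SST is the total sum of squares of y; r is the correlation coefficient, so $r^2$ is the proportion of the variation explained by the regression line; and SSE is the variation that the regression line cannot explain.

Using the relationship between the correlation coefficient and the slope of the line, $r = b\sqrt{\sum (x_i - \bar{x})^2 / \sum (y_i - \bar{y})^2}$, an equivalent form is obtained:

$$ SSE = \sum_{i=1}^{n} (y_i - \bar{y})^2 - b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

where b is the slope of the line.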

Using the predicted values

$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

where $y_i$ is the actual measured value and $\hat{y}_i = b x_i + a$ is the predicted value calculated from the line equation.
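For a least squares line with an intercept, the three expressions for SSE in this section (from $r^2$, from the slope b, and from the residuals) must agree. The hypothetical helper `sse_three_ways` below computes all three on made-up data as a numerical check.

```python
def sse_three_ways(xs, ys):
    """Return SSE computed from residuals, from r^2, and from the slope b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # SST, the total sum of squares of y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r2 = sxy ** 2 / (sxx * syy)

    sse_residuals = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    sse_from_r = (1 - r2) * syy
    sse_from_slope = syy - b ** 2 * sxx
    return sse_residuals, sse_from_r, sse_from_slope


s_res, s_r, s_b = sse_three_ways([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
# All three are approximately 0.107 for this data.
```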

Uncertainty

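Slope b

Method 1: Use

$$ s_b = |b| \sqrt{\frac{1/r^2 - 1}{n - 2}} $$

Method 2: Substitute the slope b into $SSE = \sum (y_i - \bar{y})^2 - b^2 \sum (x_i - \bar{x})^2$ and use

$$ s_b = \sqrt{\frac{SSE}{(n - 2)\sum (x_i - \bar{x})^2}} $$

Intercept a

$$ s_a = s_b \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} $$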
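The two slope formulas are algebraically equivalent, because $SSE = (1 - r^2)\sum (y_i - \bar{y})^2$ and $b^2 = r^2 \sum (y_i - \bar{y})^2 / \sum (x_i - \bar{x})^2$. The hypothetical function `line_uncertainties` below evaluates Method 1, Method 2, and the intercept formula on made-up data so the agreement can be checked; it assumes at least three points and a nonzero correlation.

```python
import math


def line_uncertainties(xs, ys):
    """Return (s_b by Method 1, s_b by Method 2, s_a) for the line y = b*x + a."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points so that n - 2 > 0")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("r = 0, so Method 1 is undefined")
    b = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)

    # Method 1: from the slope and the correlation coefficient.
    # max(..., 0.0) guards against a tiny negative value from rounding when r^2 ~ 1.
    s_b_method1 = abs(b) * math.sqrt(max(1 / r2 - 1, 0.0) / (n - 2))

    # Method 2: substitute b into SSE = Syy - b^2 * Sxx.
    sse = max(syy - b ** 2 * sxx, 0.0)
    s_b_method2 = math.sqrt(sse / ((n - 2) * sxx))

    # Intercept: s_a = s_b * sqrt(mean of x^2).
    mean_x2 = sum(x * x for x in xs) / n
    s_a = s_b_method2 * math.sqrt(mean_x2)
    return s_b_method1, s_b_method2, s_a


s_b1, s_b2, s_a = line_uncertainties([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
```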

Application

Mathematics

Linear regression has many practical uses, which fall into the following two broad categories:

  1. If the goal is prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of y and X values. After such a model has been developed, if a new value of X is given without an accompanying value of y, the fitted model can be used to predict a value of y.

  2. Given a variable y and a number of variables X1, ..., Xp that may be related to y, linear regression analysis can be used to quantify the strength of the relationship between y and each Xj, to assess which Xj may have no relationship with y at all, and to identify which subsets of the Xj contain redundant information about y.
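The first use, prediction, amounts to fitting the line once and then evaluating it at new X values that have no observed y. A minimal sketch, assuming a single predictor and using the hypothetical names `fit_line` and `predict` with made-up data:

```python
def fit_line(xs, ys):
    """Least squares estimates (a, b) for y = b*x + a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def predict(model, x_new):
    """Predict y for a new x value using a fitted (a, b) model."""
    a, b = model
    return b * x_new + a


model = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
y_hat = predict(model, 6)  # x = 6 was not in the observed data
```

Predictions far outside the range of the observed X values (extrapolation) rely on the assumption that the linear relationship still holds there.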

Trend line

A trend line represents the long-term trend in time-series data. It tells us whether a particular set of data (such as GDP, oil prices, or stock prices) has increased or decreased over a period of time. Although a trend line can be drawn roughly by eye from the positions of the data points in a coordinate system, a more appropriate method is to use linear regression to calculate the position and slope of the trend line.
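A trend line is an ordinary least squares line with time as the independent variable. The sketch below, using a hypothetical `trend_line` function and made-up values, regresses a series on its period index t = 0, 1, ..., n - 1 and reads the direction of the trend from the sign of the slope.

```python
def trend_line(values):
    """Fit value ~ a + b*t for t = 0, 1, ..., n - 1 and report the direction."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    t_bar = (n - 1) / 2
    v_bar = sum(values) / n
    stt = sum((t - t_bar) ** 2 for t in ts)
    stv = sum((t - t_bar) * (v - v_bar) for t, v in zip(ts, values))
    b = stv / stt  # average change per period
    a = v_bar - b * t_bar
    if b > 0:
        direction = "increasing"
    elif b < 0:
        direction = "decreasing"
    else:
        direction = "flat"
    return a, b, direction


a, b, direction = trend_line([100, 102, 101, 105, 107])  # made-up price series
```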

Epidemiology

Early evidence on the effect of smoking on mortality and morbidity came from observational studies using regression analysis. To reduce spurious correlations when analyzing observational data, researchers usually include several additional variables in their regression models besides the variable of primary interest. For example, suppose we have a regression model in which smoking behavior is the independent variable of primary interest and the dependent variable is the lifespan of smokers measured over several years. Researchers might include socioeconomic status as an additional independent variable, to ensure that any observed effect of smoking on lifespan is not due to differences in education or income. However, it is never possible to include every variable that might confound the results in an empirical analysis. For example, a hypothetical gene might both increase the risk of death and cause people to smoke more. For this reason, randomized controlled trials often produce more convincing evidence of causal relationships than regression analyses of observational data. When controlled experiments are not feasible, variants of regression analysis such as instrumental variables regression can be used to try to estimate causal relationships from observational data.

Finance

The capital asset pricing model uses linear regression and the concept of the beta coefficient to analyze and quantify the systematic risk of an investment. This comes directly from the beta coefficient of the linear regression model that relates the return on the investment to the return on all risky assets.
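Beta is the slope of a regression of the investment's returns on the market's returns, which equals $\operatorname{Cov}(R_i, R_m) / \operatorname{Var}(R_m)$. A minimal sketch follows; the function name `capm_beta` and the return series are hypothetical, and it assumes the inputs are already excess returns (returns minus the risk-free rate), as the CAPM regression is usually specified.

```python
def capm_beta(asset_returns, market_returns):
    """Estimate beta as Cov(asset, market) / Var(market), i.e. the OLS slope."""
    n = len(market_returns)
    if n != len(asset_returns) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    m_bar = sum(market_returns) / n
    r_bar = sum(asset_returns) / n
    cov = sum((m - m_bar) * (r - r_bar)
              for m, r in zip(market_returns, asset_returns)) / (n - 1)
    var = sum((m - m_bar) ** 2 for m in market_returns) / (n - 1)
    return cov / var


# Made-up excess returns: the asset is constructed to move twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
asset = [2 * m + 0.001 for m in market]
beta = capm_beta(asset, market)  # approximately 2.0
```

A beta above 1 means the investment tends to move more than the market, so it carries more systematic risk.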

Economics

Linear regression is the predominant empirical tool in economics. For example, it is used to predict consumption spending, fixed investment spending, inventory investment, purchases of a country's exports, spending on imports, the demand to hold liquid assets, labor demand, and labor supply.
