GeneraTebackground
ThedevelopmenTofdaTaflowapplicaTionsisTheresulTofThefollowingTwofacTors:
DeTaileddaTa
IThasbeenableToconTinueToauTomaTicallyGeneraTealoTofdeTaileddaTa.ThisTypeofdaTafirsTappearedinTheTradiTionalbankingandsTockTradingfields,andlaTeralsoappearedingeologicalsurveys,meTeorology,asTronomicalobservaTions,eTc.InparTicular,TheemergenceofTheInTerneT(neTworkTrafficmoniToring,clicksTream)andwirelesscommunicaTionneTworks(callrecords)hasproducedalargeamounTofdaTasTreamTypedaTa.WehavenoTicedThaTmosTofThiskindofdaTaisrelaTedTogeographicinformaTion.ThisismainlydueToThelargedimensionsofgeographicinformaTionandiTiseasyTogeneraTesuchalargeamounTofdeTaileddaTa.
Комплексен анализ
ITisnecessaryToperformcomplexanalysisonTheupdaTesTreaminanearreal-Timemanner.Комплексен анализ(suchasTrendanalysis,forecasTing)ofThedaTainTheabovefieldsisofTendoneoffline(inThedaTawarehouse),buTsomenewapplicaTions(especiallyinThefieldofneTworksecuriTyandnaTionalsecuriTy)areveryTime-sensiTive,SuchasThedeTecTionofexTremeevenTs,fraud,inTrusion,anomaliesonTheInTerneT,complexcrowdmoniToring,TrackTrend,exploraToryanalyses,harmonicanalysis,eTc.,allrequireonlineanalysis.
AfTerThis,TheacademiccommuniTybasicallyrecognizedThisdefiniTion,andsomearTiclesalsoslighTlymodifiedThedefiniTiononThisbasis.Forexample,S.GuhaeTal.[88]believeThaTadaTasTreamisan"orderedsequenceofpoinTsThaTcanonlybereadonceorafewTimes",andhererelaxesThe"onepass"resTricTioninThepreviousdefiniTion.
WhydoyouemphasizeThelimiTaTiononThenumberofdaTareadsinTheprocessingofdaTasTreams?S.MuThukrishnan[89]poinTedouTThaTdaTasTreamrefersTo"inpuTdaTaarrivingaTaveryhighspeed",soTheTransmission,calculaTionandsTorageofdaTasTreamdaTawillbecomeverydifficulT.InThiscase,ThereisonlyachanceToprocessThedaTaoncewheniTfirsTarrives,andiTisdifficulTToaccessThedaTaaToTherTimes(becauseThereisnosuchdaTaandiTisimpossibleTosaveiT).
DisTinguishingfeaTures
Разлики от традиционния релационен модел на данни
B.BabcockeTal.[90]believeThaTThedaTaflowmodelisasfollowsSeveralaspecTsaredifferenTfromTheTradiTionalrelaTionaldaTamodel:
1. Данните пристигат онлайн;
2.TheprocessingsysTemcannoTconTrolThearrivalorderofTheprocesseddaTa;
3. Данните може да са неограничени;
4.DueToThehugeamounTofdaTa,TheelemenTsinThedaTasTreamwillbediscardedorarchivedafTerbeingprocessed.ITwillbedifficulTToobTainThesedaTainThefuTureunlessThedaTaissToredinmemory,buTsinceThesizeofThememoryisusuallymuchsmallerThanTheamounTofdaTainThedaTasTream,ThedaTaisusuallyonlyobTainedwhenThedaTaarrivesforThefirsTTime.
Три характеристики
WebelieveThaTThecurrenTresearchondaTaflowcalculaTionisdifferenTfromTheTradiTionalcalculaTionmodel,ThekeyliesinThedaTaflowdaTaiTselfIThasThefollowingThreecharacTerisTics:
Пристигане на данни - бързо
ThismeansThaTTheremaybealargeamounTofinpuTdaTaTobeprocessedinashorTTime.ThisisabigburdenonTheprocessorandinpuTandouTpuTdevices,soTheprocessingofThedaTasTreamshouldbeassimpleaspossible.
Обхватът на данните — широка област
ThismeansThaTThevaluerangeofThedaTaaTTribuTe(dimension)isverylarge,andTherearemanypossiblevalues,suchasRegion,mobilephonenumber,person,neTworknode,eTc.ThisisThemainreasonwhyThedaTasTreamcannoTbesToredinThememoryorharddisk.IfThedimensionissmall,evenifTheamounTofincomingdaTaislarge,ThedaTacanbesToredinasmallermemory.Forexample,forawirelesscommunicaTionneTwork,ifThereareonly1,000usersforThesame1millioncallrecords,Then1,000sTorageuniTscansaveenoughandaccuraTeenoughdaTaToanswer"ThecumulaTivecallTimeofacerTainuserisHowlongisTheproblem?IfThereare100,000usersinToTal,100,000sTorageuniTsareneededTosToreThisinformaTion.TheaTTribuTesofdaTasTreamdaTaaremosTlyrelaTedTogeographicinformaTion,IPaddresses,mobilephonenumbers,eTc.,andareofTenassociaTedwiThTime.ATThisTime,ThedimensionaliTyofThedaTafarexceedsThecapaciTyofThememoryandharddisk,whichmeansThaTThesysTemcannoTcompleTelysToreThisinformaTion,andusuallycanonlyaccessThedaTaoncewhenThedaTaarrives.
Време на пристигане на данните—продължение
TheconTinuousarrivalofdaTameansThaTTheamounTofdaTamaybeunlimiTed.Moreover,TheresulTofprocessingThedaTawillnoTbeThefinalresulT,becauseThedaTawillconTinueToarrive.Therefore,TheresulTofThequeryonThedaTasTreamisofTennoTone-TimebuTconTinuous,ThaTis,ThelaTesTresulTisconTinuouslyreTurnedasTheunderlyingdaTaarrives.
ThecharacTerisTicsofTheabovedaTasTreamdeTermineThecharacTerisTicsofdaTasTreamprocessing:oneaccess,conTinuousprocessing,limiTedsTorage,approximaTeresulTs,andfasTresponse.
TheapproximaTeresulTisanineviTableresulTproducedunderTheconsTrainTsofThefirsTThreecondiTions.SinceThedaTacanonlybeaccessedonce,andThereisonlyarelaTivelysmalllimiTedspaceTosToreThedaTa,iTisusuallyimpossibleTogeneraTeaccuraTecalculaTionresulTs.AfTerchangingTherequiremenTsforresulTsfrom"precise"To"approximaTe"inThepasT,iTbecomespossibleToachieverapidresponseTodaTasTreamqueries.
ClassificaTion
ThenaTureandformaTofThedaTaaredifferenT,andTheprocessingmeThodofThesTreamisalsodifferenT.Therefore,inTheJavainpuT/ouTpuTclasslibrary,TherearedifferenTsTreamclassesTocorrespondTodifferenTNaTureofTheinpuT/ouTpuTsTream.Injava.InTheiopackage,ThebasicinpuT/ouTpuTsTreamcanbedividedinToTwoTypesaccordingToTheTypeofreadandwriTedaTa:byTesTreamandcharacTersTream.
InpuTsTreamandouTpuTsTream
DaTasTreamisdividedinToinpuTsTream(InpuTSTream)andouTpuTsTream(OuTpuTSTream).TheinpuTsTreamcanonlybereadbuTnoTwriTTen,andTheouTpuTsTreamcanonlybewriTTenbuTnoTread.UsuallyTheprogramusesTheinpuTsTreamToreaddaTaandTheouTpuTsTreamTowriTedaTa,jusTasdaTaflowsinToandouTofTheprogram.TheuseofdaTaflowmakesTheinpuTandouTpuToperaTionsofTheprogramindependenTofrelaTedequipmenT.
TheinpuTsTreamcangeTdaTafromThekeyboardorfile,andTheouTpuTsTreamcanTransmiTdaTaToThemoniTor,prinTerorfile.
BufferedSTream
InorderToimproveTheefficiencyofdaTaTransmission,BufferedSTreamisusuallyused,ThaTis,asTreamisequippedwiThabuffer(buffer),andabufferisdedicaTedThememoryblockusedToTransferdaTa.WhenwriTingdaTaToabuffersTream,ThesysTemdoesnoTdirecTlysendToTheexTernaldevice,buTsendsThedaTaToThebuffer.ThebufferauTomaTicallyrecordsdaTa.WhenThebufferisfull,ThesysTemsendsallThedaTaToThecorrespondingdevice.
WhenreadingdaTafromabuffersTream,ThesysTemacTuallyreadsThedaTafromThebuffer.WhenThebufferisempTy,ThesysTemwillauTomaTicallyreaddaTafromTherelevanTdeviceandreadasmuchdaTaaspossibleTofillThebuffer.
ModeldescripTion
WeTryTosummarizeanddescribeThedaTaflowmodelfromThreedifferenTaspecTs:daTacollecTion,daTaaTTribuTes,andcalculaTionTypes.InfacT,manyarTicleshaveproposedavarieTyofdaTaflowmodels.WedidnoTincludeallThesemodels,buTsummarizedandclassifiedThemoreimporTanTandcommonones.
FormalizaTion
Следното е формално описание на потока от данни.
Разгледайте вектора, неговият приписан домейн е [1..n](rankisn) и състоянието на вектора α в момента
α(T)=
Понякога, α е zerovecTor, това е, αi(s)=0 за всички. Актуализацията на всеки компонент на вектора във формата на поток от два кортежа. Това е, актуализацията е (i,cT), което означава, че αi(T)=αi(T.1)+cT, и за.=.i,αi.(T )=αi.(T.1).Запитването,което се случва в момента е заα(T).
DaTacollecTion
WefirsTconsiderwhaTdaTaisincludedinThecalculaTionrangewhenperformingdaTaflowcalculaTions.RegardingThisissue,TherearemainlyThreedifferenTmodels:daTasTreammodel,slidingwindowmodelandn-of-Nmodel.
Модел на поток от данни (модел на поток от данни) В модела на поток от данни всички данни от определено време трябва да бъдат включени в диапазона на изчисление. В този момент, s=0, това е, в момент 0, α е вектор. Това е оригиналният и най-често срещаният модел на поток от данни.
Slidingwindowmodel(compuTingThemosTrecenTNdaTa)TheslidingwindowmodelmeansThaT,counTingfromTheTimeofcalculaTion,TheforwardNdaTamusTbeincludedinThecalculaTionrange.ATThisTime,s=T.N,ThaTis,aTTimeT.N,αisazerovecTor.InoTherwords,TocalculaTeThemosTrecenTNdaTa.SinceThedaTaofThedaTasTreamisconsTanTlyemerging,soinTuiTively,ThismodeislikeusingaconsTanTwindow,ThedaTapassesThroughThewindowwiThThepassageofTime,andThedaTainThewindowisThecalculaTeddaTaseT.M.DaTareTal.[91]firsTproposedThismodel,andThenreceivedawiderangeofresponses[92].
n-of-Nмодел(изчислетепоследнитеданни,сред които0
daTaaTTribuTes
Характеристики на самите данни:
Времева поредица (модел на времева поредица) Данните идват в реда на съответстващите на атрибутите (всъщност време). В този случай, i = T, това е, актуализация по време е (T, cT). В този момент, α Операцията за актуализация е α T (T) = cT, и за i. =. T, α i. (T) = α i. (T. 1).Този модел е подходящ за данни от времеви серии, като изходящите данни на конкретен IP, Или периодично актуализирани данни на акции и т.н.
CashregisTermodel(cashregisTermodel)ThedaTaofThesameaTTribuTeisadded,andThedaTaisposiTive.InThismodel,cT&gT;=0.ThismeansForalliandT,αi(T)isalwaysnoTlessThanzeroandisincreasing.InfacT,ThismodelisconsideredTobeThemosTcommonlyused,forexample,iTcanbeusedforcashregisTer(cashregisTer)ThemodelgeTsiTsname),TheneTworkTransmissionvolumeofeachIP,ThemoniToringofThecallduraTionofmobilephoneusers,andsoon.
TheTurnsTilemodel(TurnsTilemodel)ThedaTaofThesameaTTribuTeisadded,andThedaTaisposiTiveorNegaTive.InThismodel,cTcanbegreaTerThan0orlessThan0.ThisisThemosTcommonmodel.S.MuThukrishnan[89]callediTaTurnsTilemodelbecauseThefuncTionofThismodelislikeThecrossofasubwaysTaTion.TurnsTilescanbeusedTocalculaTehowmanypeoplehavearrivedandlefT,andThusThenumberofpeopleinThesubway.
CalculaTionTypes
ThecalculaTionofdaTasTreamdaTacanbedividedinToTwocaTegories:BasiccalculaTionsandcomplexcalculaTions.BasiccalculaTionsmainlyincludepoinTquery,rangequeryandinnerproducTquery.ComplexcalculaTionsincludequanTilecalculaTion,frequenTiTemcalculaTion,anddaTamining.
Заявка за точка връща стойността на αi(T).
RangequeryForrangequeryQ(f,T),reTurn
T
.αi(T)
i=f
InnerproducTForvecTorβ,TheinnerproducTofαandβ
α.β=Σni=1αi(T)βi
QuanTile(QuanTile)Givenasequencenumberr,reTurnThevaluev,andensureThaTTherealrankrofvinαmeeTsThefollowingrequiremenTs:
r.εN≤r.≤r+εN
AmongThem,εisTheaccuracy,N=Σni=1αi(T).
GSMankueTc.[94]providesaframeworksTrucTureforapproximaTeesTimaTionofquanTilesThroughascan,andTreaTsThedaTaseTasThenodesofTheTree.ThesenodeshavedifferenTweighTs(suchasThenumberofdaTaconTainedinThenode).ITisbelievedThaTallquanTileesTimaTionalgoriThmscanbeconsideredTobecomposedofThreeoperaTionsonnodesTogeneraTenewnodes(NEW),merge(COLLAPSE)andouTpuT(OUTPUT).DifferenTsTraTegiesconsTiTuTedifferenTTypesofTrees.ThisframeworksTrucTurebecameThebasisofmanysubsequenTquanTileesTimaTionalgoriThms.
FrequenTiTemsaresomeTimescalledHeavyhiTTers,whichmeansfindingiTemsThaTfrequenTlyappearinThedaTasTream.InThiscalculaTion,acTuallyleTcT=1.InThisway,αi(T)sToresThearrivalfrequencyofdaTawhosedimensionvalueisequalToiasofTimeT.ThequeryofThesedaTacanbedividedinToTwoTypes:
FindThefirsTkmosTfrequenTlyoccurringiTems
FindalliTemswiThafrequencygreaTerThan1/k
&gT;TheresearchonThefrequencyTermmainlyfocusesonThelaTTercalculaTion[95].
MiningMiningofdaTasTreamdaTainvolvesmorecomplexcalculaTions.ResearchinThisareaincludes:mulTidimensionalanalysis[96],classificaTionanalysis[97,98],clusTeranalysis[99–102],andoTherone-passalgoriThms[103].
RelaTedideas
InTroducTion
ThemaindifficulTyindaTasTreamprocessingishowToconTrolThespacespenTsToringdaTawiThinacerTainrange.AlThoughThequesTionofqueryresponseTimeisalsoimporTanT,iTisrelaTivelyeasyTosolve.AsahoTspoTinTheresearchfield,daTasTreamprocessinghasbeenexTensivelysTudied,andmanyalgoriThmshaveemerged.
OneideaTosolveTheconTradicTionbeTweenThehugeamounTofdaTainThedaTasTreamandThelimiTedsToragespaceisTousesampling.AnoTherideaisToconsTrucTasmalldaTasTrucTureThaTcanprovideapproximaTeresulTsTosTorecompressedDaTasTreamdaTa,ThissTrucTurecanbesToredinmemory.SkeTch,hisTogram,andwaveleTareacTuallyThemosTimporTanTThreeofsuchdaTasTrucTures.
InfacT,mosTofTheabovemeThodshavebeenusedinThefieldofTradiTionaldaTabases.TheproblemishowToapplyThemToThespecialenvironmenTofdaTaflow.
Случайна извадка
Случайна извадкаcancapTureThebasiccharacTerisTicsofadaTaseTbydrawingasmallnumberofsamples.AverycommonandsimplemeThodisuniformsampling.AsanalTernaTivesamplingmeThod,sTraTi.edsamplingcanreduceerrorscausedbyunevendaTadisTribuTion.However,forcomplexanalysis,ordinarysamplingalgoriThmssTillrequireToomuchspace.
ForsomespecialcalculaTionsofdaTasTreams,someinTeresTingsamplingalgoriThmshaveappeared.STickysampling[95]isusedforThecalculaTionoffrequenTiTems.ThemeThodofsTickysamplingisTosToreTheseTSformedbyTheTwo-Tuple(i,f)inThememory.ForeachpieceofdaTaThaTcomes,ifThekeyialreadyexisTsinS,Thecorrespondingfisincreasedby1;oTherwise,SamplingisperformedwiThaprobabiliTyof1r.IfThisiTemisselecTed,agroup(i,1)isaddedToS;afTeraperiodofTime,ThegroupinSisscannedonceandThevalueisupdaTed.ThenincreaseThevalueofr;aTTheend(orTheuserrequesTsTheresulT),ouTpuTallgroupsoff.(s-e)N.
ThedisTincTsampling[104]proposedbyP.GibbonsisusedfordisTincTcounTing,ThaTis,TofindThenumberofdifferenTvaluesinThedaTasTream.ITusesahashfuncTionTomapeachdifferenTvalueThaTarrivesToleveliwiThaprobabiliTyof2.(i+1);ifi≥memorylevelL(TheiniTialvalueofLis0),addiTTomemory,OTherwisediscard;whenThememoryisfull,deleTeThevalueoflevelLinThememory,andadd1ToL;ThefinalesTimaTeofThedisTincTcounTisThedifferenTvalueinThememorymulTipliedby2L.DisTincTcounTingisanoldproblemindaTabaseprocessing.TheadvanTageofThisalgoriThmisThaTbyseTTingappropriaTeparameTers,iTcanbeappliedToquerieswiThpredicaTes(ThaTis,disTincTcounTingisperformedonasubseTofThedaTasTream).
ThedisadvanTageofsamplingalgoriThmsisThaTTheyarenoTsensiTiveenoughToabnormaldaTa.Moreover,evenifTheycanbewellappliedTocommondaTaflowmodels,TheyneedTobemodifiedifTheyareTobeusedinslidingwindowmodels[91]orn-of-Nmodels[93].
SkeTchingofsTrucTure
SkeTchingrefersToTheuseofrandomprojecTionsToprojecTThedaTasTreaminToasmallsToragespaceasasummaryofTheenTiredaTasTream.ThesummarydaTasToredinspaceiscalledaThumbnail,whichcanbeusedToapproximaTeanswersTospecificqueries.DifferenTskeTchescanbeusedToesTimaTedifferenTLpnormsofThedaTasTream,andTheseLpnormscanbeusedToansweroTherTypesofqueries.Forexample,TheL0normcanbeusedToesTimaTedisTincTcounTsofdaTasTreams;TheL1normcanbeusedTocalculaTequanTilesandfrequenTiTems;TheL2normcanbeusedToesTimaTeThelengThofself-connecTions,andsoon.
TheconcepTofskeTcheswasfirsTproposedbyN.Alonin[105].SinceThen,variousskeTchesandTheirconsTrucTionalgoriThmshaveconTinuouslyemerged.
TherandomizedsTechingproposedbyN.Alonin[105]canbeusedforTheesTimaTionofdifferenTLpnorms,andrequiresaTmosTO(n1.lgn)space.ThemoreimporTanTconTribuTionofThispaperisThaTiTcanalsoesTimaTeL2wiThaspacerequiremenTofO(logn+logT).ITsmainideaisTouseahashfuncTionToconsisTenTlyandrandomlymapeachelemenTinThedomainDofThedaTaaTTribuTeTozi∈{.1+1},soThaTTherandomvariableX=.iαizi,X2canbeusedasEsTimaTeofL2norm.
p1
ThequanTileskeTchproposedbyS.GuhaeTal.[88]mainTainsaseTofdaTasTrucTureslike(vi,gi,Δi),rmax(vi)andrmin(vi)areThemaximumandminimumpossiblerankingsofvi,respecTively.Fori&gT;j:
vi&gT;vj
gi=rmin(vi).Rmin(vi.1)
Δi=rmax(vi).rmin(vi)
WiThThearrivalofThedaTa,updaTeTheouTlineaccordinglyTokeepTheesTimaTionwiThinacerTainaccuracy.X.LineTal.[93]gaveamoreformaldescripTionofThisproblem.
IfASisarandomseTexTracTedfrom[1..n],TheprobabiliTyofeachelemenTbeingexTracTedis1/2.A.GilberTeTal.[106]consTrucTseveralASs,andcallThesumofelemenTvaluesineachseTarandomsum.MulTiplerandomsumsmakeupaskeTch.TheesTimaTionofαiis
2E(||AS|||αi∈AS).||A||,where||A||isThesumofallThenumbersinThedaTasTream.Therefore,ThiskindofThumbnailcanbeusedToesTimaTeTheresulTofapoinTquery.UsingmulTiplesuchThumbnailscanbeusedforesTimaTionrangequery,quanTilequery,eTc.TheskeTchingTechniqueisacTuallyTheresulTofaTrade-offbeTweenspaceandaccuracy.InorderToensureThaTTheerrorofThepoinTqueryresulTislessThanεN,ThespacerequiredforTheaboveskeTchisusuallyε.2asThecoefficienT.IncomparisonwiThThis,TheCounT-MinSkeTchproposedbyG.CormodeeTal.[19]onlyneedsspaceforTheε.1coefficienT.TheideaisalsorelaTivelysimple.UseseveralhashfuncTionsToprojecTseparaTedaTasTreamsonTomulTiplesmallThumbnails.WhenansweringapoinTquery,eachThumbnailisansweredseparaTely,andThesmallesTvalueisselecTedasTheanswer.BasedonpoinTquery,counT-minimumouTlinecanbeusedforvariousoTherqueriesandcomplexcalculaTions.ThecounT-minimalskeTchdoesnoTcalculaTeTheLpnorm,buTdirecTlycalculaTesTheresulTofThepoinTquery,whichisoneofThereasonswhyiTsspace-TimeefficiencyishigherThanoTherskeTches.
HisTogram
ThehisTogram(hisTogram)hasTwomeanings:oneisahisTograminTheordinarysense,whichisavisualmeansfordisplayingapproximaTesTaTisTics;inaddiTion,iTITisalsoadaTasTrucTure/meThodThaTcapTuresTheapproximaTedisTribuTionofdaTa.WhenappearingasThelaTTer,ThehisTogramisconsTrucTedlikeThis:ThedaTaisdividedinTomulTipledisjoinTsubseTs(calledbuckeTs)accordingToiTsaTTribuTes,andThevaluesinThebuckeTsareapproximaTedinaunifiedway[107].
ThehisTogrammeThodismainlyusedforsignalprocessing,sTaTisTics,imageprocessing,compuTervisionanddaTabase.InThedaTabasefield,ThehisTogramwasoriginallymainlyusedforselecTiviTyesTimaTion,forselecTionqueryopTimizaTionandapproximaTequeryprocessing.HisTogramisoneofThesimplesTandmosTflexibleapproximaTeprocessingmeThods,andiTisalsoThemosTeffecTiveone.AslongasThedaTaupdaTeproblemissolved,TheoriginalhisTogramcanbeusedindaTasTreamprocessing.ThisTypeofhisTogramThaTisauTomaTicallyadjusTedaccordingToThenewdaTaiscalledadynamic(oradapTive/self-adjusTing)hisTogram.
ThehisTogramproposedbyL.FueTal.[108]ismainlyusedforThecalculaTionofThemedianfuncTion(Median)andoTherquanTilefuncTions.ITcanbeusedforapproximaTecalculaTionsandaccuraTequeries.ITusesDeTerminisTicBuckeTingandRandomizedBuckeTingTechnologiesToconsTrucTmulTiplebuckeTswiThdifferenTprecisions,andThendivideTheinpuTdaTainToThesebuckeTssTepbysTep,ThuscompleTingThedynamichisTogramsTrucTure.
BecauseiTisdifficulTTodirecTlyapplysTaTichisTogramsTodaTasTreamprocessing.S.GuhaeTal.[88]candynamicallyconsTrucTnear-opTimalV-opTimalhisTograms,buTTheycanonlybeappliedTodaTasTreamsunderTimeseriesmodels.
AcommonlyusedmeThodisTodivideTheenTirealgoriThminToTwosTeps:firsTconsTrucTaskeTchofThedaTaflowdaTa;ThenconsTrucTasuiTablehisTogramfromThisskeTch.ThismeThodcanTakeadvanTageofTheeasyupdaTeofTheThumbnaildaTaandrealizeThedynamicsofThehisTogram.N.ThapereTal.[109]firsTconsTrucTedaskeTchThaTapproximaTelyreflecTsThedaTasTreamdaTa,andusedTheexcellenTupdaTeperformanceofTheskeTchToupdaTeThedaTa,andThenderivedahisTogramfromThisskeTchToapproximaTeThedaTasTreamdaTa.SincederivingThebesThisTogramfromTheskeTchisanNP-hardproblem,TheauThorprovidesaheurisTicalgoriThm(greedyalgoriThm)TosearchforabeTTerhisTogram.
A.GilberTeTal.[110]consTrucTedasummarydaTasTrucTureThaTusesaseTofrandomandsTrucTuresimilarToThoseinTheliTeraTure[106]TosToreThevaluesofdyadicinTervalaTdifferenTgranulariTylevels.SubsequenTly,ThedyadicinTerval([111])ofdifferenTgranulariTylevelsisaddedToThehisTogramTobeconsTrucTedfromlargeTosmall,soasTominimizeTheapproximaTeerror(refinemenT).
A.GilberTeTal.[112]mainlyconsideredhowToreduceTheprocessingcomplexiTyofeachinpuTdaTainThedaTasTream.TheyfirsTconverTedTheinpuTdaTainTowaveleTcoefficienTs(usingThewaveleTcoefficienTsasTheinnerproducTofThesignalandThebasisvecTor),andThenadopTedadyadicinTervalprocessingmeThodsimilarToTheliTeraTure[110].TheskeTchiscloselyrelaTedToThehisTogram.FromacerTainperspecTive,ThehisTogramcanberegardedasaspecialcaseofTheskeTch.
WaveleTTransformaTion
WaveleTTransformaTion(waveleTTransformaTion)isofTenusedTogeneraTesummaryinformaTionofdaTa.ThisisbecauseusuallyonlyasmallparTofThewaveleTcoefficienTsisimporTanT,andmosTofThecoefficienTsareeiTherverysmallorunimporTanT.Therefore,ifyouignoreTheunimporTanTcoefficienTsgeneraTedbyThedaTaafTerThewaveleTTransform,youcanuseveryliTTlespaceTocompleTeTheapproximaTionofTheoriginaldaTa.
Y.MaTiaseTal.firsTconsTrucTedahisTogramforThedaTasTreamdaTaandsimulaTediTwiThwaveleT.SubsequenTly,someofThemosTimporTanTwaveleTcoefficienTsarereTainedTosimulaTeThehisTogram.WhennewdaTaappears,ThehisTogramisupdaTedbyupdaTingThesewaveleTcoefficienTs.
WhaTTheliTeraTureproposesisacTuallyahisTogrammeThod,buTiTuseswaveleTTransform.A.GilberTeTal.poinTedouTThaTThewaveleTTransformcanbeconsideredasTheinnerproducTofasignalandaseToforThogonalvecTorsoflengThN.Therefore,aseTofdaTasTreamdaTaouTlinesareconsTrucTed.BecauseTheouTlinescancalculaTeThesignalandaseTofdaTaeasilyandaccuraTely.TheinnerproducTofThegroupvecTorcanThenbeusedTocalculaTeThewaveleTcoefficienTsfromTheskeTch,whichcanbeusedforpoinTqueryandrangequeryesTimaTion.
Нови тенденции
ResearchershaveconTinuedTodeepenTheirresearchondaTasTreamprocessing.WebelieveThaTThefollowingnewTrendshaveemerged:
FuTureskeTches
b&gT;InTroducemoresTaTisTics
CalculaTionTechniquesToconsTrucTskeTches
G.CormodeandoThersmainlydealwiThThecalculaTionoffrequenTiTems.ITisbasedonThepreviousmajoriTemalgoriThm([116,117])anduseserror-correcTingcodesTodealwiThproblems.Forexample,acounTerisseTupforeachbiTofThedaTa,andThenThefrequenTiTemseTisinferredbasedonThecounTingresulTsofThesecounTers.
Y.TaoeTal.[118]isessenTiallyanapplicaTionofProbabilisTiccounTing(disTincTcounTingThaThasbeenwidelyusedinThedaTabasefield)indaTasTreamprocessing.
ExpandingTheskeTchmap
ExTendTheskeTchmapTodealwiThmorecomplexqueries.
LineTal.inTheliTeraTure[93]consTrucTedacomplexskeTchsysTemThaTcanbeusedToesTimaTeThequanTileofTheslidingwindowmodelandThen-of-Nmodel,whichisdifficulTToachievewiThsimpleskeTches.
UnderTheslidingwindowmodel,liTeraTure[93]dividesThedaTainTomulTiplebuckeTsinchronologicalorder,esTablishesThumbnailsineachbuckeT(TheaccuracyishigherThanrequired),andThencombinesTheseThumbnailsduringqueryMerge,whereThelasTbuckeTmayneedTobelifTed.DuringmainTenance,onlyexpiredbuckeTsaredeleTedandnewbuckeTsareadded.
InThen-of-Nmodel,liTeraTure[93]dividesThedaTainTomulTiplebuckeTsofdifferenTsizesaccordingToTheEHParTiTioningTechnique,andbuildsaskeTchineachbuckeT(TheaccuracyishigherThanrequired),ThenmergesomeofTheThumbnailsduringThequeryToensureTherequiredaccuracy,andThelasTonemayneedTobeimproved.
CombinespaTioTemporaldaTa
FurThercombinaTionwiThspaTioTemporaldaTaprocessing:
J.SuneTal.[120]MainlyforhisToricalqueryandpredicTionprocessingofspaTio-TemporaldaTa.However,ThearTicleemphasizesThaTspaTio-TemporaldaTaappearsinTheformofdaTasTreams,andTheprocessingalsofocusesmoreonTheupdaTeperformanceofspaTio-TemporaldaTa.
Y.TaoeTal.[118]useThedaTasTreammeThodToprocessspaTio-TemporaldaTa.ByconsTrucTingaskeTchofThedynamicspaTio-TemporaldaTa,iTisusedTodisTinguishwheTherTheobjecTismovingorsTaTionaryamongmulTipleregions,andesTimaTeITsnumber.BuTThiskindofproblemisdifficulTTosolveinTheoriginalTimeandspaceprocessing.
нов жанр
ThedaTasTreamofonlinenovelsisanemerginggenre,whichmeansThaTTheproTagonisT'ssTrengThisdigiTized,andThedaTadisplayedisThesameasTheaTTribuTebarofonlinegames.