Background
The development of data stream applications is the result of the following two factors:
Detailed data
It has become possible to generate large amounts of detailed data continuously and automatically. This type of data first appeared in traditional banking and stock trading, and later in geological surveys, meteorology, astronomical observation, and other fields. In particular, the emergence of the Internet (network traffic monitoring, clickstreams) and wireless communication networks (call records) has produced large amounts of stream-type data. Most of this data is related to geographic information, mainly because geographic information has very high dimensionality, which makes it easy to generate such large volumes of detailed data.
Complex analysis
Complex analysis must be performed on the update stream in near real time. Complex analysis (such as trend analysis and forecasting) of data in the above fields has often been done offline (in a data warehouse), but some new applications, especially in network security and national security, are very time-sensitive. Tasks such as detecting extreme events, fraud, intrusions, and anomalies on the Internet, complex crowd monitoring, trend tracking, exploratory analysis, and harmonic analysis all require online analysis.
Subsequently, the academic community broadly accepted this definition, and some articles modified it slightly. For example, S. Guha et al. [88] describe a data stream as an "ordered sequence of points that can be read only once or a small number of times", which relaxes the "one pass" restriction of the earlier definition.
Why emphasize the limit on the number of reads in data stream processing? S. Muthukrishnan [89] pointed out that a data stream is "input data that arrives at a very high rate", so transmitting, computing on, and storing the data becomes very difficult. In this case there is only one chance to process each item, when it first arrives, and it is difficult to access it at any other time (because the data cannot be saved and is no longer available).
Distinguishing characteristics
Differences from the traditional relational data model
B. Babcock et al. [90] argue that the data stream model differs from the traditional relational data model in the following respects:
1. Data arrives online;
2. The processing system cannot control the order in which the data arrives;
3. The data may be unbounded;
4. Because of the huge volume of data, elements of the data stream are discarded or archived after being processed. They are difficult to retrieve later unless they are kept in memory, but since memory is usually much smaller than the volume of the stream, the data can usually be accessed only when it first arrives.
Three characteristics
We believe that what sets current research on data stream computation apart from traditional computing models is that data stream data itself has the following three characteristics:
Data arrival: fast
This means that a large amount of input data may have to be processed in a short time. This places a heavy burden on the processor and on input/output devices, so data stream processing should be as simple as possible.
Data range: wide domain
This means that the value range of a data attribute (dimension) is very large, with many possible values, such as region, mobile phone number, person, or network node. This is the main reason a data stream cannot be stored in memory or on disk. If the dimension is small, the data can be kept in a small amount of memory even when the volume of incoming data is large. For example, in a wireless communication network, if 1 million call records belong to only 1,000 users, then 1,000 storage units hold enough sufficiently accurate data to answer the question "What is the cumulative call time of a given user?" If there are 100,000 users in total, 100,000 storage units are needed to store this information. The attributes of data stream data are mostly related to geographic information, IP addresses, mobile phone numbers, and so on, and are often combined with time. In that case the dimensionality of the data far exceeds the capacity of memory and disk, so the system cannot store the information completely and can usually access the data only once, when it arrives.
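The call-record example can be illustrated with a minimal Python sketch. The record layout (a user id and a duration in seconds) and the function name are assumptions made for illustration only; the point is that memory grows with the number of distinct users, not with the number of records.

```python
from collections import defaultdict

def cumulative_call_time(records):
    """Fold a stream of (user_id, seconds) call records into per-user totals.

    Memory grows with the number of distinct users, not with the
    number of records.
    """
    totals = defaultdict(int)
    for user_id, seconds in records:  # each record is seen exactly once
        totals[user_id] += seconds
    return totals

# Hypothetical stream: 1,000,000 records spread over only 1,000 users.
records = ((n % 1000, 60) for n in range(1_000_000))
totals = cumulative_call_time(records)
print(len(totals))   # number of storage units actually used
print(totals[42])    # cumulative call time of user 42, in seconds
```

With 100,000 distinct users the same loop would need 100,000 entries, which is where the approach stops scaling once the domain becomes very wide.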
Data arrival time: continuous
The continuous arrival of data means that the amount of data may be unbounded. Moreover, the result of processing the data is never final, because data keeps arriving. The result of a query on a data stream is therefore usually not one-time but continuous: the latest result is returned repeatedly as the underlying data arrives.
These characteristics of data streams determine the characteristics of data stream processing: single access, continuous processing, limited storage, approximate results, and fast response.
Approximate results are an inevitable consequence of the first three constraints. Since the data can be accessed only once and only a relatively small, bounded space is available to store it, it is usually impossible to produce exact results. Once the requirement on results is relaxed from "exact" to "approximate", rapid responses to data stream queries become possible.
Classification
Data differs in nature and format, and streams are processed differently accordingly. The Java input/output class library therefore provides different stream classes for input/output streams of different natures. In the java.io package, the basic input/output streams are divided into two types according to the type of data read and written: byte streams and character streams.
Input and output streams
Data streams are divided into input streams (InputStream) and output streams (OutputStream). An input stream can only be read from, not written to, and an output stream can only be written to, not read from. A program usually uses an input stream to read data and an output stream to write data, as if data were flowing into and out of the program. Using data streams makes a program's input and output operations independent of the devices involved.
An input stream can obtain data from the keyboard or a file, and an output stream can send data to a monitor, printer, or file.
Buffered streams
To improve the efficiency of data transfer, a buffered stream is usually used, that is, a stream equipped with a buffer, a memory block dedicated to transferring data. When data is written to a buffered stream, the system does not send it directly to the external device but places it in the buffer. The buffer accumulates the data, and when it is full, the system sends all of its contents to the corresponding device.
When data is read from a buffered stream, the system actually reads from the buffer. When the buffer is empty, the system automatically reads from the relevant device, reading as much data as possible to fill the buffer.
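The buffering behaviour described above is not specific to Java. As a rough analogue (not the java.io API), the following sketch uses Python's standard io.BufferedWriter on top of a hypothetical CountingDevice class that stands in for an external device and simply counts how many physical writes reach it.

```python
import io

class CountingDevice(io.RawIOBase):
    """Hypothetical stand-in for an external device; counts physical writes."""
    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += bytes(b)
        return len(b)

device = CountingDevice()
stream = io.BufferedWriter(device, buffer_size=16)
for _ in range(10):
    stream.write(b"abcd")   # goes into the buffer, not straight to the device
stream.flush()              # push whatever is still buffered
print(device.calls, len(device.data))
```

Ten logical writes reach the device in fewer, larger physical writes, which is the efficiency gain the text describes.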
Model description
We attempt to summarize and describe the data stream model from three aspects: data collection, data attributes, and types of computation. Many articles have proposed a variety of data stream models. We do not cover all of them, but summarize and classify the more important and common ones.
Formalization
The following is a formal description of a data stream.
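Consider a vector $\alpha$ whose domain is $[1..n]$ (its rank is $n$), and whose state at time $t$ is
$\alpha(t) = \langle \alpha_1(t), \ldots, \alpha_i(t), \ldots, \alpha_n(t) \rangle$
At time $s$, $\alpha$ is the zero vector, that is, $\alpha_i(s) = 0$ for all $i$. Updates to the components of the vector arrive as a stream of two-tuples. The $t$-th update is $(i, c_t)$, which means that $\alpha_i(t) = \alpha_i(t-1) + c_t$, and for $i' \ne i$, $\alpha_{i'}(t) = \alpha_{i'}(t-1)$. A query issued at time $t$ is a query on $\alpha(t)$.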
Data collection
We first consider which data is included in the computation when performing data stream calculations. There are three main models: the data stream model, the sliding window model, and the n-of-N model.
Data stream model: In the data stream model, all data from a given starting time onward must be included in the computation. Here $s = 0$, that is, $\alpha$ is the zero vector at time 0. This is the original and most common data stream model.
Sliding window model (computing over the most recent N items): In the sliding window model, the N items immediately preceding the time of computation must be included in the computation. Here $s = t - N$, that is, $\alpha$ is the zero vector at time $t - N$. In other words, the computation covers the most recent N items. Because data in the stream keeps arriving, the model intuitively behaves like a fixed-size window: data passes through the window over time, and the data inside the window is the data set being computed on. M. Datar et al. [91] first proposed this model, and it has since been widely adopted [92].
n-of-N model (computing over the most recent n items, where 0 < n ≤ N)
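To make the sliding window model concrete, here is a minimal sketch that maintains a running sum over the most recent N items. It is an exact version that stores the whole window, so it is only practical when N is small; the class name SlidingWindowSum is an illustrative assumption and does not come from [91].

```python
from collections import deque

class SlidingWindowSum:
    """Running sum over the most recent N items of a stream."""
    def __init__(self, n):
        self.window = deque(maxlen=n)
        self.total = 0

    def update(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]   # oldest item is about to expire
        self.window.append(value)          # deque drops the oldest item itself
        self.total += value
        return self.total

w = SlidingWindowSum(3)
results = [w.update(x) for x in [1, 2, 3, 4, 5]]
print(results)
```

The algorithms cited in the text exist precisely because N is usually too large for the window to be stored like this.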
Data attributes
Characteristics of the data itself:
Time series model: Data arrives in order of its attribute (in effect, time). In this case $i = t$, that is, the update at time $t$ is $(t, c_t)$. This model suits time series data, such as the outgoing traffic of a particular IP address or periodically updated stock data.
Cash register model: Data with the same attribute is accumulated, and all increments are non-negative. In this model, $c_t \ge 0$, which means that for all $i$ and $t$, $\alpha_i(t)$ is never less than zero and never decreases. This model is considered the most commonly used. It applies, for example, to cash registers (from which the model gets its name), the network traffic volume of each IP address, and monitoring the call duration of mobile phone users.
Turnstile model: Data with the same attribute is accumulated, and increments may be positive or negative. In this model, $c_t$ can be greater than 0 or less than 0. This is the most general model. S. Muthukrishnan [89] called it the turnstile model because it works like the turnstiles of a subway station, which can count how many people have entered and left, and therefore how many people are in the subway.
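The difference between the cash register and turnstile models comes down to whether negative updates are allowed. The sketch below stores the vector exactly in a dictionary, which real stream algorithms cannot afford; the class StreamVector and its allow_negative flag are hypothetical names used only to illustrate the two update rules.

```python
from collections import defaultdict

class StreamVector:
    """State vector alpha updated by a stream of (i, c) pairs."""
    def __init__(self, allow_negative=True):
        # True: turnstile model, False: cash register model
        self.allow_negative = allow_negative
        self.alpha = defaultdict(int)

    def update(self, i, c):
        if c < 0 and not self.allow_negative:
            raise ValueError("cash register model requires c >= 0")
        self.alpha[i] += c

# Turnstile usage: +1 when someone enters a station, -1 when someone leaves.
subway = StreamVector(allow_negative=True)
for station, change in [("A", +1), ("A", +1), ("B", +1), ("A", -1)]:
    subway.update(station, change)
print(dict(subway.alpha))
```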
Types of computation
Computations on data stream data fall into two categories: basic computations and complex computations. Basic computations mainly include point queries, range queries, and inner product queries. Complex computations include quantile computation, frequent item computation, and data mining.
Point query: returns the value of $\alpha_i(t)$.
Range query: for a range query $Q(f, t)$, return
$\sum_{i=f}^{t} \alpha_i(t)$
Inner product: for a vector $\beta$, the inner product of $\alpha$ and $\beta$ is
$\alpha \cdot \beta = \sum_{i=1}^{n} \alpha_i(t)\,\beta_i$
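The three basic computations can be written down directly when the vector is small enough to store exactly. The following sketch does that with plain dictionaries; the sample vectors are made up for illustration, and the stream algorithms discussed later exist to approximate these answers without storing $\alpha$ in full.

```python
def point_query(alpha, i):
    return alpha[i]

def range_query(alpha, f, t):
    """Sum of alpha_i for f <= i <= t (1-based, inclusive)."""
    return sum(alpha[i] for i in range(f, t + 1))

def inner_product(alpha, beta):
    return sum(alpha[i] * beta[i] for i in alpha)

# Hypothetical state vector over the domain [1..5], stored exactly.
alpha = {1: 4, 2: 0, 3: 7, 4: 1, 5: 2}
beta = {1: 1, 2: 1, 3: 0, 4: 2, 5: 0}
print(point_query(alpha, 3))
print(range_query(alpha, 2, 4))
print(inner_product(alpha, beta))
```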
Quantile: given a rank $r$, return a value $v$ whose true rank $r'$ in $\alpha$ satisfies:
$r - \varepsilon N \le r' \le r + \varepsilon N$
where $\varepsilon$ is the precision and $N = \sum_{i=1}^{n} \alpha_i(t)$.
G. S. Manku et al. [94] provide a framework for approximate single-pass quantile estimation that treats the data set as the nodes of a tree, with each node carrying a weight (such as the number of data items it contains). In this view, every quantile estimation algorithm can be seen as composed of three operations on nodes: creating new nodes (NEW), merging (COLLAPSE), and output (OUTPUT). Different strategies yield different types of trees. This framework became the basis of many subsequent quantile estimation algorithms.
Frequent items, sometimes called heavy hitters: finding the items that appear frequently in the data stream. In this computation $c_t = 1$, so that $\alpha_i(t)$ stores the arrival frequency, as of time $t$, of data whose dimension value equals $i$. Queries on this data are of two types:
Find the top k most frequent items
Find all items with frequency greater than 1/k
Research on frequent items focuses mainly on the latter computation [95].
Data mining: Mining data stream data involves more complex computation. Research in this area includes multidimensional analysis [96], classification analysis [97, 98], cluster analysis [99–102], and other one-pass algorithms [103].
Related ideas
Introduction
The main difficulty in data stream processing is keeping the space used to store data within a bounded range. Query response time is also important, but it is relatively easy to address. As a hot research topic, data stream processing has been studied extensively, and many algorithms have emerged.
One way to resolve the conflict between the huge volume of data in a stream and limited storage space is sampling. Another is to construct a small data structure, one that fits in memory, that stores a compressed form of the stream and can provide approximate results. Sketches, histograms, and wavelets are the three most important such data structures.
In fact, most of these methods were already in use in traditional databases. The problem is how to apply them in the special setting of data streams.
Random sampling
Random sampling can capture the basic characteristics of a data set by drawing a small number of samples. A very common and simple method is uniform sampling. As an alternative, stratified sampling can reduce errors caused by uneven data distribution. However, for complex analysis, ordinary sampling algorithms still require too much space.
For some specific computations on data streams, interesting sampling algorithms have appeared. Sticky sampling [95] is used to compute frequent items. It keeps in memory a set S of two-tuples $(i, f)$. For each arriving item, if the key $i$ already exists in S, the corresponding $f$ is increased by 1; otherwise, the item is sampled with probability $1/r$, and if it is selected, the tuple $(i, 1)$ is added to S. After a period of time, the tuples in S are scanned once and their values updated, and then $r$ is increased. At the end (or when the user requests the result), all tuples with $f \ge (s - \varepsilon)N$ are output.
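The sticky sampling procedure can be sketched as follows. The text does not say how long each sampling period lasts or how the stored values are adjusted when $r$ grows, so this sketch fills those in from the usual presentation of the algorithm, and they should be read as assumptions: a failure probability delta, a period length derived from t = (1/eps) * ln(1/(s * delta)), doubling of r, and a coin-toss adjustment of each stored count at every rate change.

```python
import math
import random

def sticky_sampling(stream, s, eps, delta, seed=0):
    """Sketch of sticky sampling; returns {item: f} with f >= (s - eps) * N."""
    rng = random.Random(seed)
    t = math.ceil(math.log(1.0 / (s * delta)) / eps)  # assumed period unit
    counts = {}            # the set S of (i, f) pairs
    r = 1                  # current sampling probability is 1/r
    boundary = 2 * t       # stream position at which r next doubles
    n = 0
    for item in stream:
        n += 1
        if n > boundary:
            r *= 2
            boundary += r * t
            # Scan S once: lower each count by one per unsuccessful coin toss.
            for key in list(counts):
                while counts[key] > 0 and rng.random() < 0.5:
                    counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
        if item in counts:
            counts[item] += 1
        elif rng.random() < 1.0 / r:
            counts[item] = 1
    return {k: f for k, f in counts.items() if f >= (s - eps) * n}

# Hypothetical stream: "a" is half of all items, every other item is unique.
stream = ["a" if n % 2 == 0 else n for n in range(10_000)]
frequent = sticky_sampling(stream, s=0.1, eps=0.01, delta=0.01)
print(frequent)
```

The stored count for an item never exceeds its true frequency, so the output can miss borderline items but does not overstate counts.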
The distinct sampling algorithm [104] proposed by P. Gibbons is used for distinct counting, that is, finding the number of different values in the data stream. It uses a hash function to map each distinct arriving value to level $i$ with probability $2^{-(i+1)}$. If $i$ is at least the memory level $L$ (initially 0), the value is added to memory; otherwise it is discarded. When memory is full, the values at level $L$ are deleted from memory and $L$ is increased by 1. The final estimate of the distinct count is the number of distinct values in memory multiplied by $2^L$. Distinct counting is an old problem in database processing. The advantage of this algorithm is that, with suitable parameters, it can be applied to queries with predicates (that is, distinct counting over a subset of the data stream).
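A minimal sketch of the level-based scheme just described is shown below. It assumes that the level of a value is the number of trailing zero bits of a SHA-256 hash, which gives level $i$ with probability $2^{-(i+1)}$; the function names and the capacity of 256 are illustrative choices, not details taken from [104].

```python
import hashlib

def level_of(value):
    """Trailing zero bits of a hash: level i occurs with probability 2^-(i+1)."""
    h = int.from_bytes(hashlib.sha256(repr(value).encode()).digest()[:8], "big")
    level = 0
    while h & 1 == 0 and level < 63:
        h >>= 1
        level += 1
    return level

def distinct_estimate(stream, capacity=256):
    sample = {}      # value -> level, only for values with level >= L
    L = 0            # current memory level
    for value in stream:
        lvl = level_of(value)
        if lvl >= L:
            sample[value] = lvl
            while len(sample) > capacity:
                # Memory is full: drop the values at level L, then raise L.
                sample = {v: l for v, l in sample.items() if l > L}
                L += 1
    return len(sample) * 2 ** L

# Hypothetical stream: 5,000 distinct values, each appearing three times.
stream = [n % 5000 for n in range(15_000)]
print(distinct_estimate(stream))
```

Because a value's level depends only on its hash, repeated occurrences of the same value are always kept or always discarded together, which is what makes the sample a sample of distinct values.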
The disadvantage of sampling algorithms is that they are not sensitive enough to anomalous data. Moreover, even when they work well on the common data stream model, they need to be modified for use in the sliding window model [91] or the n-of-N model [93].
Sketch construction
Sketching refers to using random projections to project the data stream into a small storage space as a summary of the entire stream. The summary data stored in this space is called a sketch, and it can be used to give approximate answers to specific queries. Different sketches can be used to estimate different $L_p$ norms of the data stream, and these norms can be used to answer other types of queries. For example, the $L_0$ norm can be used to estimate distinct counts; the $L_1$ norm can be used to compute quantiles and frequent items; and the $L_2$ norm can be used to estimate self-join sizes.
The concept of sketches was first proposed by N. Alon in [105]. Since then, various sketches and construction algorithms have continued to appear.
The randomized sketching proposed by N. Alon in [105] can be used to estimate different $L_p$ norms and requires at most $O(n^{1-1/p} \lg n)$ space. A more important contribution of the paper is that it can also estimate $L_2$ with a space requirement of $O(\log n + \log t)$. Its main idea is to use a hash function to map each element $i$ in the domain D of the data attribute uniformly at random to $z_i \in \{-1, +1\}$, so that for the random variable $X = \sum_i \alpha_i z_i$, the value $X^2$ can be used as an estimate of the squared $L_2$ norm.
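The idea of estimating the squared $L_2$ norm from $X = \sum_i \alpha_i z_i$ can be sketched in a few lines. Two parts of the sketch below are assumptions rather than details from the text: the signs are derived from a BLAKE2 hash as a stand-in for the limited-independence hash families used in the analysis, and several independent copies of X are combined by a median of means to reduce variance.

```python
import hashlib
from statistics import median

def sign(i, j):
    """Pseudo-random z_i in {-1, +1} for item i under the j-th hash function."""
    digest = hashlib.blake2b(f"{j}:{i}".encode(), digest_size=1).digest()
    return 1 if digest[0] & 1 else -1

class AMSSketch:
    def __init__(self, groups=5, per_group=64):
        self.groups, self.per_group = groups, per_group
        self.x = [0] * (groups * per_group)   # one running X per hash function

    def update(self, i, c=1):
        for j in range(len(self.x)):
            self.x[j] += c * sign(i, j)

    def estimate_f2(self):
        squares = [v * v for v in self.x]
        size = self.per_group
        means = [sum(squares[g * size:(g + 1) * size]) / size
                 for g in range(self.groups)]
        return median(means)

# Hypothetical frequency vector, used here only to compare with the exact value.
counts = {i: (i % 7) + 1 for i in range(100)}
sketch = AMSSketch()
for i, c in counts.items():
    sketch.update(i, c)
exact = sum(c * c for c in counts.values())
print(exact, sketch.estimate_f2())
```

The sketch stores only the running sums, never the vector itself, and it accepts negative updates as well, so it also fits the turnstile model.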
The quantile sketch proposed by S. Guha et al. [88] maintains a set of tuples of the form $(v_i, g_i, \Delta_i)$, where $r_{\max}(v_i)$ and $r_{\min}(v_i)$ are the maximum and minimum possible ranks of $v_i$, respectively. For $i > j$:
$v_i > v_j$
$g_i = r_{\min}(v_i) - r_{\min}(v_{i-1})$
$\Delta_i = r_{\max}(v_i) - r_{\min}(v_i)$
As data arrives, the sketch is updated accordingly to keep the estimate within a given accuracy. X. Lin et al. [93] gave a more formal description of this problem.
If $A_S$ is a random set drawn from $[1..n]$, with each element included with probability 1/2, A. Gilbert et al. [106] construct several such sets and call the sum of the element values in each set a random sum. Multiple random sums make up a sketch. The estimate of $\alpha_i$ is $2E(\|A_S\| \mid \alpha_i \in A_S) - \|A\|$, where $\|A\|$ is the sum of all the values in the data stream. This kind of sketch can therefore be used to estimate the result of a point query, and multiple such sketches can be used to estimate range queries, quantile queries, and so on.
Sketching is essentially a trade-off between space and accuracy. To ensure that the error of a point query result is less than $\varepsilon N$, the space required by the sketches above usually carries a factor of $\varepsilon^{-2}$. By comparison, the Count-Min sketch proposed by G. Cormode et al. [19] needs space with a factor of only $\varepsilon^{-1}$. The idea is also relatively simple: several hash functions project the data stream onto multiple small sketches. To answer a point query, each small sketch is queried separately and the smallest value is chosen as the answer. Building on point queries, the Count-Min sketch can be used for various other queries and complex computations. The Count-Min sketch does not compute an $L_p$ norm but computes the point query result directly, which is one reason its space and time efficiency is higher than that of other sketches.
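A minimal Count-Min sketch along the lines described above might look like this. The width and depth formulas (width about e/eps, depth about ln(1/delta)) follow the usual presentation of the structure and are not stated in the text; the BLAKE2-based hashing is likewise an assumption standing in for pairwise-independent hash functions. The sketch assumes non-negative updates, in which case the minimum over rows never underestimates the true count.

```python
import hashlib
import math

class CountMinSketch:
    def __init__(self, eps=0.01, delta=0.01):
        self.width = math.ceil(math.e / eps)          # columns per row
        self.depth = math.ceil(math.log(1.0 / delta)) # number of hash rows
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _column(self, item, row):
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item, c=1):
        for row in range(self.depth):
            self.table[row][self._column(item, row)] += c

    def query(self, item):
        # Each row overestimates because of collisions; take the smallest.
        return min(self.table[row][self._column(item, row)]
                   for row in range(self.depth))

# Hypothetical stream: "hot" is every fourth item, all others are unique.
cms = CountMinSketch()
for n in range(10_000):
    cms.update("hot" if n % 4 == 0 else n)
print(cms.query("hot"))
```

The table size depends only on eps and delta, not on the number of distinct items in the stream.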
Histogram
The term histogram has two meanings. One is the histogram in the ordinary sense, a visual means of displaying approximate statistics. The other is a data structure or method that captures the approximate distribution of data. In the latter sense, a histogram is constructed by dividing the data into multiple disjoint subsets (called buckets) according to its attributes and approximating the values in each bucket in a uniform way [107].
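As a simple illustration of the data-structure sense of the word, the sketch below builds an equi-width histogram, one of the most basic bucketing schemes and not one of the specific methods cited in this section. Each bucket stores only a count, and values are assumed to be spread uniformly inside a bucket when answering a query; the class and method names are hypothetical.

```python
class EquiWidthHistogram:
    """Bucketed summary of values in [lo, hi); each bucket keeps only a count."""
    def __init__(self, lo, hi, buckets):
        self.lo, self.hi = lo, hi
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets

    def add(self, value):
        index = int((value - self.lo) / self.width)
        index = min(max(index, 0), len(self.counts) - 1)   # clamp outliers
        self.counts[index] += 1

    def estimate_count(self, a, b):
        """Approximate number of values in [a, b), assuming uniform buckets."""
        total = 0.0
        for k, count in enumerate(self.counts):
            left = self.lo + k * self.width
            right = left + self.width
            overlap = max(0.0, min(b, right) - max(a, left))
            total += count * overlap / self.width
        return total

hist = EquiWidthHistogram(0, 100, buckets=10)
for v in range(100):
    hist.add(v)
print(hist.estimate_count(25, 45))
```

Adding a value touches a single counter, which is why such a structure is easy to keep up to date as new data arrives.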
Histogram methods are mainly used in signal processing, statistics, image processing, computer vision, and databases. In the database field, histograms were originally used mainly for selectivity estimation, in query optimization and approximate query processing. The histogram is one of the simplest and most flexible approximate processing methods, and also one of the most effective. As long as the data update problem is solved, the original histograms can be used in data stream processing. A histogram that adjusts itself automatically as new data arrives is called a dynamic (or adaptive, or self-adjusting) histogram.
The histogram proposed by L. Fu et al. [108] is mainly used to compute the median and other quantile functions. It can be used for both approximate computation and exact queries. It uses deterministic bucketing and randomized bucketing techniques to construct multiple buckets with different precisions, and then divides the input data among these buckets step by step, completing the dynamic histogram structure.
It is difficult to apply static histograms directly to data stream processing. S. Guha et al. [88] can dynamically construct near-optimal V-optimal histograms, but these can only be applied to data streams under the time series model.
A commonly used approach divides the algorithm into two steps: first construct a sketch of the data stream data, then construct a suitable histogram from this sketch. This approach takes advantage of how easily a sketch can be updated to make the histogram dynamic. N. Thaper et al. [109] first constructed a sketch that approximately reflects the data stream, relying on the sketch's excellent update performance to absorb new data, and then derived a histogram from this sketch to approximate the stream. Since deriving the best histogram from the sketch is an NP-hard problem, the authors provide a heuristic (greedy) algorithm to search for a good histogram.
A. Gilbert et al. [110] constructed a summary data structure that uses a set of random-sum structures, similar to those in [106], to store the values of dyadic intervals at different granularity levels. Dyadic intervals ([111]) of different granularity are then added to the histogram under construction from largest to smallest, so as to minimize the approximation error (refinement).
A. Gilbert et al. [112] mainly considered how to reduce the cost of processing each input item in the data stream. They first converted the input data into wavelet coefficients (taking each wavelet coefficient as the inner product of the signal and a basis vector), and then applied a dyadic interval method similar to that of [110]. Sketches are closely related to histograms; from a certain perspective, a histogram can be regarded as a special case of a sketch.
Wavelet transformation
Wavelet transformation is often used to generate summary information about data. This is because usually only a small fraction of the wavelet coefficients is significant, while most coefficients are very small or unimportant. By discarding the insignificant coefficients produced by the wavelet transform, the original data can be approximated using very little space.
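The following sketch shows the idea with the Haar wavelet, the simplest wavelet basis; the text does not say which basis the cited work uses, so this choice is an assumption. For brevity the coefficients are left unnormalized and ranked by raw magnitude, which is a simplification of the usual thresholding on normalized coefficients.

```python
def haar_transform(data):
    """Unnormalized Haar decomposition; len(data) must be a power of two."""
    coeffs = []
    current = list(data)
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        averages = [(a + b) / 2 for a, b in pairs]
        details = [(a - b) / 2 for a, b in pairs]
        coeffs = details + coeffs     # coarser levels go in front
        current = averages
    return current + coeffs           # [overall average, details...]

def haar_inverse(coeffs):
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [x for a, d in zip(current, details) for x in (a + d, a - d)]
        pos += len(details)
    return current

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda j: -abs(coeffs[j]))
    keep = set(order[:k])
    return [c if j in keep else 0.0 for j, c in enumerate(coeffs)]

data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_transform(data)
approx = haar_inverse(keep_largest(coeffs, 4))
print(coeffs)
print(approx)
```

Keeping all eight coefficients reproduces the data exactly; keeping only four gives an approximation of the same signal in half the space.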
Y. Matias et al. first constructed a histogram of the data stream data and approximated it with wavelets. A few of the most important wavelet coefficients are then retained to approximate the histogram. When new data arrives, the histogram is updated by updating these wavelet coefficients.
What that work proposes is in fact a histogram method, but one that uses the wavelet transform. A. Gilbert et al. pointed out that the wavelet transform can be viewed as the inner product of a signal with a set of orthogonal vectors of length N. They therefore construct a set of sketches of the data stream data. Because a sketch makes it easy to compute the inner product of the signal and a set of vectors accurately, the wavelet coefficients can be computed from the sketch and then used to estimate point queries and range queries.
New trends
Researchers have continued to deepen their work on data stream processing. We believe the following new trends have emerged:
Future sketches
Introduce more statistical computing techniques for constructing sketches.
G. Cormode et al. deal mainly with computing frequent items. Their approach builds on earlier majority item algorithms ([116, 117]) and uses error-correcting codes. For example, a counter is maintained for each bit of the data, and the set of frequent items is then inferred from the counts in these counters.
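The per-bit counter idea can be illustrated in its simplest form, recovering a single majority item. This is only a sketch of the underlying trick and not the full scheme from the cited work: if one item accounts for more than half of the total count, then each of its bits is the majority value of that bit position. The class name and the 8-bit item width are illustrative assumptions.

```python
class BitCounterMajority:
    """Recover a majority item from one counter per bit plus a total count."""
    def __init__(self, bits=32):
        self.bit_counts = [0] * bits
        self.total = 0

    def update(self, item, c=1):
        """item is a non-negative integer below 2**bits; c may be negative."""
        self.total += c
        for b in range(len(self.bit_counts)):
            if (item >> b) & 1:
                self.bit_counts[b] += c

    def candidate(self):
        """The majority item if one exists; otherwise an arbitrary value."""
        return sum(1 << b for b, n in enumerate(self.bit_counts)
                   if 2 * n > self.total)

m = BitCounterMajority(bits=8)
for item in [5, 3, 5, 5, 7, 5, 5]:
    m.update(item)
print(m.candidate())
```

Because the counters are plain sums, deletions (negative c) are handled the same way as insertions, which makes the trick usable in the turnstile model.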
The work of Y. Tao et al. [118] is essentially an application to data stream processing of probabilistic counting, a distinct counting technique already widely used in the database field.
Extending sketches
Extend sketches to handle more complex queries.
Lin et al. [93] constructed a complex sketch system that can estimate quantiles under the sliding window model and the n-of-N model, which is difficult to achieve with simple sketches.
Under the sliding window model, [93] divides the data into multiple buckets in chronological order, builds a sketch in each bucket (with higher accuracy than required), and then merges these sketches at query time, where the last bucket may need to be lifted. During maintenance, only expired buckets are deleted and new buckets are added.
Under the n-of-N model, [93] divides the data into multiple buckets of different sizes using the EH partitioning technique, builds a sketch in each bucket (with higher accuracy than required), and then merges some of the sketches at query time to guarantee the required accuracy; the last one may need to be lifted.
Combining with spatio-temporal data
Further combination with spatio-temporal data processing:
J. Sun et al. [120] mainly address historical queries and predictive processing of spatio-temporal data. However, the article emphasizes that spatio-temporal data arrives in the form of data streams, and its processing focuses more on the update performance of spatio-temporal data.
Y. Tao et al. [118] use data stream methods to process spatio-temporal data. They construct a sketch of the dynamic spatio-temporal data and use it to distinguish whether objects are moving among multiple regions or stationary, and to estimate their number. This kind of problem is difficult to solve with traditional spatio-temporal processing.
Novel genre
In online fiction, the "data stream" is an emerging genre in which the protagonist's strength is expressed in numbers, with the data displayed like the attribute panel of an online game.