Background
The development of data stream applications is the result of two factors:
Detailed data
It has become possible to generate large volumes of detailed data automatically and continuously. Data of this kind first appeared in traditional banking and stock trading, and later in geological surveys, meteorology, astronomical observation, and other fields. In particular, the Internet (network traffic monitoring, clickstreams) and wireless communication networks (call records) have produced large amounts of stream-type data. Much of this data is related to geographic information, mainly because geographic attributes have a very large number of possible values, which makes it easy to generate such large volumes of detailed data.
Complex analysis
The update stream must be analyzed in complex ways in near real time. Complex analysis (such as trend analysis and forecasting) of data in the fields above has usually been done offline, in a data warehouse. Some newer applications, however, especially in network security and national security, are very time-sensitive. Tasks such as detecting extreme events, fraud, intrusions, and anomalies on the Internet, monitoring complex crowds, tracking trends, exploratory analysis, and harmonic analysis all require online analysis.
The academic community has since largely accepted this definition, and some articles have modified it slightly. For example, S. Guha et al. [88] regard a data stream as "an ordered sequence of points that can be read only once or a small number of times", which relaxes the "one pass" restriction of the earlier definition.
Why is the limit on the number of reads emphasized in data stream processing? S. Muthukrishnan [89] points out that a data stream is "input data that arrives at a very high rate", so transmitting, computing on, and storing the data all become very difficult. In that setting there is only one chance to process each item, when it first arrives; it is hard to access the data at any later time because it cannot all be saved.
Distinguishing features
Differences from the traditional relational data model
B. Babcock et al. [90] argue that the data stream model differs from the traditional relational data model in the following respects:
1. The data arrives online;
2. The processing system cannot control the order in which the data arrives;
3. The data may be unbounded;
4. Because of the huge volume of data, elements of the stream are discarded or archived after they are processed. They are hard to retrieve later unless they are kept in memory, and since memory is usually far smaller than the stream, each item is usually available only when it first arrives.
Three characteristics
We believe that what sets current research on data stream computation apart from traditional computation models is the data itself, which has the following three characteristics:
Data arrives rapidly
A large amount of input data may need to be processed in a short time. This places a heavy load on the processor and on input/output devices, so the processing applied to each item should be as simple as possible.
Data has a wide value range
The value range of a data attribute (its dimension) is very large, with many possible values; examples are region, mobile phone number, person, and network node. This is the main reason the data stream cannot be stored in memory or on disk. If the dimension is small, the data can be kept in a small amount of memory even when the volume of incoming data is large. For example, in a wireless communication network, if 1 million call records belong to only 1,000 users, then 1,000 storage units hold enough data, with enough accuracy, to answer the question "What is the cumulative call time of a given user?" If there are 100,000 users, 100,000 storage units are needed. The attributes of stream data are mostly related to geographic information, IP addresses, mobile phone numbers, and the like, and are often combined with time. The dimensionality of the data then far exceeds the capacity of memory and disk, so the system cannot store the information completely and can usually access each item only once, when it arrives.
Data arrives continuously
Continuous arrival means that the amount of data may be unbounded. Moreover, no result computed so far is final, because data keeps arriving. A query over a data stream is therefore usually not a one-time query but a continuous one: the latest result is returned repeatedly as the underlying data arrives.
These characteristics of the data determine the characteristics of data stream processing: single-pass access, continuous processing, limited storage, approximate results, and fast response.
Approximate results are an inevitable consequence of the first three constraints. Because the data can be accessed only once and only a relatively small, bounded space is available to store it, exact results usually cannot be computed. Once the requirement on results is relaxed from "exact" to "approximate", fast responses to data stream queries become possible.
Classification
Data differs in nature and format, and streams are processed differently as a result. The Java input/output class library therefore provides different stream classes for input/output streams of different kinds. In the java.io package, the basic input/output streams fall into two types according to the kind of data they read and write: byte streams and character streams.
Input streams and output streams
Data streams are divided into input streams (InputStream) and output streams (OutputStream). An input stream can only be read from, not written to; an output stream can only be written to, not read from. A program typically uses an input stream to read data and an output stream to write data, as if the data were flowing into and out of the program. Using streams makes a program's input and output operations independent of the devices involved.
An input stream can obtain data from the keyboard or a file, and an output stream can send data to a display, a printer, or a file.
Buffered streams
To make data transfer more efficient, a buffered stream is often used: the stream is given a buffer, a block of memory dedicated to transferring data. When data is written to a buffered stream, the system does not send it directly to the external device but places it in the buffer. The buffer accumulates the data, and when it is full the system sends all of it to the corresponding device.
When data is read from a buffered stream, the system actually reads it from the buffer. When the buffer is empty, the system automatically reads from the underlying device, fetching as much data as it can to fill the buffer.
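The write-side buffering behaviour can be illustrated with a minimal sketch. It is written in Python for brevity and is not the java.io API; the class names TinyBufferedWriter and RecordingSink are hypothetical and exist only for this illustration.

```python
class TinyBufferedWriter:
    """Collects bytes in memory and forwards them to `sink` only when the buffer is full."""

    def __init__(self, sink, capacity=8):
        self.sink = sink            # any object with a write(bytes) method
        self.capacity = capacity    # buffer size in bytes (assumed value)
        self.buffer = bytearray()

    def write(self, data):
        for byte in data:
            self.buffer.append(byte)
            if len(self.buffer) >= self.capacity:
                self.flush()        # buffer is full: send everything at once

    def flush(self):
        if self.buffer:
            self.sink.write(bytes(self.buffer))
            self.buffer.clear()


class RecordingSink:
    """Stand-in for an external device; remembers each physical write."""

    def __init__(self):
        self.writes = []

    def write(self, chunk):
        self.writes.append(chunk)


sink = RecordingSink()
writer = TinyBufferedWriter(sink, capacity=8)
writer.write(b"hello")                  # 5 bytes: held in the buffer
writes_before_full = list(sink.writes)  # nothing has reached the device yet
writer.write(b" world")                 # the buffer fills once, causing one device write
writer.flush()                          # push whatever is left
```

Eleven bytes reach the device in two physical writes rather than eleven, which is the efficiency gain the text describes.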
Model description
We summarize and describe data stream models from three aspects: data collection, data attributes, and computation types. Many articles have proposed a variety of data stream models; we do not cover all of them, but summarize and classify the more important and common ones.
Formal description
The following is a formal description of a data stream.
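Consider a vector $\alpha$ whose attribute domain is $[1..n]$ (so its dimension is $n$). The state of $\alpha$ at time $t$ is
$\alpha(t) = \langle \alpha_1(t), \ldots, \alpha_i(t), \ldots, \alpha_n(t) \rangle$
At some time $s$, $\alpha$ is the zero vector, that is, $\alpha_i(s) = 0$ for all $i$. Updates to the components of the vector arrive as a stream of two-tuples. The $t$-th update is $(i, c_t)$, which means $\alpha_i(t) = \alpha_i(t-1) + c_t$, and for $i' \neq i$, $\alpha_{i'}(t) = \alpha_{i'}(t-1)$.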
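The update rule can be made concrete with a small sketch that replays a stream of two-tuples against a vector that starts at zero. The function name apply_updates and the sample stream are assumptions made for illustration; the vector is stored explicitly here, which a real stream processor usually cannot afford.

```python
from collections import defaultdict


def apply_updates(updates):
    """Replay a stream of (i, c) updates, starting from the zero vector."""
    alpha = defaultdict(int)    # components that are never updated stay at 0
    for i, c in updates:
        alpha[i] += c           # only component i changes; all others keep their value
    return alpha


stream = [(3, 5), (1, 2), (3, -1), (7, 4)]   # hypothetical updates (i, c)
alpha = apply_updates(stream)
```

After the four updates, component 3 holds 5 - 1 = 4, components 1 and 7 hold 2 and 4, and every other component is still 0.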
Data collection
We first consider which data falls within the scope of a data stream computation. There are three main models: the data stream model, the sliding window model, and the n-of-N model.
Data stream model: all data from a fixed starting time onward must be included in the computation. Here $s = 0$, that is, $\alpha$ is the zero vector at time 0. This is the original and most common data stream model.
Sliding window model (computing over the most recent N data items): counting back from the time of computation, the N most recent items must be included. Here $s = t - N$, that is, $\alpha$ is the zero vector at time $t - N$. Because the stream keeps arriving, this model can be pictured as a window of fixed size: data passes through the window as time goes on, and the data inside the window is the set being computed on. M. Datar et al. [91] first proposed this model, and it has since been widely adopted [92].
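As a minimal sketch of the sliding window idea, the class below keeps an exact running sum over the most recent N items. It stores the whole window, so it only illustrates the semantics of the model; it is not one of the space-efficient approximations from the literature. The name SlidingWindowSum is hypothetical.

```python
from collections import deque


class SlidingWindowSum:
    """Exact sum over the most recent `window_size` items of a stream."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.total = 0

    def add(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.window_size:
            # The oldest item slides out of the window.
            self.total -= self.window.popleft()
        return self.total


sw = SlidingWindowSum(window_size=3)
results = [sw.add(v) for v in [1, 2, 3, 4, 5]]
```

Each call returns the current answer of the continuous query, so results holds the window sum after every arrival.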
n-of-N model (computing over the most recent n data items, where $0 < n \le N$)
Data attributes
Models can also be distinguished by the characteristics of the data itself:
Time series model: the data arrives in the order of its attribute (which is in fact time). In this case $i = t$, that is, the update at time $t$ is $(t, c_t)$, so $\alpha_t(t) = c_t$ and all other components are unchanged. This model suits time-ordered data such as periodically updated statistics.
Cash register model: data with the same attribute is accumulated, and every update is non-negative. In this model $c_t \ge 0$, which means that for all $i$ and $t$, $\alpha_i(t)$ is never less than zero and never decreases. This model is considered the most commonly used. It describes, for example, the receipts of a cash register (from which the model gets its name), the network traffic volume of each IP address, and the call duration of mobile phone users.
Turnstile model: data with the same attribute is accumulated, and updates may be positive or negative. In this model $c_t$ can be greater than 0 or less than 0. This is the most general model. S. Muthukrishnan [89] called it the turnstile model because it works like the turnstile at a subway station, which can count how many people have entered and left and therefore how many people are currently in the subway.
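The difference between the cash register model and the turnstile model is only the sign constraint on each update. The sketch below enforces that constraint; the class name StreamVector and the gate labels are assumptions made for illustration.

```python
from collections import defaultdict


class StreamVector:
    """Accumulates updates per attribute under a chosen update model."""

    def __init__(self, model="turnstile"):
        if model not in ("cash_register", "turnstile"):
            raise ValueError("unknown model: " + model)
        self.model = model
        self.alpha = defaultdict(int)

    def update(self, i, c):
        # The cash register model only admits non-negative updates.
        if self.model == "cash_register" and c < 0:
            raise ValueError("cash register model requires a non-negative update")
        self.alpha[i] += c


# Turnstile: +1 when someone enters through a gate, -1 when someone leaves.
station = StreamVector("turnstile")
for gate, change in [("A", +1), ("A", +1), ("B", +1), ("A", -1)]:
    station.update(gate, change)
people_inside = sum(station.alpha.values())
```

Three people entered and one left, so people_inside is 2; the same sequence would be rejected under the cash register model at the first negative update.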
Computation types
Computations on data streams fall into two categories: basic and complex. Basic computations mainly include point queries, range queries, and inner product queries. Complex computations include quantile computation, frequent item computation, and data mining.
Point query: returns the value of $\alpha_i(t)$.
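Range query: for a range query $Q(f, t)$, return
$\sum_{i=f}^{t} \alpha_i(t)$
Inner product: for a vector $\beta$, the inner product of $\alpha$ and $\beta$ is
$\alpha \cdot \beta = \sum_{i=1}^{n} \alpha_i(t)\,\beta_i$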
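The three basic queries can be illustrated on an explicitly stored vector. This is only a reference for what the queries mean: stream algorithms answer them approximately from a much smaller summary. The function names and sample vectors are assumptions, and indices are 1-based to match the attribute domain [1..n].

```python
def point_query(alpha, i):
    """Value of component i (1-based)."""
    return alpha[i - 1]


def range_query(alpha, f, t):
    """Sum of components f through t inclusive (1-based)."""
    return sum(alpha[f - 1:t])


def inner_product(alpha, beta):
    """Sum over i of alpha_i * beta_i for two vectors of equal length."""
    if len(alpha) != len(beta):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(alpha, beta))


alpha = [4, 0, 2, 7, 1]   # hypothetical current state of the vector
beta = [1, 1, 0, 2, 0]    # hypothetical second vector

point_result = point_query(alpha, 4)
range_result = range_query(alpha, 2, 4)
inner_result = inner_product(alpha, beta)
```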
Quantile: given a rank $r$, return a value $v$ whose true rank $r'$ in $\alpha$ satisfies
$r - \varepsilon N \le r' \le r + \varepsilon N$
where $\varepsilon$ is the precision and $N = \sum_{i=1}^{n} \alpha_i(t)$.
G. S. Manku et al. [94] provide a framework for approximate quantile estimation in a single pass. It treats the data set as the nodes of a tree, where nodes carry different weights (such as the number of data items a node contains). In this view, every quantile estimation algorithm is composed of three operations on nodes: creating new nodes (NEW), merging (COLLAPSE), and output (OUTPUT). Different strategies produce different kinds of trees. This framework became the basis of many later quantile estimation algorithms.
Frequent items, sometimes called heavy hitters: the task is to find items that appear frequently in the data stream. For this computation $c_t = 1$, so $\alpha_i(t)$ stores the number of arrivals, up to time $t$, of data whose attribute value equals $i$. Queries on this data are of two types:
Find the k most frequently occurring items
Find all items whose frequency is greater than 1/k
Research on frequent items has focused mainly on the latter computation [95].
Data mining on data streams involves more complex computations. Research in this area includes multidimensional analysis [96], classification [97, 98], clustering [99-102], and other one-pass algorithms [103].
Related ideas
Introduction
The main difficulty in data stream processing is keeping the space used to store data within a given bound. Query response time also matters, but it is comparatively easy to address. As an active research area, data stream processing has been studied extensively, and many algorithms have emerged.
One way to resolve the conflict between the huge volume of a data stream and limited storage is sampling. Another is to build a small data structure, one that fits in memory and can provide approximate results, to hold a compressed form of the stream. Sketches, histograms, and wavelets are the three most important structures of this kind.
Most of these methods were in fact already used in traditional databases. The question is how to apply them in the special setting of data streams.
Random sampling
Random sampling captures the basic characteristics of a data set by drawing a small number of samples. A very common and simple method is uniform sampling. As an alternative, stratified sampling can reduce the error caused by unevenly distributed data. For complex analysis, however, ordinary sampling algorithms still require too much space.
For some specific computations on data streams, interesting sampling algorithms have appeared. Sticky sampling [95] is used to compute frequent items. It keeps in memory a set S of two-tuples (i, f). For each arriving item, if its key i already exists in S, the corresponding f is increased by 1; otherwise the item is sampled with probability 1/r, and if it is selected, the tuple (i, 1) is added to S. After a period of time, the tuples in S are scanned once and their values updated, and then r is increased. At the end (or when the user requests the result), all tuples with $f \ge (s - \varepsilon)N$ are output, where $s$ is the support threshold, $\varepsilon$ the permitted error, and $N$ the number of items seen so far.
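The procedure can be sketched as follows. The text does not say when r is increased or how the stored values are adjusted; this sketch assumes the usual schedule in which r doubles after windows of growing length and each stored count is reduced by a run of fair coin flips. The failure probability delta, the function name, and the sample stream are also assumptions.

```python
import math
import random


def sticky_sampling(stream, s, eps, delta, rng=None):
    """Return {item: count} for the items whose count reaches (s - eps) * N."""
    rng = rng or random.Random(0)
    t = math.ceil((1.0 / eps) * math.log(1.0 / (s * delta)))
    counts = {}             # the set S of (item, f) pairs
    r = 1                   # new items are sampled with probability 1/r
    window_end = 2 * t      # the first 2t items are all processed with r = 1
    n = 0
    for item in stream:
        if n == window_end:
            r *= 2
            window_end += r * t
            # Rescan S: lower each count by one per tail until the first head.
            for key in list(counts):
                while counts[key] > 0 and rng.random() < 0.5:
                    counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
        n += 1
        if item in counts:
            counts[item] += 1
        elif rng.random() < 1.0 / r:
            counts[item] = 1
    threshold = (s - eps) * n
    return {key: f for key, f in counts.items() if f >= threshold}


stream = ["a"] * 300 + ["b"] * 150 + ["c"] * 40 + ["d"] * 10
frequent = sticky_sampling(stream, s=0.1, eps=0.01, delta=0.1)
```

With these parameters the 500-item stream is shorter than the first window, so every item is sampled and the counts are exact; on longer streams the counts become underestimates and the result is probabilistic.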
The distinct sampling algorithm [104] proposed by P. Gibbons is used for distinct counting, that is, finding the number of different values in a data stream. It uses a hash function to map each distinct arriving value to level i with probability $2^{-(i+1)}$. If i is at least the current memory level L (L is initially 0), the value is added to memory; otherwise it is discarded. When memory is full, the values at level L are deleted from memory and L is increased by 1. The final estimate of the distinct count is the number of different values in memory multiplied by $2^L$. Distinct counting is an old problem in database processing. The advantage of this algorithm is that, with appropriate parameters, it can be applied to queries with predicates (that is, distinct counting over a subset of the data stream).
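A minimal sketch of this level-based scheme follows. It assumes that the level of a value is the number of trailing zero bits in a SHA-256 based hash, which gives level i with probability 2^-(i+1); the hash choice, the class name DistinctSampler, and the capacity values are illustrative assumptions rather than details from [104].

```python
import hashlib


def level_of(value):
    """Level i with probability 2^-(i+1): count trailing zero bits of a 64-bit hash."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")
    level = 0
    while level < 63 and (h >> level) & 1 == 0:
        level += 1
    return level


class DistinctSampler:
    def __init__(self, capacity):
        self.capacity = capacity    # maximum number of values kept in memory
        self.level = 0              # current memory level L
        self.sample = {}            # value -> its hash level

    def add(self, value):
        lvl = level_of(value)
        if lvl >= self.level:
            self.sample[value] = lvl
        while len(self.sample) > self.capacity:
            # Memory is full: drop the values at level L, then raise L.
            self.sample = {v: l for v, l in self.sample.items() if l > self.level}
            self.level += 1

    def estimate(self):
        return len(self.sample) * 2 ** self.level


sampler = DistinctSampler(capacity=100)
for k in range(1000):
    sampler.add(k % 50)     # only 50 distinct values, repeated many times
distinct_estimate = sampler.estimate()
```

Because the 50 distinct values fit in memory, the level never rises and the estimate is exact; with more distinct values than capacity, the estimate becomes approximate.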
The drawback of sampling algorithms is that they are not sensitive enough to anomalous data. Moreover, even when they work well under the ordinary data stream model, they must be modified before they can be used under the sliding window model [91] or the n-of-N model [93].
Sketches
Sketching uses random projections to project the data stream into a small storage space that serves as a summary of the whole stream. The summary stored in that space is called a sketch, and it can be used to answer specific queries approximately. Different sketches estimate different $L_p$ norms of the data stream, and these norms can in turn answer other kinds of queries. For example, the $L_0$ norm can be used to estimate the distinct count of a stream, the $L_1$ norm to compute quantiles and frequent items, and the $L_2$ norm to estimate self-join size.
The concept of the sketch was first proposed by N. Alon in [105]. Various sketches and algorithms for constructing them have appeared since.
The randomized sketching proposed by N. Alon in [105] can be used to estimate different $L_p$ norms and requires at most $O(n^{1-1/p} \lg n)$ space. A more important contribution of that paper is that it can also estimate $L_2$ with a space requirement of only $O(\log n + \log t)$. Its main idea is to use a hash function to map each element $i$ of the attribute domain $D$ uniformly at random to $z_i \in \{-1, +1\}$, so that with the random variable $X = \sum_i \alpha_i z_i$, the value $X^2$ can be used as an estimate of the squared $L_2$ norm.
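The random-sign idea can be sketched as below. Several independent counters X are kept, and the squared L2 norm is estimated by the median of group means of X squared, a standard way to reduce variance. SHA-256 is used as a stand-in for the limited-independence hash family that the analysis assumes; the class name TugOfWarSketch and the group sizes are illustrative.

```python
import hashlib
import statistics


def sign(seed, i):
    """Pseudo-random +1/-1 for attribute i under the hash function numbered `seed`."""
    digest = hashlib.sha256(f"{seed}:{i}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


class TugOfWarSketch:
    def __init__(self, groups=5, per_group=8):
        self.groups = groups
        self.per_group = per_group
        self.counters = [0] * (groups * per_group)   # one X per hash function

    def update(self, i, c):
        # Each counter adds c times the sign assigned to attribute i.
        for seed in range(len(self.counters)):
            self.counters[seed] += c * sign(seed, i)

    def estimate_f2(self):
        squares = [x * x for x in self.counters]
        means = [
            sum(squares[g * self.per_group:(g + 1) * self.per_group]) / self.per_group
            for g in range(self.groups)
        ]
        return statistics.median(means)


sketch = TugOfWarSketch()
sketch.update("x", 3)
sketch.update("x", 2)
sketch.update("y", 4)
sketch.update("y", -4)   # cancels the previous update (turnstile model)
f2_estimate = sketch.estimate_f2()
```

Only attribute "x" has a non-zero value (5), so every counter is +5 or -5 and the estimate is exactly 25; with many non-zero attributes the estimate is approximate.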
The quantile sketch proposed by S. Guha et al. [88] maintains a set of tuples of the form $(v_i, g_i, \Delta_i)$, where $r_{\max}(v_i)$ and $r_{\min}(v_i)$ are the maximum and minimum possible ranks of $v_i$, respectively. For $i > j$:
$v_i > v_j$
$g_i = r_{\min}(v_i) - r_{\min}(v_{i-1})$
$\Delta_i = r_{\max}(v_i) - r_{\min}(v_i)$
As data arrives, the sketch is updated accordingly so that the estimate stays within a given accuracy. X. Lin et al. [93] gave a more formal description of this problem.
Let $A_S$ be a random set drawn from $[1..n]$ in which each element is included with probability 1/2. A. Gilbert et al. [106] construct several such sets and call the sum of the element values in each set a random sum. Multiple random sums make up a sketch. The estimate of $\alpha_i$ is $2E(\|A_S\| \mid \alpha_i \in A_S) - \|A\|$, where $\|A_S\|$ is the random sum over $A_S$ and $\|A\|$ is the sum of all the values in the data stream. A sketch of this kind can therefore be used to estimate the result of a point query, and several such sketches can be combined to estimate range queries, quantile queries, and so on.
Sketching is in effect a trade-off between space and accuracy. To guarantee that the error of a point query is less than $\varepsilon N$, the space required by the sketch above usually has $\varepsilon^{-2}$ as a factor. By comparison, the Count-Min sketch proposed by G. Cormode et al. [19] needs space with only an $\varepsilon^{-1}$ factor. Its idea is also relatively simple: several hash functions each project the data stream onto a small sketch of its own. To answer a point query, each small sketch is queried separately and the smallest value is taken as the answer. Building on point queries, the Count-Min sketch can be used for various other queries and complex computations. The Count-Min sketch does not compute an $L_p$ norm but directly computes the result of a point query, which is one reason its space and time efficiency is higher than that of other sketches.
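A minimal Count-Min sketch along these lines is shown below for the cash register model (non-negative updates). Each row is one small array with its own hash function, and a point query returns the minimum over the rows. The SHA-256 based hashing, the class name, and the width and depth values are assumptions made for illustration.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    def __init__(self, width=64, depth=4):
        self.width = width      # counters per row
        self.depth = depth      # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(row, key)] += count

    def point_query(self, key):
        # Collisions can only add to a counter, so the smallest one is the best estimate.
        return min(self.table[row][self._bucket(row, key)] for row in range(self.depth))


cms = CountMinSketch(width=16, depth=3)
true_counts = Counter()
for k in range(200):
    key = k % 10
    cms.update(key)
    true_counts[key] += 1
estimates = {key: cms.point_query(key) for key in true_counts}
```

With non-negative updates the estimate never falls below the true count; it may exceed it when keys collide in every row, and a wider table makes that less likely.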
Histogram
The term histogram has two meanings. One is the histogram in the ordinary sense, a visual means of displaying approximate statistics. The other is a data structure or method that captures the approximate distribution of data. In the latter sense, a histogram is constructed as follows: the data is divided by attribute into multiple disjoint subsets (called buckets), and the values in each bucket are approximated in a uniform way [107].
Histogram methods are mainly used in signal processing, statistics, image processing, computer vision, and databases. In the database field, histograms were originally used mainly for selectivity estimation, in the service of selection query optimization and approximate query processing. The histogram is one of the simplest and most flexible approximation methods, and also one of the most effective. Once the problem of data updates is solved, existing histograms can be used in data stream processing. A histogram that adjusts itself automatically as new data arrives is called a dynamic (or adaptive, or self-adjusting) histogram.
The histogram proposed by L. Fu et al. [108] is mainly used to compute the median and other quantile functions. It supports both approximate computation and exact queries. It uses deterministic bucketing and randomized bucketing to construct multiple buckets with different precisions and then distributes the input data into these buckets step by step, which yields the dynamic histogram structure.
It is difficult to apply static histograms directly to data stream processing. The method of S. Guha et al. [88] can construct near-optimal V-optimal histograms dynamically, but it applies only to data streams under the time series model.
A common approach divides the algorithm into two steps: first construct a sketch of the stream data, then construct a suitable histogram from the sketch. This takes advantage of the fact that a sketch is easy to update and thereby makes the histogram dynamic. N. Thaper et al. [109] first construct a sketch that approximately reflects the stream data, relying on the sketch's good update performance, and then derive a histogram from the sketch to approximate the data. Because deriving the best histogram from the sketch is an NP-hard problem, the authors provide a heuristic (greedy) algorithm to search for a good histogram.
A. Gilbert et al. [110] construct a summary data structure that uses a set of random-sum structures similar to those in [106] to store the values of dyadic intervals at different levels of granularity. Dyadic intervals ([111]) of different granularity are then added to the histogram under construction, from large to small, so as to minimize the approximation error (refinement).
A. Gilbert et al. [112] mainly consider how to reduce the cost of processing each input item in the data stream. They first convert the input data into wavelet coefficients (treating each wavelet coefficient as the inner product of the signal with a basis vector) and then apply a dyadic interval method similar to that of [110].
Sketches are closely related to histograms. From a certain perspective, a histogram can be regarded as a special case of a sketch.
Wavelet transformation
The wavelet transform is often used to generate summary information about data. Usually only a small fraction of the wavelet coefficients are significant, while most are very small or unimportant. If the insignificant coefficients produced by the transform are ignored, the original data can be approximated using very little space.
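The effect of discarding small coefficients can be seen with the Haar wavelet, the simplest wavelet basis. The sketch below uses an unnormalized Haar transform (pairwise averages and half-differences) on an input whose length is a power of two; the function names and the sample signal are assumptions made for illustration.

```python
def haar_transform(values):
    """Unnormalized Haar transform; len(values) must be a power of two."""
    coeffs = []
    current = list(values)
    while len(current) > 1:
        averages = [(current[k] + current[k + 1]) / 2 for k in range(0, len(current), 2)]
        details = [(current[k] - current[k + 1]) / 2 for k in range(0, len(current), 2)]
        coeffs = details + coeffs    # coarser details go in front of finer ones
        current = averages
    return current + coeffs          # [overall average, coarsest detail, ..., finest details]


def inverse_haar(coeffs):
    """Rebuild the signal from the coefficient layout produced by haar_transform."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        expanded = []
        for avg, det in zip(current, details):
            expanded.extend([avg + det, avg - det])
        pos += len(current)
        current = expanded
    return current


def keep_largest(coeffs, k):
    """Zero out all but the k coefficients of largest absolute value."""
    keep = sorted(range(len(coeffs)), key=lambda j: abs(coeffs[j]), reverse=True)[:k]
    return [c if j in keep else 0.0 for j, c in enumerate(coeffs)]


signal = [2, 2, 2, 2, 6, 6, 6, 6]
coeffs = haar_transform(signal)
approx = inverse_haar(keep_largest(coeffs, 2))
```

For this piecewise-constant signal only two of the eight coefficients are non-zero, so keeping two coefficients reconstructs it exactly; for less regular data the reconstruction is an approximation whose quality depends on how many coefficients are kept.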
Y. Matias et al. first construct a histogram of the stream data and then approximate it with wavelets. A number of the most important wavelet coefficients are retained to approximate the histogram. When new data arrives, the histogram is updated by updating these wavelet coefficients.
What that work proposes is in fact still a histogram method, although it uses the wavelet transform. A. Gilbert et al. point out that the wavelet transform can be viewed as the inner product of a signal with a set of orthogonal vectors of length N. They therefore construct a set of sketches of the stream data, because sketches make it possible to compute the inner product of the signal with a set of vectors easily and fairly accurately. The wavelet coefficients can then be computed from the sketches and used to estimate point queries and range queries.
New trends
Researchers continue to deepen the study of data stream processing. We believe the following new trends have emerged:
Future sketches
Introducing more statistical and computational techniques to construct sketches:
G. Cormode et al. mainly address the computation of frequent items. Their method builds on the earlier majority item algorithm ([116, 117]) and uses error-correcting codes. For example, a counter is set up for each bit of the data, and the set of frequent items is then inferred from the counts in these counters.
The work of Y. Tao et al. [118] is essentially an application of probabilistic counting (a distinct-counting technique widely used in the database field) to data stream processing.
Extending sketches
Extending sketches to handle more complex queries:
X. Lin et al. [93] construct a complex sketch system that can estimate quantiles under the sliding window model and the n-of-N model, which is difficult to achieve with simple sketches.
Under the sliding window model, [93] divides the data into multiple buckets in chronological order, builds a sketch in each bucket (with higher accuracy than required), and then merges these sketches at query time, where the sketch of the last bucket may need to be lifted. During maintenance, only expired buckets are deleted and new buckets are added.
Under the n-of-N model, [93] divides the data into multiple buckets of different sizes using the EH partitioning technique and builds a sketch in each bucket (with higher accuracy than required). At query time, some of the sketches are merged so as to guarantee the required accuracy, where the last one may again need to be lifted.
Combining with spatio-temporal data
Further combination with spatio-temporal data processing:
J. Sun et al. [120] mainly address historical queries and predictive processing of spatio-temporal data. The article stresses, however, that spatio-temporal data arrives in the form of data streams, and its processing accordingly pays more attention to update performance.
Y. Tao et al. [118] use data stream methods to process spatio-temporal data. They build a sketch of dynamic spatio-temporal data and use it to distinguish whether objects are moving among multiple regions or stationary, and to estimate their number. Problems of this kind are difficult to solve with traditional spatio-temporal processing.
Novels
In online novels, "data stream" names an emerging genre in which the protagonist's strength is quantified and displayed as numbers, much like the attribute panel in an online game.