Background
The development of data stream applications is the result of the following two factors:
Detailed data
It has become possible to generate large amounts of detailed data continuously and automatically. Data of this kind first appeared in traditional banking and stock trading, and later in geological surveys, meteorology, astronomical observation, and other fields. In particular, the emergence of the Internet (network traffic monitoring, clickstreams) and wireless communication networks (call records) has produced large volumes of stream data. Much of this data is related to geographic information, mainly because geographic information has high dimensionality, which makes it easy to generate such large amounts of detailed data.
Complex analysis
Complex analysis must be performed on the update stream in near real time. Complex analysis (such as trend analysis and forecasting) of data in the fields above is often done offline, in a data warehouse. However, some newer applications, especially in network security and national security, are very time-sensitive. Tasks such as detecting extreme events, fraud, intrusions, and anomalies on the Internet, complex crowd monitoring, trend tracking, exploratory analysis, and harmonic analysis all require online analysis.
After this, the academic community largely accepted this definition, and some articles modified it slightly. For example, S. Guha et al. [88] regard a data stream as an "ordered sequence of points that can be read only once or a small number of times", which relaxes the "one pass" restriction of the earlier definition.
Why does data stream processing emphasize a limit on the number of times the data is read? S. Muthukrishnan [89] pointed out that a data stream is "input data arriving at a very high speed", so transmitting, computing on, and storing the data becomes very difficult. In this situation there is only one chance to process each item, when it first arrives, and the data is hard to access at any other time (because it cannot be saved, it is no longer available).
Distinguishing characteristics
Differences from the traditional relational data model
B. Babcock et al. [90] hold that the data stream model differs from the traditional relational data model in the following respects:
1. Data arrives online;
2. The processing system cannot control the order in which the data arrives;
3. The data is potentially unbounded in size;
4. Because the volume of data is so large, elements of the data stream are discarded or archived after being processed. They are difficult to retrieve later unless they are kept in memory, but since memory is usually much smaller than the data stream, the data can usually be accessed only when it first arrives.
Three characteristics
We believe that current research on data stream computation differs from the traditional computation model mainly because stream data itself has the following three characteristics:
Fast data arrival
This means that a large amount of input data may need to be processed in a short time. This places a heavy load on the processor and on input and output devices, so the processing of a data stream should be as simple as possible.
Wide data range
This means that the value range of a data attribute (dimension) is very large and there are many possible values, for example region, mobile phone number, person, or network node. This is the main reason a data stream cannot be stored in memory or on hard disk. If the dimension is small, the data can be kept in a small amount of memory even when the volume of incoming data is large. For example, in a wireless communication network, if 1 million call records belong to only 1,000 users, then 1,000 storage units are enough to answer accurately a question such as "What is the cumulative call time of a given user?" If there are 100,000 users in total, 100,000 storage units are needed to hold this information. The attributes of stream data are mostly related to geographic information, IP addresses, mobile phone numbers, and the like, and are often associated with time. In such cases the dimensionality of the data far exceeds the capacity of memory and hard disk, which means the system cannot store the information completely and can usually access the data only once, when it arrives.
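The call-record example can be sketched in a few lines of Python. This is only an illustration, assuming records arrive as (user, seconds) pairs; the function name cumulative_call_time and the sample data are hypothetical and not taken from any cited work.

```python
from collections import defaultdict

def cumulative_call_time(records):
    """Fold a stream of (user, seconds) records into one counter per user."""
    totals = defaultdict(int)
    for user, seconds in records:   # each record is seen exactly once
        totals[user] += seconds     # memory grows with distinct users, not records
    return totals

# Hypothetical stream: six records, but only two distinct users.
stream = [("alice", 30), ("bob", 10), ("alice", 45),
          ("bob", 5), ("alice", 15), ("bob", 20)]
totals = cumulative_call_time(stream)
```

The number of counters equals the number of distinct users, which is why a wide attribute range, rather than the raw number of records, is what exhausts storage.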
Continuous data arrival
The continuous arrival of data means that the amount of data may be unbounded. Moreover, the result of processing the data so far is never the final result, because more data will keep arriving. The result of a query on a data stream is therefore usually not one-time but continuous: the latest result is returned again and again as the underlying data arrives.
These characteristics of data streams determine the characteristics of data stream processing: one-time access, continuous processing, limited storage, approximate results, and fast response.
Approximate results are an inevitable consequence of the first three constraints. Because the data can be accessed only once and only a relatively small, limited space is available to store it, exact results usually cannot be computed. Once the requirement on results is relaxed from "exact" to "approximate", fast responses to data stream queries become possible.
Classification
The nature and format of data vary, and so do the ways streams are processed. The Java input/output class library therefore provides different stream classes for different kinds of input/output streams. In the java.io package, the basic input/output streams are divided into two types according to the type of data read and written: byte streams and character streams.
Input stream and output stream
Data streams are divided into input streams (InputStream) and output streams (OutputStream). An input stream can only be read, not written, and an output stream can only be written, not read. A program usually uses an input stream to read data and an output stream to write data, as if data were flowing into and out of the program. Using streams makes a program's input and output operations independent of the devices involved.
An input stream can take data from the keyboard or a file, and an output stream can send data to a monitor, printer, or file.
Buffered stream
To improve the efficiency of data transfer, a buffered stream is usually used, that is, a stream equipped with a buffer, a block of memory dedicated to transferring data. When data is written to a buffered stream, the system does not send it directly to the external device but to the buffer. The buffer accumulates the data, and when it is full, the system sends all of it to the corresponding device.
When data is read from a buffered stream, the system actually reads it from the buffer. When the buffer is empty, the system automatically reads data from the relevant device, reading as much as possible to fill the buffer.
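The write-side behaviour can be observed with a small sketch. The text describes Java's java.io classes; the example below instead uses Python's io.BufferedWriter, which follows a similar design, and a hypothetical RecordingSink class that stands in for the external device.

```python
import io

class RecordingSink(io.RawIOBase):
    """Hypothetical stand-in for an external device; records each raw write."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)

sink = RecordingSink()
out = io.BufferedWriter(sink, buffer_size=16)

out.write(b"abc")                 # small write: held in the buffer
held_back = (sink.chunks == [])   # nothing has reached the device yet
out.write(b"def")
out.flush()                       # force the buffered bytes out to the device
delivered = b"".join(sink.chunks)
```

Here held_back records that the first small write never reached the sink, and delivered collects what arrived once the buffer was flushed.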
Model description
We try to summarize and describe the data stream model from three different aspects: data collection, data attributes, and calculation types. Many articles have in fact proposed a variety of data stream models. We do not cover all of them, but summarize and classify the more important and common ones.
Formalization
The following is a formal description of a data stream.
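Consider a vector $\alpha$ whose attribute domain is $[1..n]$ (its rank is $n$). The state of $\alpha$ at time $t$ is
$$\alpha(t) = \langle \alpha_1(t), \ldots, \alpha_i(t), \ldots, \alpha_n(t) \rangle$$
At time $s$, $\alpha$ is the zero vector, that is, $\alpha_i(s) = 0$ for all $i$. Updates to the components of the vector arrive as a stream of two-tuples. The $t$-th update is $(i, c_t)$, which means that $\alpha_i(t) = \alpha_i(t-1) + c_t$, and $\alpha_{i'}(t) = \alpha_{i'}(t-1)$ for $i' \neq i$. At time $t$, a query is posed on $\alpha(t)$.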
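The update rule of this formal model can be replayed directly in code. The following is a minimal sketch, assuming the start time is s = 0 and that the stream is given as a list of (i, c) pairs; the function name apply_updates and the sample updates are hypothetical.

```python
def apply_updates(n, updates):
    """Replay a stream of (i, c) two-tuples on a length-n zero vector.

    Indices are 1-based to match the attribute domain [1..n].
    """
    alpha = [0] * (n + 1)      # slot 0 is unused so that alpha[i] matches alpha_i
    for i, c in updates:       # the t-th tuple is the update at time t
        alpha[i] += c          # alpha_i(t) = alpha_i(t-1) + c_t; others unchanged
    return alpha

# Hypothetical stream of three updates on a domain of size 4.
alpha = apply_updates(4, [(2, 5), (4, 1), (2, 3)])
```

This version stores the whole vector, which is exactly what a real stream algorithm cannot afford when n is large; it serves only to make the notation concrete.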
Data collection
We first consider which data falls within the calculation range when a data stream computation is performed. There are three main models: the data stream model, the sliding window model, and the n-of-N model.
Data stream model: In the data stream model, all data from a given starting time onward is included in the calculation range. In this case $s = 0$, that is, at time 0, $\alpha$ is the zero vector. This is the original and most common data stream model.
Sliding window model (computing the most recent N data): Counting back from the time of calculation, the preceding N data items are included in the calculation range. In this case $s = t - N$, that is, at time $t - N$, $\alpha$ is the zero vector. In other words, the calculation covers the most recent N data items. Because stream data keeps arriving, the model behaves intuitively like a fixed-size window: data passes through the window as time goes by, and the data inside the window is the set being computed on. M. Datar et al. [91] first proposed this model, which subsequently attracted wide attention [92].
n-of-N model (computing the most recent n data, where 0 < n ≤ N)
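The sliding window model can be illustrated with a running sum over the most recent N items. This is a sketch under simplifying assumptions: it stores all N items exactly, whereas the algorithms cited in the text aim to use far less space. The class name SlidingWindowSum is hypothetical.

```python
from collections import deque

class SlidingWindowSum:
    """Keep the sum of the most recent N values of a stream."""
    def __init__(self, N):
        self.N = N
        self.window = deque()
        self.total = 0

    def add(self, value):
        self.window.append(value)
        self.total += value
        if len(self.window) > self.N:            # oldest item slides out of the window
            self.total -= self.window.popleft()
        return self.total

w = SlidingWindowSum(3)
sums = [w.add(x) for x in [4, 1, 7, 2, 9]]
```

Each call to add returns the sum over the current window, so sums traces how the result changes as old items expire.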
Data attributes
Characteristics of the data itself:
Time series model: Data arrives in the order of its attribute (that is, time). In this case $i = t$, so the update at time $t$ is $(t, c_t)$. The update operation on $\alpha$ is then $\alpha_t(t) = c_t$, and $\alpha_{i'}(t) = \alpha_{i'}(t-1)$ for $i' \neq t$. This model suits time series data, such as the outgoing traffic of a particular IP address or periodically updated stock data.
Cash register model: Data for the same attribute is added together, and the data is non-negative. In this model $c_t \ge 0$, which means that for all $i$ and $t$, $\alpha_i(t)$ is never less than zero and never decreases. This model is considered the most commonly used. For example, it can be used for cash registers (from which the model gets its name), the network traffic volume of each IP address, monitoring the call duration of mobile phone users, and so on.
Turnstile model: Data for the same attribute is added together, and the data may be positive or negative. In this model $c_t$ can be greater or less than 0. This is the most general model. S. Muthukrishnan [89] called it the turnstile model because it works like the turnstile at a subway station, which can count how many people have entered and left, and thus how many people are in the subway.
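The difference between the cash register and turnstile models comes down to the sign allowed for each update. A minimal sketch follows, assuming the vector is kept as a dictionary; the function name update, the model labels, and the station data are hypothetical.

```python
def update(alpha, i, c, model="turnstile"):
    """Apply one update (i, c) to the dict alpha under the given model."""
    if model == "cash_register" and c < 0:
        raise ValueError("cash register model only allows c >= 0")
    alpha[i] = alpha.get(i, 0) + c
    return alpha

# Turnstile: +1 when someone enters a station, -1 when someone leaves.
inside = {}
for station, c in [("A", 1), ("A", 1), ("B", 1), ("A", -1)]:
    update(inside, station, c, model="turnstile")

# The same negative update is rejected under the cash register model.
try:
    update({}, "A", -1, model="cash_register")
    rejected = False
except ValueError:
    rejected = True
```

After the four turnstile updates, inside holds the number of people currently in each station.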
Calculation types
Calculations on stream data fall into two categories: basic calculations and complex calculations. Basic calculations mainly include point queries, range queries, and inner product queries. Complex calculations include quantile calculation, frequent item calculation, and data mining.
Point query: returns the value of $\alpha_i(t)$.
Range query: For a range query $Q(f, t)$, return
$$\sum_{i=f}^{t} \alpha_i(t)$$
Inner product: For a vector $\beta$, return the inner product of $\alpha$ and $\beta$:
$$\alpha \cdot \beta = \sum_{i=1}^{n} \alpha_i(t)\,\beta_i$$
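The three basic queries can be written out on an explicitly stored vector. This is only a reference sketch of what the queries mean, assuming the state of the vector at query time is available as a Python list; the function names and sample values are hypothetical. Stream algorithms answer the same queries approximately from a much smaller summary.

```python
def point_query(alpha, i):
    """Return alpha_i (1-based index)."""
    return alpha[i - 1]

def range_query(alpha, f, t):
    """Return the sum of alpha_i for i = f..t, inclusive (1-based)."""
    return sum(alpha[f - 1:t])

def inner_product(alpha, beta):
    """Return the sum over i of alpha_i * beta_i."""
    return sum(a * b for a, b in zip(alpha, beta))

alpha = [3, 0, 5, 2]      # hypothetical state of the vector at query time
beta = [1, 1, 0, 2]
```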
Quantile: Given a rank $r$, return a value $v$ such that the true rank $r'$ of $v$ in $\alpha$ satisfies:
$$r - \varepsilon N \le r' \le r + \varepsilon N$$
where $\varepsilon$ is the accuracy and $N = \sum_{i=1}^{n} \alpha_i(t)$.
G. S. Manku et al. [94] provide a framework for approximate quantile estimation in a single scan, treating the data set as the nodes of a tree. These nodes have different weights (such as the number of data items a node contains). They argue that every quantile estimation algorithm can be viewed as a combination of three operations on nodes: generating new nodes (NEW), merging (COLLAPSE), and output (OUTPUT). Different strategies produce different types of trees. This framework became the basis of many subsequent quantile estimation algorithms.
Frequent items, sometimes called heavy hitters: the task is to find items that appear frequently in the data stream. In this calculation $c_t = 1$, so that $\alpha_i(t)$ stores the number of arrivals, as of time $t$, of data whose dimension value equals $i$. Queries on this data are of two types:
Find the top k most frequently occurring items
Find all items whose frequency is greater than 1/k
Research on frequent items focuses mainly on the latter calculation [95].
Data stream mining involves more complex calculations. Research in this area includes multidimensional analysis [96], classification analysis [97, 98], cluster analysis [99–102], and other one-pass algorithms [103].
Related approaches
Introduction
The main difficulty in data stream processing is how to keep the space spent on storing data within a certain bound. Query response time is also important, but it is relatively easy to address. As an active research area, data stream processing has been studied extensively, and many algorithms have emerged.
One way to resolve the conflict between the huge volume of a data stream and limited storage space is sampling. Another is to construct a small data structure, one that fits in memory and can provide approximate results, to hold a compressed form of the stream data. Sketches, histograms, and wavelets are the three most important data structures of this kind.
In fact, most of these methods were already in use in traditional databases. The problem is how to apply them to the special environment of data streams.
Random sampling
Random sampling can capture the basic characteristics of a data set by drawing a small number of samples. A very common and simple method is uniform sampling. An alternative, stratified sampling, can reduce errors caused by uneven data distribution. For complex analysis, however, ordinary sampling algorithms still require too much space.
For some special calculations on data streams, a number of interesting sampling algorithms have appeared. Sticky sampling [95] is used to compute frequent items. The method keeps in memory a set S of two-tuples $(i, f)$. For each arriving item, if the key $i$ already exists in S, the corresponding $f$ is increased by 1; otherwise, the item is sampled with probability $1/r$. If it is selected, the tuple $(i, 1)$ is added to S. After a period of time, the tuples in S are scanned once and their values are updated, and then the value of $r$ is increased. At the end (or when the user requests the result), all tuples with $f \ge (s - \varepsilon)N$ are output.
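The sticky sampling procedure can be sketched as follows. The text leaves several details open, so this version fills them in from the commonly described form of the algorithm and these should be read as assumptions: the rate r doubles at window boundaries, the first window has 2t items with t = (1/ε) ln(1/(sδ)), and the periodic scan decrements each counter once per failed fair coin toss. The function name, parameter names, and sample stream are hypothetical.

```python
import math
import random

def sticky_sampling(stream, s, eps, delta, seed=0):
    """Approximate frequent items: report keys with f >= (s - eps) * N."""
    rng = random.Random(seed)
    t = math.ceil((1 / eps) * math.log(1 / (s * delta)))
    S = {}                     # key -> counter f
    r = 1                      # current sampling rate is 1/r
    window_end = 2 * t         # the first window holds 2t items
    n = 0
    for item in stream:
        n += 1
        if n > window_end:     # window boundary: halve the sampling probability
            r *= 2
            window_end += r * t
            for key in list(S):            # scan S once and adjust the counters
                while S[key] > 0 and rng.random() < 0.5:
                    S[key] -= 1            # one decrement per failed coin toss
                if S[key] == 0:
                    del S[key]
        if item in S:
            S[item] += 1
        elif rng.random() < 1 / r:
            S[item] = 1
    return {k: f for k, f in S.items() if f >= (s - eps) * n}

# Hypothetical stream: "a" is half of all items, every other item is unique.
stream = []
for k in range(1000):
    stream.append("a")
    stream.append(f"u{k}")
result = sticky_sampling(stream, s=0.1, eps=0.01, delta=0.1)
```

A counter is only ever incremented when its key is actually seen, so a reported count never exceeds the true frequency.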
Distinct sampling [104], proposed by P. Gibbons, is used for distinct counting, that is, finding the number of different values in the data stream. It uses a hash function to map each distinct arriving value to level $i$ with probability $2^{-(i+1)}$. If $i \ge$ the memory level $L$ (initially 0), the value is added to memory; otherwise it is discarded. When memory is full, the values at level $L$ are deleted from memory and $L$ is increased by 1. The final estimate of the distinct count is the number of distinct values in memory multiplied by $2^L$. Distinct counting is an old problem in database processing. The advantage of this algorithm is that, with appropriate parameters, it can be applied to queries with predicates (that is, distinct counting over a subset of the data stream).
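The level-based scheme can be sketched in Python. This is a simplified illustration rather than the algorithm of [104] in full: it assumes a SHA-256-based hash whose trailing zero bits give the level, and it omits the per-value bookkeeping needed for predicate queries. The names level, distinct_estimate, and capacity are hypothetical.

```python
import hashlib

def level(value):
    """Map a value to level i with probability 2^-(i+1) (trailing zero bits of a hash)."""
    h = int.from_bytes(hashlib.sha256(str(value).encode()).digest()[:8], "big")
    i = 0
    while h & 1 == 0 and i < 63:
        h >>= 1
        i += 1
    return i

def distinct_estimate(stream, capacity):
    """Estimate the number of distinct values using at most `capacity` stored values."""
    L = 0
    sample = {}                          # value -> level, only levels >= L are kept
    for v in stream:
        lv = level(v)
        if lv >= L:
            sample[v] = lv
            while len(sample) > capacity:
                # Memory is full: drop everything at level L, then raise L.
                sample = {x: l for x, l in sample.items() if l > L}
                L += 1
    return len(sample) * 2 ** L

small = distinct_estimate([1, 2, 2, 3, 1], capacity=10)
large = distinct_estimate(range(10000), capacity=256)
```

When the number of distinct values fits within capacity, L stays at 0 and the count is exact; otherwise each surviving value stands for roughly 2^L distinct values.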
The disadvantage of sampling algorithms is that they are not sensitive enough to anomalous data. Moreover, even when they work well under the ordinary data stream model, they need to be modified before they can be used under the sliding window model [91] or the n-of-N model [93].
Sketch construction
Sketching refers to using random projections to project the data stream into a small storage space as a summary of the entire stream. The summary stored in this space is called a sketch, and it can be used to answer specific queries approximately. Different sketches can be used to estimate different $L_p$ norms of the data stream, and these norms can in turn be used to answer other types of queries. For example, the $L_0$ norm can be used to estimate the distinct count of a data stream, the $L_1$ norm can be used to calculate quantiles and frequent items, and the $L_2$ norm can be used to estimate self-join sizes.
The concept of sketches was first proposed by N. Alon in [105]. Since then, a variety of sketches and construction algorithms have continued to emerge.
The randomized sketching proposed by N. Alon in [105] can be used to estimate different $L_p$ norms and requires at most $O(n^{1-1/p} \lg n)$ space. A more important contribution of the paper is that it can also estimate $L_2$ with a space requirement of $O(\log n + \log t)$. Its main idea is to use a hash function to map each element $i$ in the domain $D$ of the data attribute uniformly at random to $z_i \in \{-1, +1\}$, so that for the random variable $X = \sum_i \alpha_i z_i$, $X^2$ can be used as an estimate of the squared $L_2$ norm.
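The random-sign idea can be tried out with a short sketch. It makes two simplifying assumptions that are not from [105]: the signs are derived from SHA-256 as a stand-in for the hash family the analysis requires, and the estimate is a plain average over independent copies rather than a more robust combination. The names sign, f2_estimate, and copies are hypothetical.

```python
import hashlib

def sign(j, i):
    """Pseudo-random z_i in {-1, +1} for the j-th copy of the estimator."""
    digest = hashlib.sha256(f"{j}:{i}".encode()).digest()
    return 1 if digest[0] & 1 else -1

def f2_estimate(updates, copies=200):
    """Estimate the sum of alpha_i^2 from a stream of (i, c) updates."""
    X = [0] * copies                     # one running value of X per copy
    for i, c in updates:
        for j in range(copies):
            X[j] += c * sign(j, i)       # X = sum_i alpha_i * z_i, maintained online
    return sum(x * x for x in X) / copies

single = f2_estimate([(7, 3)])           # one item with alpha_7 = 3
est = f2_estimate([(1, 3), (2, 4)])      # true value is 3^2 + 4^2 = 25
```

Only the running values of X are stored, never the vector itself, which is where the space saving comes from. With a single item every copy yields exactly 9; with several items each copy is noisy and the average approaches the true value as the number of copies grows.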
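The quantile sketch proposed by S. Guha et al. [88] maintains a set of tuples of the form $(v_i, g_i, \Delta_i)$, where $r_{\max}(v_i)$ and $r_{\min}(v_i)$ are the maximum and minimum possible ranks of $v_i$, respectively. For $i > j$:
$$v_i > v_j$$
$$g_i = r_{\min}(v_i) - r_{\min}(v_{i-1})$$
$$\Delta_i = r_{\max}(v_i) - r_{\min}(v_i)$$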
As data arrives, the sketch is updated accordingly to keep the estimate within a given accuracy. X. Lin et al. [93] gave a more formal description of this problem.
If $A_S$ is a random set drawn from $[1..n]$ in which each element is selected with probability 1/2, A. Gilbert et al. [106] construct several such sets and call the sum of the element values in each set a random sum. Multiple random sums make up a sketch. The estimate of $\alpha_i$ is $2E(\|A_S\| \mid \alpha_i \in A_S) - \|A\|$, where $\|A\|$ is the sum of all the values in the data stream. This kind of sketch can therefore be used to estimate the result of a point query, and several such sketches together can be used to estimate range queries, quantile queries, and so on.
The sketching technique is really a trade-off between space and accuracy. To ensure that the error of a point query result is less than $\varepsilon N$, the space required by the sketch above usually has a factor of $\varepsilon^{-2}$. By comparison, the Count-Min sketch proposed by G. Cormode et al. [19] needs space with only a factor of $\varepsilon^{-1}$. The idea is also relatively simple: several hash functions project the data stream onto several small sketches. When a point query is answered, each sketch gives its own answer, and the smallest value is chosen as the result. Building on point queries, the Count-Min sketch can be used for various other queries and complex calculations. The Count-Min sketch does not calculate an $L_p$ norm but computes the result of the point query directly, which is one of the reasons its space and time efficiency is higher than that of other sketches.
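The Count-Min idea of several hashed counter rows with a minimum taken at query time can be sketched as follows. The hashing here is an assumption for illustration: SHA-256 salted with the row number stands in for the pairwise independent hash functions used in the analysis. The class name, its parameters, and the sample data are hypothetical.

```python
import hashlib

class CountMinSketch:
    """depth rows of width counters; a point query takes the minimum over rows."""
    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _bucket(self, row, item):
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._bucket(r, item)] += count

    def query(self, item):
        return min(self.rows[r][self._bucket(r, item)] for r in range(self.depth))

cms = CountMinSketch(width=64, depth=4)
for _ in range(50):
    cms.add("a")
for _ in range(20):
    cms.add("b")
for k in range(100):
    cms.add(f"noise{k}")          # unrelated items that may collide with "a" or "b"
```

With non-negative counts, collisions can only inflate a counter, so a query never underestimates the true count; taking the minimum over rows picks the least inflated one.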
Histogram
The term histogram has two meanings. One is the histogram in the ordinary sense, a visual means of displaying approximate statistics. The other is a data structure or method that captures the approximate distribution of data. In the latter sense, a histogram is constructed as follows: the data is divided into multiple disjoint subsets (called buckets) according to its attributes, and the values in each bucket are approximated in a uniform way [107].
The histogram method is mainly used in signal processing, statistics, image processing, computer vision, and databases. In the database field, histograms were originally used mainly for selectivity estimation, in the optimization of selection queries and in approximate query processing. The histogram is one of the simplest and most flexible approximate processing methods, and also one of the most effective. Once the data update problem is solved, the original histogram can be used in data stream processing. A histogram that adjusts itself automatically as new data arrives is called a dynamic (or adaptive, or self-adjusting) histogram.
The histogram proposed by L. Fu et al. [108] is mainly used to compute the median and other quantile functions. It can be used for both approximate calculations and exact queries. It uses deterministic bucketing and randomized bucketing techniques to construct multiple buckets of different precision, then distributes the input data into these buckets step by step, completing the dynamic histogram structure.
Static histograms are difficult to apply directly to data stream processing. The method of S. Guha et al. [88] can dynamically construct near-optimal V-optimal histograms, but it applies only to data streams under the time series model.
A commonly used method divides the algorithm into two steps: first construct a sketch of the stream data, then construct a suitable histogram from that sketch. This approach takes advantage of how easily a sketch can be updated and thereby makes the histogram dynamic. N. Thaper et al. [109] first constructed a sketch that approximately reflects the stream data, relied on the sketch's good update performance to absorb new data, and then derived a histogram from the sketch to approximate the stream data. Since deriving the best histogram from the sketch is an NP-hard problem, the authors provide a heuristic (greedy) algorithm to search for a good histogram.
A. Gilbert et al. [110] constructed a summary data structure that uses a set of random sums, similar in structure to those in [106], to store the values of dyadic intervals at different levels of granularity. Dyadic intervals ([111]) of different granularity are then added to the histogram under construction, from large to small, so as to minimize the approximation error (refinement).
A. Gilbert et al. [112] mainly considered how to reduce the processing cost of each input item in the data stream. They first converted the input data into wavelet coefficients (treating each wavelet coefficient as the inner product of the signal and a basis vector), and then adopted a dyadic interval processing method similar to that of [110]. The sketch is closely related to the histogram. From a certain perspective, the histogram can be regarded as a special case of the sketch.
Wavelet Transformation
The wavelet transform is often used to generate summary information about data. This is because usually only a small part of the wavelet coefficients is important, while most of the coefficients are either very small or unimportant. If the unimportant coefficients produced by the wavelet transform are ignored, the original data can therefore be approximated using very little space.
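The effect of discarding small coefficients can be seen with the Haar wavelet, the simplest wavelet family. The sketch below is an illustration under stated assumptions: it uses the unnormalized averaging and differencing form of the transform, requires the input length to be a power of two, and ranks coefficients by raw magnitude, which is a simplification. All function names and the sample data are hypothetical.

```python
def haar(data):
    """Unnormalized Haar transform of a list whose length is a power of two."""
    coeffs = []
    current = list(data)
    while len(current) > 1:
        averages = [(current[k] + current[k + 1]) / 2 for k in range(0, len(current), 2)]
        details = [(current[k] - current[k + 1]) / 2 for k in range(0, len(current), 2)]
        coeffs = details + coeffs        # coarser details go in front
        current = averages
    return current + coeffs              # [overall average, coarse ... fine details]

def inverse_haar(coeffs):
    """Rebuild the data from the output of haar()."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt = []
        for a, d in zip(current, details):
            nxt += [a + d, a - d]
        current = nxt
    return current

def keep_largest(coeffs, k):
    """Zero out all but the k coefficients of largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda j: abs(coeffs[j]), reverse=True)
    keep = set(order[:k])
    return [c if j in keep else 0.0 for j, c in enumerate(coeffs)]

data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar(data)
approx = inverse_haar(keep_largest(coeffs, 4))
```

Keeping four of the eight coefficients reproduces the data with a maximum error of 0.5 per value in this example, while storing half as many numbers.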
Y. Matias et al. first constructed a histogram for the stream data and approximated it with wavelets. Some of the most important wavelet coefficients are then retained to approximate the histogram. When new data appears, the histogram is updated by updating these wavelet coefficients.
What that work proposes is essentially a histogram method, although it uses the wavelet transform. A. Gilbert et al. pointed out that the wavelet transform can be regarded as the inner product of a signal with a set of orthogonal vectors of length N. They therefore construct a set of sketches of the stream data. Because the sketches make it relatively easy to compute the inner product of the signal with a set of vectors accurately, the wavelet coefficients can be calculated from the sketches and then used to estimate point queries and range queries.
New trends
Researchers have continued to deepen their work on data stream processing. We believe the following new trends have emerged:
Future sketches
Introduce more statistical and computational techniques for constructing sketches.
G. Cormode et al. mainly deal with the calculation of frequent items. Their method is based on the earlier majority item algorithm ([116, 117]) and uses error-correcting codes. For example, a counter is set up for each bit of the data, and the set of frequent items is then inferred from the results of these counters.
The work of Y. Tao et al. [118] is essentially an application of probabilistic counting (a distinct-counting technique widely used in the database field) to data stream processing.
Extending sketches
Extend sketches to handle more complex queries.
Lin et al. [93] constructed a complex sketch system that can estimate quantiles under the sliding window model and the n-of-N model, which is difficult to achieve with simple sketches.
Under the sliding window model, [93] divides the data into multiple buckets in chronological order, builds a sketch in each bucket (with higher accuracy than required), and merges these sketches at query time, where the last bucket may need to be lifted. During maintenance, only expired buckets are deleted and new buckets are added.
Under the n-of-N model, [93] divides the data into multiple buckets of different sizes according to the EH partitioning technique and builds a sketch in each bucket (with higher accuracy than required). At query time, some of the sketches are merged to ensure the required accuracy, and the last one may need to be improved.
Combining with spatio-temporal data
Further integration with spatio-temporal data processing:
J. Sun et al. [120] mainly address historical queries and prediction over spatio-temporal data. The article emphasizes, however, that spatio-temporal data appears in the form of data streams, and its processing focuses more on the update performance of spatio-temporal data.
Y. Tao et al. [118] use data stream methods to process spatio-temporal data. They construct a sketch of dynamic spatio-temporal data and use it to distinguish whether objects are moving among multiple regions or stationary, and to estimate their number. This kind of problem is difficult to solve with conventional spatio-temporal processing.
Novel genre
In online fiction, "data stream" is an emerging genre in which the protagonist's strength is expressed in numbers and displayed like the attribute panel of an online game.