Data stream

Background

The development of data stream applications is the result of the following two factors:

Detailed data

It has become possible to generate large amounts of detailed data continuously and automatically. This type of data first appeared in traditional banking and stock trading, and later in geological surveys, meteorology, astronomical observation, and similar fields. In particular, the emergence of the Internet (network traffic monitoring, clickstreams) and wireless communication networks (call records) has produced large amounts of stream-type data. Most of this data is related to geographic information, mainly because geographic information has very high dimensionality and therefore easily generates such large volumes of detailed data.

Complex analysis

Complex analysis must be performed on the update stream in near real time. Complex analysis (such as trend analysis and forecasting) of the data in the fields above is often done offline (in a data warehouse), but some newer applications, especially in network security and national security, are very time-sensitive. Tasks such as detecting extreme events, fraud, intrusions, and anomalies on the Internet, complex crowd monitoring, trend tracking, exploratory analysis, and harmonic analysis all require online analysis.

Afterwards, the academic community largely accepted this definition, and some papers modified it slightly. For example, S. Guha et al. [88] consider a data stream to be an "ordered sequence of points that can be read only once or a small number of times", which relaxes the "one pass" restriction of the earlier definition.

Why is the limit on the number of reads emphasized in data stream processing? S. Muthukrishnan [89] pointed out that a data stream is "input data arriving at a very high speed", so transmitting it, computing on it, and storing it all become very difficult. In this situation there is only one chance to process the data, when it first arrives; it is difficult to access the data at any other time, because it has not been saved and cannot be.

Distinguishing characteristics

Differences from the traditional relational data model

B. Babcock et al. [90] consider that the data stream model differs from the traditional relational data model in the following respects:

1. Data arrives online;

2. The processing system cannot control the order in which the data arrives;

3. The data is unbounded;

4. Because of the huge volume of data, elements of the data stream are discarded or archived after being processed. It is difficult to retrieve them later unless they are kept in memory, but because memory is usually much smaller than the volume of the stream, data can usually be accessed only once, when it first arrives.

Three characteristics

We believe that what sets current research on data stream computation apart from traditional computation models is the stream data itself, which has the following three characteristics:

Data arrival: fast

This means that a large amount of input data may have to be processed in a short time. This places a heavy burden on the processor and on input and output devices, so the processing of a data stream should be as simple as possible.

Data range: wide domain

This means that the value range of a data attribute (dimension) is very large, with many possible values, such as regions, mobile phone numbers, people, or network nodes. This is the main reason a data stream cannot be stored in memory or on disk. If the dimension is small, the data can be kept in a small amount of memory even when the volume of incoming data is large. For example, in a wireless communication network, if 1 million call records belong to only 1,000 users, then 1,000 storage units hold enough sufficiently accurate data to answer the question "What is the cumulative call time of a given user?" If there are 100,000 users in total, 100,000 storage units are needed. The attributes of stream data are mostly related to geographic information, IP addresses, mobile phone numbers, and the like, and are often combined with time. The dimensionality of the data then far exceeds the capacity of memory and disk, which means that the system cannot store the information completely and can usually access each data item only once, when it arrives.
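The call-record example can be sketched in a few lines of Python. This is only an illustration: the record layout (user, seconds) and the function name are assumptions, not part of the source. It shows that the memory used grows with the number of distinct users rather than with the number of records.

```python
from collections import defaultdict

def cumulative_call_time(records):
    """Accumulate total call seconds per user in a single pass.

    Each record is read once and then dropped; only one counter per
    distinct user is kept.
    """
    totals = defaultdict(int)
    for user, seconds in records:
        totals[user] += seconds
    return dict(totals)

# Hypothetical stream: 1,000,000 records spread over 1,000 users,
# produced lazily so the records themselves are never stored.
records = ((i % 1000, 60) for i in range(1_000_000))
totals = cumulative_call_time(records)
```

With 100,000 distinct users the same code would need 100,000 counters, which is the growth the paragraph above describes.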

Data arrival over time: continuous

The continuous arrival of data means that the amount of data may be unbounded. Moreover, the result of processing the data so far is never final, because more data will keep arriving. The result of a query on a data stream is therefore usually not one-time but continuous: the latest result is returned again and again as the underlying data arrives.
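A continuous query can be pictured as a generator that emits a fresh result after every arrival. The following minimal sketch uses a running mean as the query; the function name and the choice of query are illustrative assumptions.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, once per arrival."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

# Each arrival produces an updated answer; none of them is "final".
results = list(running_mean([4, 8, 6]))
```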

The characteristics of data streams described above determine the characteristics of data stream processing: one-pass access, continuous processing, limited storage, approximate results, and fast response.

Approximate results are an inevitable consequence of the first three constraints. Because the data can be accessed only once and only a relatively small, bounded space is available to store it, exact results usually cannot be produced. Once the requirement on results is relaxed from "exact" to "approximate", fast responses to data stream queries become possible.

Classification

Streams are processed differently depending on the nature and format of their data. The Java input/output class library therefore provides different stream classes for different kinds of input/output streams. In the java.io package, the basic input/output streams fall into two types according to the kind of data they read and write: byte streams and character streams.

Input stream and output stream

Data streams are divided into input streams (InputStream) and output streams (OutputStream). An input stream can only be read, not written, and an output stream can only be written, not read. A program usually uses an input stream to read data and an output stream to write data, as if data were flowing into and out of the program. Using streams makes the program's input and output operations independent of the devices involved.

An input stream can obtain data from the keyboard or a file, and an output stream can send data to a monitor, a printer, or a file.

Buffered stream

To improve the efficiency of data transfer, buffered streams are commonly used: the stream is equipped with a buffer, a block of memory dedicated to transferring data. When data is written to a buffered stream, the system does not send it directly to the external device but to the buffer, which accumulates the data. When the buffer is full, the system sends all of its data to the corresponding device.

When data is read from a buffered stream, the system actually reads from the buffer. When the buffer is empty, the system automatically reads from the underlying device, reading as much data as possible to fill the buffer.
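The write side of this behaviour can be sketched as follows. This is a small Python illustration of the idea, not the java.io implementation; the class name BufferedSink and the device_write callback are hypothetical.

```python
class BufferedSink:
    """Collect writes in memory and pass them to the device in batches."""

    def __init__(self, device_write, capacity=4):
        self._device_write = device_write  # called with one full batch
        self._capacity = capacity
        self._buffer = []

    def write(self, item):
        # Writes go to the buffer, not to the device.
        self._buffer.append(item)
        if len(self._buffer) >= self._capacity:
            self.flush()

    def flush(self):
        # Send everything that is buffered in a single device call.
        if self._buffer:
            self._device_write(list(self._buffer))
            self._buffer.clear()

batches = []                       # stands in for the external device
sink = BufferedSink(batches.append, capacity=4)
for item in range(10):
    sink.write(item)
sink.flush()                       # push out the partially filled buffer
```

Ten writes reach the stand-in device as three batches, which is where the efficiency gain comes from when each device call is expensive.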

Model description

We try to summarize and describe the data stream model from three aspects: data collection, data attributes, and calculation types. Many papers have proposed a variety of data stream models; we do not cover all of them, but summarize and classify the more important and common ones.

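Formalization

The following is a formal description of a data stream.

Consider a vector $\alpha$ whose attribute domain is $[1..n]$ (its rank is $n$). The state of the vector $\alpha$ at time $t$ is

$\alpha(t) = \langle \alpha_1(t), \ldots, \alpha_i(t), \ldots, \alpha_n(t) \rangle$

At time $s$, $\alpha$ is the zero vector, that is, $\alpha_i(s) = 0$ for all $i$. Updates to the components of the vector arrive as a stream of two-tuples. The $t$-th update is $(i, c_t)$, which means that $\alpha_i(t) = \alpha_i(t-1) + c_t$ and, for $i' \neq i$, $\alpha_{i'}(t) = \alpha_{i'}(t-1)$. A query issued at time $t$ is answered against $\alpha(t)$.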
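The update rule can be made concrete with a short sketch. It keeps the whole vector in memory, which a real stream algorithm cannot afford, so it only illustrates the semantics of the model; the function name is an assumption.

```python
def apply_updates(n, updates):
    """Apply a stream of (i, c) updates to a zero vector over 1..n.

    Each update adds c to component i and leaves every other
    component unchanged.
    """
    alpha = [0] * (n + 1)      # index 0 is unused; the domain is 1..n
    for i, c in updates:
        alpha[i] += c
    return alpha

# Three updates: component 1 gets +5 and then -1, component 3 gets +2.
alpha = apply_updates(4, [(1, 5), (3, 2), (1, -1)])
```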

Data collection

We first consider which data falls within the calculation range when a data stream computation is performed. There are three main models: the data stream model, the sliding window model, and the n-of-N model.

Data stream model: in the data stream model, all data from a certain point in time onward is included in the calculation range. Here $s = 0$, that is, at time 0, $\alpha$ is the zero vector. This is the original and most common data stream model.

Sliding window model (computing the most recent $N$ data items): counting back from the time of the calculation, the most recent $N$ data items are included in the calculation range. Here $s = t - N$, that is, at time $t - N$, $\alpha$ is the zero vector. Because the stream keeps producing new data, this model can be pictured as a fixed-size window: data passes through the window as time goes by, and the data inside the window is the set being computed on. M. Datar et al. [91] first proposed this model, which was then widely taken up [92].
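The window semantics can be shown with a small exact implementation. Note the assumption: this sketch stores all N items of the window, so it demonstrates what the model computes rather than a space-efficient algorithm. The class name is illustrative.

```python
from collections import deque

class SlidingWindowSum:
    """Sum of the most recent `size` values of a stream."""

    def __init__(self, size):
        self._size = size
        self._window = deque()
        self.total = 0

    def add(self, value):
        self._window.append(value)
        self.total += value
        # Once the window is over capacity, the oldest value expires.
        if len(self._window) > self._size:
            self.total -= self._window.popleft()
        return self.total

window = SlidingWindowSum(3)
sums = [window.add(v) for v in [1, 2, 3, 4, 5]]
```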

n-of-N model (computing the most recent $n$ data items, where $0 < n \leq N$)

Data attributes

Properties of the data itself:

Time series model: data arrives in the order of its attribute (that is, time). In this case $i = t$, so the update at time $t$ is $(t, c_t)$. The update to $\alpha$ is then $\alpha_t(t) = c_t$, and for $i' \neq t$, $\alpha_{i'}(t) = \alpha_{i'}(t-1)$. This model suits time series data, such as the outgoing traffic of a given IP address or periodically updated stock data.

Cash register model: data for the same attribute is added together, and the data is non-negative. In this model $c_t \geq 0$, which means that for all $i$ and $t$, $\alpha_i(t)$ is never less than zero and never decreases. This model is considered the most commonly used; it applies, for example, to cash registers (from which the model gets its name), the network traffic volume of each IP address, and monitoring the call duration of mobile phone users.

Turnstile model: data for the same attribute is added together, and the data may be positive or negative. In this model $c_t$ can be greater than or less than 0. This is the most general model. S. Muthukrishnan [89] called it the turnstile model because it works like the turnstile at a subway station: a turnstile can count how many people have entered and left, and thus how many people are in the subway.
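The difference between the two models comes down to whether negative updates are allowed. A minimal sketch, with an illustrative function name and an exact per-key counter standing in for the vector:

```python
from collections import Counter

def apply_model(updates, allow_negative):
    """Apply (key, delta) updates under one of the two models.

    allow_negative=False is the cash register model (delta >= 0);
    allow_negative=True is the turnstile model (delta of either sign).
    """
    state = Counter()
    for key, delta in updates:
        if delta < 0 and not allow_negative:
            raise ValueError("cash register model requires delta >= 0")
        state[key] += delta
    return state

# Turnstile: two people enter a station and one leaves.
riders = apply_model([("station", 1), ("station", 1), ("station", -1)], True)

# Cash register: call durations only ever accumulate.
minutes = apply_model([("alice", 3), ("bob", 5), ("alice", 2)], False)
```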

Calculation types

Calculations on stream data can be divided into two categories: basic calculations and complex calculations. Basic calculations mainly include point queries, range queries, and inner product queries. Complex calculations include quantile calculation, frequent item calculation, and data mining.

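Point query: return the value of $\alpha_i(t)$.

Range query: for a range query $Q(f, t)$, return

$\sum_{i=f}^{t} \alpha_i(t)$

Inner product: for a vector $\beta$, the inner product of $\alpha$ and $\beta$ is

$\alpha \cdot \beta = \sum_{i=1}^{n} \alpha_i(t) \beta_i$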
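For reference, the three basic queries can be written exactly against a fully stored vector. This sketch assumes the vector fits in memory, which stream algorithms generally cannot rely on; it shows only what the approximate methods are trying to estimate. The function names are illustrative, and index 0 is unused so that the domain is 1..n.

```python
def point_query(alpha, i):
    """Value of component i."""
    return alpha[i]

def range_query(alpha, f, t):
    """Sum of components f..t, both ends included."""
    return sum(alpha[f:t + 1])

def inner_product(alpha, beta):
    """Sum over i = 1..n of alpha[i] * beta[i]."""
    return sum(a * b for a, b in zip(alpha[1:], beta[1:]))

alpha = [0, 4, 0, 2, 7]   # components 1..4
beta = [0, 1, 1, 0, 2]
```

For these vectors the point query for component 3 is 2, the range query over components 2 to 4 is 9, and the inner product is 18.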

Quantile: given a rank $r$, return a value $v$ such that the true rank $r'$ of $v$ in $\alpha$ satisfies

$r - \epsilon N \leq r' \leq r + \epsilon N$

where $\epsilon$ is the precision and $N = \sum_{i=1}^{n} \alpha_i(t)$.

G. S. Manku et al. [94] provide a framework for approximate quantile estimation in a single scan, treating the data set as the nodes of a tree in which nodes carry different weights (such as the number of data items a node contains). They argue that every quantile estimation algorithm can be viewed as a combination of three operations on nodes: creating new nodes (NEW), merging (COLLAPSE), and output (OUTPUT). Different strategies produce different kinds of trees. This framework became the basis of many later quantile estimation algorithms.

Frequent items, sometimes called heavy hitters: finding the items that appear frequently in the data stream. In this calculation $c_t = 1$, so $\alpha_i(t)$ stores the number of arrivals, up to time $t$, of data whose dimension value equals $i$. Queries on this data fall into two types:

Find the $k$ most frequent items

Find all items whose frequency is greater than $1/k$

Research on frequent items focuses mainly on the latter calculation [95].

Data stream mining involves more complex calculations. Research in this area includes multidimensional analysis [96], classification analysis [97, 98], cluster analysis [99–102], and other one-pass algorithms [103].

Related ideas

Introduction

The main difficulty in data stream processing is keeping the space used to store data within a given bound. Query response time also matters, but it is relatively easy to deal with. As an active research area, data stream processing has been studied extensively, and many algorithms have emerged.

One way to resolve the conflict between the huge volume of a data stream and limited storage space is sampling. Another is to build a small data structure that stores the stream data in compressed form, fits in memory, and can provide approximate results. Sketches, histograms, and wavelets are the three most important data structures of this kind.

Most of these methods were already in use in traditional databases. The problem is how to apply them in the special setting of data streams.

Random sampling

Random sampling can capture the basic characteristics of a data set by drawing a small number of samples. A very common and simple method is uniform sampling. As an alternative, stratified sampling can reduce the errors caused by uneven data distribution. For complex analysis, however, ordinary sampling algorithms still require too much space.

For some specific data stream calculations, interesting sampling algorithms have appeared. Sticky sampling [95] is used to compute frequent items. It keeps in memory a set $S$ of two-tuples $(i, f)$. For each arriving data item, if the key $i$ already exists in $S$, the corresponding $f$ is increased by 1; otherwise the item is sampled with probability $1/r$, and if it is selected, a tuple $(i, 1)$ is added to $S$. After a period of time, the tuples in $S$ are scanned once and their values updated, and then the value of $r$ is increased. At the end (or when the user requests a result), all tuples with $f \geq (s - \epsilon)N$ are output.
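The procedure can be sketched as follows. The details that the text leaves open are filled in with assumptions, each marked in a comment: the window length t = (1/ε) ln(1/(sδ)), the doubling schedule for r, and the coin-toss adjustment when r changes. The parameter names s (support), eps (error), and delta (failure probability) are also assumptions.

```python
import math
import random

def sticky_sampling(stream, s, eps, delta, rng):
    """Return the items whose counted frequency is >= (s - eps) * N."""
    # Assumption: window length parameter t = (1/eps) * ln(1/(s*delta)).
    t = math.ceil(math.log(1.0 / (s * delta)) / eps)
    counts = {}          # the set S of (item, f) pairs
    r = 1                # current sampling rate is 1/r
    boundary = 2 * t     # assumption: the first 2t items use r = 1
    n = 0
    for item in stream:
        if n == boundary:
            # Assumption: r doubles and the next window holds r*t items.
            r *= 2
            boundary += r * t
            # Scan S once: lower each count by one per unsuccessful
            # coin toss and drop entries that reach zero.
            for key in list(counts):
                while counts[key] > 0 and rng.random() < 0.5:
                    counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
        n += 1
        if item in counts:
            counts[item] += 1
        elif rng.random() < 1.0 / r:
            counts[item] = 1
    return {key for key, f in counts.items() if f >= (s - eps) * n}

stream = ["a"] * 35 + ["b"] * 10 + ["c"] * 5
frequent = sticky_sampling(stream, s=0.5, eps=0.1, delta=0.1,
                           rng=random.Random(0))
```

For a stream this short the rate never leaves r = 1, so every item is counted exactly; the sampling and adjustment steps only come into play on longer streams.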

The distinct sampling [104] proposed by P. Gibbons is used for distinct counting, that is, finding the number of different values in the data stream. It uses a hash function to map each distinct arriving value to level $i$ with probability $2^{-(i+1)}$. If $i \geq L$, where $L$ is the current memory level (initially 0), the value is added to memory; otherwise it is discarded. When memory is full, the values at level $L$ are deleted from memory and $L$ is increased by 1. The final estimate of the distinct count is the number of different values in memory multiplied by $2^L$. Distinct counting is an old problem in database processing. The advantage of this algorithm is that, with appropriate parameters, it can be applied to queries with predicates (that is, distinct counting over a subset of the data stream).
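A minimal sketch of this scheme follows. It assumes SHA-256 as the hash function and takes the level to be the number of trailing zero bits of the hash, which gives level i with probability 2^-(i+1); both choices, and the function names, are illustrative rather than taken from [104].

```python
import hashlib

def level_of(value):
    """Level i with probability 2^-(i+1): count trailing zero bits."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    bits = int.from_bytes(digest[:8], "big")
    level = 0
    while bits & 1 == 0 and level < 63:
        bits >>= 1
        level += 1
    return level

def distinct_estimate(stream, capacity):
    """Estimate the number of distinct values using bounded memory."""
    sample = {}          # distinct value -> its level
    current_level = 0    # the memory level L
    for value in stream:
        level = level_of(value)
        if level >= current_level:
            sample[value] = level
            # Memory full: drop the lowest level and raise L.
            while len(sample) > capacity:
                sample = {v: l for v, l in sample.items()
                          if l > current_level}
                current_level += 1
    return len(sample) * 2 ** current_level

small = distinct_estimate([1, 2, 2, 3, 3, 3], capacity=10)
large = distinct_estimate(range(10000), capacity=256)
```

While the number of distinct values stays within capacity, L remains 0 and the count is exact; beyond that the answer is an estimate whose accuracy depends on the capacity.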

The disadvantage of sampling algorithms is that they are not sensitive enough to anomalous data. Moreover, even where they work well under the common data stream model, they need to be modified before they can be used under the sliding window model [91] or the n-of-N model [93].

Sketch construction

Sketching refers to using random projections to project the data stream into a small storage space that serves as a summary of the whole stream. This summary is called a sketch, and it can be used to give approximate answers to specific queries. Different sketches can estimate different $L_p$ norms of the data stream, and these norms can in turn answer other kinds of queries. For example, the $L_0$ norm can be used to estimate the distinct count of a data stream, the $L_1$ norm to calculate quantiles and frequent items, and the $L_2$ norm to estimate self-join size.

The concept of sketches was first proposed by N. Alon in [105]. Since then, various sketches and algorithms for constructing them have continued to appear.

The randomized sketching proposed by N. Alon in [105] can be used to estimate different $L_p$ norms and requires at most $O(n^{1-1/p} \lg n)$ space. A more important contribution of that paper is that it can also estimate $L_2$ with a space requirement of $O(\log n + \log t)$. The main idea is to use a hash function to map each element of the domain $D$ of the data attribute uniformly at random to $z_i \in \{-1, +1\}$, so that for the random variable $X = \sum_i \alpha_i z_i$, $X^2$ can be used as an estimate of the (squared) $L_2$ norm.
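The sign-projection idea can be sketched as below. Two assumptions are worth flagging: SHA-256 with a per-copy seed stands in for the hash family required by the analysis in [105], and the estimates from several independent copies are simply averaged. The function names are illustrative.

```python
import hashlib

def sign(seed, item):
    """Map an item to +1 or -1, consistently for a given seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return 1 if digest[0] & 1 else -1

def squared_l2_estimate(updates, copies=64):
    """Estimate the sum of squared component values from (item, c) updates.

    Each copy keeps a single counter X = sum of c * z(item); X * X is
    the estimate from that copy, and the copies are averaged.
    """
    counters = [0] * copies
    for item, c in updates:
        for k in range(copies):
            counters[k] += c * sign(k, item)
    return sum(x * x for x in counters) / copies

single = squared_l2_estimate([("a", 3), ("a", 4)])
double = squared_l2_estimate([("a", 3), ("b", 4)])
```

With a single key the estimate is exact (7 squared is 49) whatever sign is drawn. With two keys each copy yields either 1 or 49, and the average approaches the true value 25 as the number of copies grows.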

The quantile sketch proposed by S. Guha et al. [88] maintains a set of tuples of the form $(v_i, g_i, \Delta_i)$, where $r_{max}(v_i)$ and $r_{min}(v_i)$ are the maximum and minimum possible ranks of $v_i$, respectively. For $i > j$:

$v_i > v_j$

$g_i = r_{min}(v_i) - r_{min}(v_{i-1})$

$\Delta_i = r_{max}(v_i) - r_{min}(v_i)$

As data arrives, the sketch is updated accordingly to keep the estimate within a given accuracy. X. Lin et al. [93] gave a more formal description of this problem.

If $A_S$ is a random set drawn from $[1..n]$ in which each element is included with probability 1/2, A. Gilbert et al. [106] construct several such sets and call the sum of the element values in each set a random sum. Multiple random sums make up a sketch. The estimate of $\alpha_i$ is

$2E(\|A_S\| \mid \alpha_i \in A_S) - \|A\|$

where $\|A\|$ is the sum of all the values in the data stream. This kind of sketch can therefore estimate the result of a point query, and several such sketches together can estimate range queries, quantile queries, and so on.

The sketching technique is really a trade-off between space and accuracy. To guarantee that the error of a point query result is less than $\epsilon N$, the space required by the sketch above usually carries a factor of $\epsilon^{-2}$. By comparison, the Count-Min Sketch proposed by G. Cormode et al. [19] needs space with only a factor of $\epsilon^{-1}$. Its idea is also relatively simple: several hash functions project the data stream separately onto several small sketches; to answer a point query, each sketch is queried separately and the smallest value is taken as the answer. Building on point queries, the Count-Min sketch can be used for various other queries and complex calculations. The Count-Min sketch does not calculate an $L_p$ norm but computes the point query result directly, which is one reason its space and time efficiency is higher than that of other sketches.
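A compact sketch of the Count-Min structure, under stated assumptions: the usual sizing of width ceil(e/ε) and depth ceil(ln(1/δ)) is used, SHA-256 with a per-row seed stands in for the pairwise-independent hash functions of the original analysis, and updates are non-negative (cash register model).

```python
import hashlib
import math

class CountMinSketch:
    """Approximate point queries over a stream of (item, count) updates."""

    def __init__(self, eps, delta):
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row, item):
        # One hash function per row, derived from the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def query(self, item):
        # Collisions only inflate a cell, so the minimum over the rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

sketch = CountMinSketch(eps=0.01, delta=0.01)
for item in ["a"] * 5 + ["b"] * 2:
    sketch.update(item)
```

With non-negative updates a query never underestimates: here the answer for "a" is at least 5 and at most 7, the total number of updates.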

Histogram

The term histogram has two meanings. In the ordinary sense it is a visual means of displaying approximate statistics; it is also a data structure or method that captures the approximate distribution of data. In the latter sense, a histogram is constructed as follows: the data is divided by attribute value into several disjoint subsets (called buckets), and the values in each bucket are approximated in a uniform way [107].
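The bucket idea can be illustrated with the simplest variant, an equi-width histogram that is updated as each value arrives. The choice of equal-width buckets and the class name are assumptions made for illustration; the histograms discussed in this section use more elaborate bucket boundaries.

```python
class EquiWidthHistogram:
    """Count values in equal-width buckets over the range [low, high]."""

    def __init__(self, low, high, buckets):
        self.low = low
        self.width = (high - low) / buckets
        self.counts = [0] * buckets

    def add(self, value):
        index = int((value - self.low) / self.width)
        # Clamp so that out-of-range values land in the edge buckets.
        index = min(max(index, 0), len(self.counts) - 1)
        self.counts[index] += 1

histogram = EquiWidthHistogram(0, 100, 4)
for value in [5, 30, 40, 99, 100]:
    histogram.add(value)
```

Only the four counters are stored; the individual values are discarded, so any answer derived from the histogram is approximate within a bucket.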

Histogram methods are used mainly in signal processing, statistics, image processing, computer vision, and databases. In the database field, histograms were originally used mainly for selectivity estimation, for optimizing selection queries, and for approximate query processing. The histogram is one of the simplest and most flexible approximation methods, and also one of the most effective. Once the problem of data updates is solved, existing histograms can be used in data stream processing. A histogram that adjusts itself automatically as new data arrives is called a dynamic (or adaptive, or self-adjusting) histogram.

The histogram proposed by L. Fu et al. [108] is used mainly to compute the median and other quantile functions, and supports both approximate calculation and exact queries. It uses deterministic bucketing and randomized bucketing to build buckets of different precision, then distributes the input data into these buckets step by step, thereby completing a dynamic histogram.

Static histograms are difficult to apply directly to data stream processing. The method of S. Guha et al. [88] can dynamically construct near-optimal V-optimal histograms, but it applies only to data streams under the time series model.

A commonly used approach divides the algorithm into two steps: first construct a sketch of the stream data, then construct a suitable histogram from that sketch. This approach exploits the ease with which a sketch can be updated to make the histogram dynamic. N. Thaper et al. [109] first constructed a sketch that approximately reflects the stream data, relied on the sketch's good update performance to absorb new data, and then derived a histogram from the sketch to approximate the stream data. Because deriving the best histogram from the sketch is an NP-hard problem, the authors provide a heuristic (greedy) algorithm to search for a good histogram.

A. Gilbert et al. [110] constructed a summary data structure that uses a set of random sums, similar in structure to those of [106], to store the values of dyadic intervals at different levels of granularity. The dyadic intervals ([111]) of the different granularity levels are then added to the histogram under construction, from largest to smallest, so as to minimize the approximation error (refinement).

A. Gilbert et al. [112] mainly considered how to reduce the processing cost of each input item in the data stream. They first convert the input data into wavelet coefficients (using the fact that a wavelet coefficient is the inner product of the signal with a basis vector), and then apply a dyadic interval method similar to that of [110]. Sketches are closely related to histograms; from a certain perspective, a histogram can be regarded as a special case of a sketch.

Wavelet Transformation

The wavelet transform is often used to generate summary information about data. This is because usually only a small fraction of the wavelet coefficients are significant, while most are very small or unimportant. By ignoring the insignificant coefficients produced by the transform, the original data can be approximated using very little space.
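The effect can be seen with the Haar wavelet, the simplest case. The sketch below assumes an input whose length is a power of two and uses the unnormalized averaging form of the transform; ranking coefficients by raw magnitude is a simplification, and all function names are illustrative.

```python
def haar_decompose(data):
    """Return [overall average, coarsest detail, ..., finest details]."""
    details = []
    current = list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        level = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = level + details   # coarser levels go in front
        current = averages
    return current + details

def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    current = coeffs[:1]
    position = 1
    while position < len(coeffs):
        level = coeffs[position:position + len(current)]
        position += len(current)
        expanded = []
        for average, detail in zip(current, level):
            expanded.extend([average + detail, average - detail])
        current = expanded
    return current

def keep_largest(coeffs, k):
    """Zero every coefficient except the k largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

signal = [4, 4, 4, 4, 8, 8, 8, 8]
coeffs = haar_decompose(signal)
approx = haar_reconstruct(keep_largest(coeffs, 2))
```

This signal has only two nonzero coefficients, so keeping two of the eight reproduces it exactly; for real data the discarded coefficients are small rather than zero and the reconstruction is approximate.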

Y. Matias et al. first constructed a histogram of the stream data and modelled it with wavelets. Some of the most significant wavelet coefficients are then retained to approximate the histogram. When new data arrives, the histogram is updated by updating these wavelet coefficients.

The method proposed there is in fact a histogram method that uses the wavelet transform. A. Gilbert et al. pointed out that the wavelet transform can be viewed as the inner product of a signal with a set of orthogonal vectors of length N. They therefore construct a set of sketches of the stream data. Because sketches make it easy to compute, fairly accurately, the inner product of the signal with a set of vectors, the wavelet coefficients can be computed from the sketches and then used to estimate point queries and range queries.

New trends

Researchers have continued to deepen their work on data stream processing. We believe the following new trends have emerged:

Future sketches

Introducing more statistical techniques

Computational techniques for constructing sketches

G. Cormode et al. deal mainly with the calculation of frequent items. Their approach builds on earlier majority item algorithms ([116, 117]) and uses error-correcting codes. For example, a counter is maintained for each bit of the data, and the set of frequent items is then inferred from the results of these counters.
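The per-bit counter idea can be illustrated in its simplest setting: recovering a single item that makes up more than half of the stream. This is a simplified illustration under that assumption, not the full algorithm of the cited work. Items are assumed to be non-negative integers that fit in the given number of bits.

```python
def majority_candidate(stream, bits=32):
    """Recover the majority item from one counter per bit position.

    If some item accounts for more than half of the stream, then each
    of its set bits is set in more than half of the arrivals and each
    of its clear bits in fewer than half.
    """
    n = 0
    bit_counts = [0] * bits
    for value in stream:
        n += 1
        for j in range(bits):
            if (value >> j) & 1:
                bit_counts[j] += 1
    candidate = 0
    for j in range(bits):
        if 2 * bit_counts[j] > n:
            candidate |= 1 << j
    return candidate

candidate = majority_candidate([5, 5, 5, 3, 5, 9, 5])
```

Here 5 occurs five times out of seven and is recovered from the counters alone. When no item holds a majority the result is not meaningful, so a practical scheme needs a separate verification step.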

The work of Y. Tao et al. [118] is essentially an application to data stream processing of probabilistic counting, a distinct-counting technique that has been widely used in the database field.

Extending sketches

Extend sketches to handle more complex queries.

X. Lin et al. [93] constructed a complex sketch system that can estimate quantiles under the sliding window model and the n-of-N model, which is difficult to achieve with simple sketches.

Under the sliding window model, [93] divides the data into buckets in chronological order and builds a sketch in each bucket (with accuracy higher than required); at query time these sketches are merged, and the last bucket may need to be lifted. Maintenance only involves deleting expired buckets and adding new ones.

Under the n-of-N model, [93] divides the data into buckets of different sizes using the EH partitioning technique and builds a sketch in each bucket (with accuracy higher than required); at query time some of the sketches are merged to guarantee the required accuracy, and the last one may need to be lifted.

Combining with spatio-temporal data

Further combination with spatio-temporal data processing:

J. Sun et al. [120] mainly address historical queries and prediction over spatio-temporal data. The paper stresses, however, that spatio-temporal data arrives in the form of data streams, and its processing focuses more on the update performance of spatio-temporal data.

Y. Tao et al. [118] use data stream methods to process spatio-temporal data. They construct a sketch of the dynamic spatio-temporal data and use it to distinguish whether objects are moving among multiple regions or stationary, and to estimate their number. This kind of problem is difficult to solve with earlier spatio-temporal processing.

Novel genre

In online fiction, the "data stream" is an emerging genre in which the protagonist's strength is quantified and displayed as statistics, much like the attribute panel of an online game.
