An activity that seeks patterns in a continuous stream of data elements, usually involving summarizing the stream in some way.
Questions tagged [data-stream-mining]
23 questions
12
votes
2 answers
Open-source tools to help mine a stream of leaderboard scores
Consider a stream containing tuples (user, new_score) representing users' scores in an online game. The stream could have 100-1,000 new elements per second. The game has 200K to 300K unique players.
I would like to have some standing queries like:…
Tahir Akhtar
- 315
- 2
- 9
6
votes
1 answer
Which Big Data technology stack is most suitable for processing tweets, extracting/expanding URLs and pushing (only) new links into 3rd party system?
(Note: I pulled this question from the list of questions in Area 51, but I believe it is self-explanatory. That said, I believe I get the general intent of the question, and as a result I am likely able to field any questions on the question that…
blunders
- 1,922
- 2
- 15
- 19
4
votes
2 answers
Choosing between Storm+Trident-ML, Storm+SAMOA or Spark Streaming+MLlib
I want to implement streaming Naive Bayes in a distributed system. What is the best approach to choosing a framework? Should I choose:
Storm alone, implementing streaming Naive Bayes on my own in a Storm topology.
Storm + Trident-ML
Storm + SAMOA
Spark…
Raman
- 141
- 3
4
votes
1 answer
Real time noise removal using Savitzky-Golay Method
I would like to ask if Savitzky-Golay can be implemented on real-time data.
I have used it on a fixed array size, but would like to extend it to output values for real-time sensor data. Can anyone refer me to an appropriate implementation or hint…
Abdullah Nazir
- 161
- 1
- 2
3
votes
0 answers
High dimensional data stream summarization and processing
Can anyone recommend a method for summarizing and processing high dimensional data streams efficiently and effectively for anomaly detection?
In fact, I investigated the different methods for data stream summarization (sampling, histograms,…
I Sui
- 57
- 3
3
votes
1 answer
Designing a ConvNet to facilitate game playing
For fun, I want to design a convolutional neural net to recognize enemy NPCs in a first-person shooter. I have captured 100 JPEGs of the NPCs as well as 100 JPEGs of non-NPCs. I have successfully trained a really simple ConvNet to identify NPCs. This…
aquagremlin
- 133
- 4
2
votes
1 answer
What are the approaches to aggregate categorical variables?
I am working on a clickstream dataset. I have come up with the following example dataset to explain my problem:
ClickTimeStamp | SessionID | ART_weekOfYear | PagenameClicked | TimeSpentPerSession | CustID | ContractID | ... | TARGET…
Amir
- 123
- 1
- 6
2
votes
0 answers
Is there a counting sketch optimized for intersections?
Popular counting sketches (LogLog, HyperLogLog, etc.) feature natural union operations. Are there any known counting sketches that feature natural intersection operations?
Newbie
- 121
- 3
2
votes
1 answer
What is the difference between real concept drift, virtual concept drift and feature drift?
As far as I know, real concept drift is caused by changes in the decision boundary, while virtual drift occurs because of changes in the data distribution. Some researchers mention that virtual drift can be denoted as feature change.
Is my…
Imen F
- 21
- 3
2
votes
1 answer
Newbie questions: real-time clustering of messages
I'm very much a newbie in NLP, so please accept my apologies if this is an obvious question, the wrong place to ask it or any other error I could be making.
I am considering using NLP for some subset of real-time spam detection in real-time chat.…
Yoric
- 121
- 3
2
votes
0 answers
local regression with streaming data
From a data stream, I'm receiving a pair of measurements every second, consisting of a current consumption and a current percentage. Accumulated over time, the consumption will eventually represent the maximum capacity when the percentage…
R. Doe
- 251
- 2
- 6
2
votes
1 answer
Analysis of Real-Time Bidding
I'm totally new to the topic of real-time bidding, in which I know machine learning algorithms are used pretty often.
Can somebody explain the system to me in plain language, i.e. language for a non-technical person?
What is the bidding? Who bids on…
DanielWelke
- 163
- 1
- 10
1
vote
1 answer
reduction of sample from videos sample
Well, I posted the same question on the main Stack site before finding the right place, sorry.
A friend of mine is working with more than 100 videos as samples for his neural network. Each video lasts more than a couple of minutes, with around 24 frames per…
T.Dunglas
- 23
- 2
1
vote
0 answers
modelling multirotor aerodynamics using datalogs of flights
I am trying to find a vector that would describe the effects of wind on a multirotor. I have a bunch of datalogs from a single multirotor frame and am of a mind to dig in.
The idea is that during flight a multirotor has 2 vectors to fight…
Karl Uibo
- 111
- 2
1
vote
2 answers
Online learning w/ feature weighting/adjusting
Let's say I have a supervised learning problem with a sequence of features and labels. First, I learn on the training data, and then I decide to stream in data point by point and do online learning. Is it possible to update the weights or figure out…
Jeremy
- 13
- 2