Best-Ever Algorithm Found for Huge Streams of Data
It’s hard to measure water from a fire hose while it’s hitting you in the face. In a sense, that’s the challenge of analyzing streaming data, which comes at us in a torrent and never lets up. If you’re on Twitter watching tweets go by, you might like to declare a brief pause, so you can figure out what’s trending. That’s not feasible, though, so instead you need to find a way to tally hashtags on the fly.
Computer programs that perform these kinds of on-the-go calculations are called streaming algorithms. Because data comes at them continuously, and in such volume, they try to record the essence of what they’ve seen while strategically forgetting the rest. For more than 30 years computer scientists have worked to build a better streaming algorithm. Last fall a team of researchers invented one that is just about perfect.