Big Data or The Force? A Star Wars night at the Data-Driven meetup

NEW YORK–Would you rely on Big Data or The Force? It was a Star Wars evening for the Data-Driven meetup last December 14 at Bloomberg, especially for Nick Mehta, CEO of Gainsight, who sounded giddy using the epic fantasy flick as a reference for his presentation. He was joined by presenters from Arcadia Data, MapR and Datameer.

Gainsight offers a platform for teams to coordinate more efficiently, track business outcomes and streamline operations.

Because Mehta considers data from four years ago to be entirely useless, he has some suggestions: “Focus on recent data (data quality), analyze by segment (data variance), understand models, track and automate actions and solve for leading metrics (data points).” He added that he focuses on helping customers get the right data.

Everybody was in a light mood at the meetup, which now has 10,000 members and is one of the biggest tech meetup groups. Shant Hovsepian of Arcadia Data even teased host Matt Turck for his uncanny resemblance to Ryan Seacrest, giving the 400 attendees a good look at the Frenchman’s photo beside that of the TV and radio personality.

Arcadia Data builds a unified visual analytics and BI platform for big data as its way of connecting business users to Hadoop. It unifies visual exploration and back-end data analytics in one integrated enterprise platform that runs natively on your Hadoop cluster.

It also converges the visual, analytics and data layers to provide accelerated access to all data stored within Hadoop and to support net-new analytics on granular datasets.

In creating business value from Big Data, Hovsepian laid out 10 commandments. These included “thou shalt not move Big Data; it’s expensive. It’s big….Push all the computation down close to the data.”
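To make “push the computation down” concrete, here is a minimal Python sketch. It assumes Python’s built-in sqlite3 in-memory database as a stand-in for a Hadoop cluster and uses a hypothetical `sales` table; it is not Arcadia Data’s implementation. The first function drags every row to the client before summing, while the second ships the GROUP BY to the engine so only the per-region totals move.

```python
import sqlite3

def make_db():
    # Hypothetical "sales" table standing in for a large on-cluster dataset.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 7.5)],
    )
    return conn

def totals_client_side(conn):
    # Anti-pattern: move every row out of the engine, then aggregate locally.
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def totals_pushed_down(conn):
    # Push the aggregation to where the data lives; only the summary comes back.
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows.fetchall())

conn = make_db()
print(totals_pushed_down(conn))
```

Both functions return the same totals. The difference is how much data crosses the wire, which is the cost the first commandment warns about once a table holds billions of rows.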

He added, “ODBC/JDBC connectors aren’t always enough but be careful having to extract data out to data marts & cubes.”

Having to extract data out of the system is slow and defeats the purpose of having a specialized architecture. On-cluster BI (business intelligence) is now possible.

His second commandment: “Thou shalt not steal or violate big data.” “Security is serious. All the serious Big Data vendors have implemented some form of security; your BI tool should support it,” he said.

A third commandment, “thou shalt not pay for every user or gigabyte,” asks for common sense. “Big data is cost effective if it’s used properly. Be wary of pricing models that penalize you for increased adoption.”

Other commandments covered “analyzing thine data” and “not waiting for results.” He said, “Build an OLAP cube. Create temp tables. Take samples of the data.”
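“Take samples of the data” is the cheapest of those shortcuts. A common way to draw a uniform sample in a single pass over data too large to hold in memory is reservoir sampling (Algorithm R), sketched below in Python. The choice of algorithm is ours, not one named in the talk.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Draw k items uniformly at random from an iterable of unknown length
    in one pass, keeping only k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:                    # happens with probability k / (i + 1)
                sample[j] = item
    return sample

# Estimate the mean of 100,000 values from a 1,000-item sample.
approx = reservoir_sample(range(100_000), k=1000, seed=42)
print(sum(approx) / len(approx))  # should land near the true mean, 49999.5
```

A query engine can answer from the sample almost instantly, trading a small, quantifiable error for not waiting on a full scan.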

Overall, he suggests finding “the right balance if you’re dealing with several pain points.”

M.C. Srivas of MapR had a lot of interesting things to say. “How much data do self-driving cars produce? 1 terabyte per hour,” he said.

Srivas said it took him two years to build MapR. The company delivers business-critical production success with the MapR Converged Data Platform, which is 100 percent binary compatible with the Apache Hadoop Distributed File System to ensure plug-and-play compatibility and no vendor lock-in.

“Hire people not like you. Not from the same DNA,” he said.

A core module of the MapR Platform is MapR-FS, a modern, read-write capable, NFS-mountable distributed file system written in C++ that directly accesses storage hardware. It’s built to handle distributed files, database tables and event streams in one unified layer. This enables companies to support operational and analytic apps in one cluster, which can reduce costs as they grow their big data deployments.

Stefan Groschupf of Datameer is lowering the barrier to big data analytics. The company aims to make it easy for everyone.

However intimidating that may sound, the company sees big data as a way to end world hunger and as a solution to complex business problems.

Reviewing 2015, Moat looks forward to measuring viewability on YouTube in 2016

NEW YORK–Last December 9, Uncubed used the holiday season as an opportunity for startups like Moat to discuss their 2015 accomplishments and future plans at its offices in the Lower East Side. In 2016, Moat, an independent SaaS marketing analytics firm focused on transforming online brand advertising through trusted measurement and analytics, will reportedly become the first third party to measure viewability on YouTube.

Several leading publishers such as Verve, along with agencies, platforms and social networks, have already adopted Moat Analytics to maximize digital advertising results by tracking viewability as well as user and advertising experiences. Twitter is on board as well.

Founded in 2010, the company develops technologies and products for brand advertisers and premium publishers. Now it measures in-app mobile viewability, increasing efficiency and transparency across mobile campaigns.

“Advertising has become so complex. We can determine if ad viewing is human or not,” said Ryan Rende of Moat. “Measuring ad effectiveness in most devices is important.”

For those who don’t know what Moat offers, it claims to know how long people watch an ad online and what you can do with that knowledge. “If you watched 4 seconds of ad then perhaps you can just say 4 seconds is an effective ad,” he said.

It does this by injecting snippets of code inside an ad. The company now processes 30 terabytes of data a day.
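Moat’s in-ad code is proprietary, but the basic idea of turning visibility readings into a viewability decision can be sketched. This minimal Python example assumes a hypothetical snippet that reports (timestamp, fraction of ad pixels in view) samples. It defaults to the widely cited display threshold of 50% of pixels in view for one continuous second. Both the sample format and the thresholds are assumptions, not Moat’s method.

```python
from typing import List, Tuple

# (timestamp in seconds, fraction of the ad's pixels in view, 0.0 to 1.0),
# as a hypothetical in-ad snippet might report at a fixed polling interval.
Sample = Tuple[float, float]

def longest_continuous_view(samples: List[Sample], min_fraction: float = 0.5) -> float:
    """Longest run, in seconds, during which at least min_fraction of the ad was visible."""
    best = 0.0
    start = None
    for t, fraction in samples:
        if fraction >= min_fraction:
            if start is None:
                start = t                 # a new in-view run begins
            best = max(best, t - start)
        else:
            start = None                  # the run is broken
    return best

def is_viewable(samples: List[Sample], min_fraction: float = 0.5, min_seconds: float = 1.0) -> bool:
    # Defaults mirror the commonly cited display guideline; treat them as assumptions.
    return longest_continuous_view(samples, min_fraction) >= min_seconds

samples = [(0.0, 0.6), (0.25, 0.7), (0.5, 0.8), (0.75, 0.9), (1.0, 0.6)]
print(is_viewable(samples))  # True
```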

Last October, the company was accredited by the Media Rating Council (MRC) for mobile viewability, covering its measurement of viewable ad impressions and related viewable metrics in both mobile web and mobile in-app.

Moat also provides 100+ real-time attention signals, including in-view impression measurement, exposure time, interaction and other metrics at the campaign, creative, domain and impression levels. It’s been a good year for Moat.

Can predictive analytics make data scientists more productive?

NEW YORK–How do you make data scientists more productive? Jeremy Achin has an answer for you.

The current path to becoming a data scientist starts with learning statistics, programming and algorithms, then moves on to applying that knowledge and gaining real-world experience. Unfortunately, this can take a lot of time.

The better way, he insists, is to automate with modern tools and computational power. That way you can dive right into practical knowledge and real-world experience, and add statistics, programming and algorithms later if you want.

Achin was talking about his company, DataRobot, which he said delivers predictive analytics fast. By fast, he means DataRobot can cut the time it takes a data scientist to solve a problem from months to hours.

“People take months to manually build regression models. There are technique-agnostic ways to assess and interpret predictive models,” he said.
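Achin did not say which technique-agnostic methods DataRobot uses. One well-known example is permutation importance: shuffle one feature at a time and measure how much a fitted model’s error grows. The sketch below treats the model as any black-box `predict` function, so it works the same way for a regression, a tree ensemble or anything else. The toy model and data are hypothetical.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Model-agnostic importance: the average rise in error when one feature
    column is shuffled, which breaks that feature's link to the target."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            error = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def model(row):
    # Hypothetical fitted model that only looks at feature 0.
    return 3 * row[0]

X = [[i, (i * 7) % 5] for i in range(20)]
y = [3 * i for i in range(20)]
importances = permutation_importance(model, X, y)
print(importances)  # feature 1 scores 0.0 because the model ignores it
```

Because it only calls `predict`, the same check can be used to compare models built with entirely different techniques.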

Achin spoke alongside fellow presenters Josh Bloom of Wise; Alexis Le-Quoc, co-founder of Datadog; and Haile Owusu, chief data scientist of Mashable, at Data-Driven’s monthly meetup last November 16 at Bloomberg.

Where DataRobot is about speed, Wise is about ease: making machine learning easy, with a focus on letting users build and deploy models that predict customer behavior. Interestingly, its founders are University of California, Berkeley astrophysics professors and researchers who have worked together for over a decade.

Today, it is pushing the limits of cutting-edge machine learning technology for customer success. It offers a host of capabilities, including intelligent routing/triage, response recommendation, auto response, knowledge-base deflection and more.

In its presentation, Datadog showed how its cloud service helps customers monitor infrastructure and software.

Datadog gathers performance metrics from your application components, pulls in and visualizes the data in real time, and sends alerts, because your understanding is only as good as your monitoring.
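As a rough illustration of metric-based alerting (not Datadog’s API or alerting engine), the Python sketch below keeps a rolling window of a metric and flags an alert when the window’s average crosses a threshold. The class name, window size and CPU readings are hypothetical.

```python
from collections import deque

class ThresholdMonitor:
    """Hypothetical monitor: alert when the rolling average of the last
    `window` readings is above `threshold`."""

    def __init__(self, threshold: float, window: int = 5):
        self.threshold = threshold
        self.readings = deque(maxlen=window)   # old readings fall off automatically

    def observe(self, value: float) -> bool:
        self.readings.append(value)
        average = sum(self.readings) / len(self.readings)
        # Wait for a full window so a single early spike cannot trigger an alert.
        full = len(self.readings) == self.readings.maxlen
        return full and average > self.threshold

monitor = ThresholdMonitor(threshold=80.0, window=3)
cpu_percent = [50, 90, 95, 85, 60, 40]
alerts = [monitor.observe(v) for v in cpu_percent]
print(alerts)  # [False, False, False, True, False, False]
```

Averaging over a window is one simple way to keep brief spikes from paging someone. Real monitoring products offer many more conditions.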

Founded in 2010, the company raised $31 million early this year, primarily from Index Ventures.

Smith of Qualtrics, Medlock of Swiftkey draw huge crowd at Data-Driven meetup


By Dennis Clemente

NEW YORK–Qualtrics CEO and founder Ryan Smith spoke candidly at the Data-Driven meetup at Bloomberg’s offices last March 17 about his beginnings in Ohio. He described how he set up the company with his academic father, ran it from his basement for five years, and at first knocked on university doors in New York to offer the service to academics. After 12 years, the company is now valued at $2.1 billion.

“We built a product together that was simple enough for me, and sophisticated enough for him (his father),” he said of his product that makes sophisticated research simple.

In a sit-down talk with host Matt Turck, Smith described how companies sometimes miss the forest for the trees. He recalled being asked countless business questions when, almost always, it would have been better to ask “your employees and customers.”

The Data-Driven meetup mixes presentations and sit-down talks over the span of an hour and a half.

CTO and co-founder Ben Medlock of Swiftkey spoke next about artificial intelligence in general as it relates to the future of mobile typing, and about how the company is building the world’s smartest keyboard.

Medlock described Swiftkey’s smart prediction technology for easier mobile typing. “Swiftkey is a narrower AI company,” he said. According to Medlock, Swiftkey was designed in 2010 and today has close to 10 billion users and 50 trillion characters written.

“How can we model how we think?” he asked his audience.

Among other things, Swiftkey is building language models based on fast and efficient smoothed n-gram models, optimized trie search, morphemes, and neural nets/representation learning.
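Swiftkey’s production models are far more elaborate, but “smoothed n-gram model” has a textbook meaning that a few lines can show. This sketch is a bigram model with add-k smoothing (Laplace smoothing when k = 1), so word pairs never seen in training still get a small, nonzero probability. The training sentences are hypothetical, and no trie is used.

```python
from collections import Counter, defaultdict

class BigramModel:
    """Bigram next-word model with add-k smoothing (Laplace smoothing when k = 1)."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.counts = defaultdict(Counter)   # counts[prev][word] = times word followed prev
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split()   # <s> marks sentence start
            for prev, word in zip(tokens, tokens[1:]):
                self.counts[prev][word] += 1
                self.vocab.add(word)

    def prob(self, prev, word):
        # Adding k to every count reserves probability mass for unseen pairs.
        following = self.counts.get(prev, Counter())
        total = sum(following.values())
        return (following[word] + self.k) / (total + self.k * len(self.vocab))

    def predict(self, prev, n=3):
        # Highest probability first; ties broken alphabetically for stable output.
        return sorted(self.vocab, key=lambda w: (-self.prob(prev, w), w))[:n]

lm = BigramModel(["see you soon", "see you later", "see them soon"])
print(lm.predict("see"))  # ['you', 'them', 'later']
```

Without smoothing, any pair absent from training data would get probability zero and could never be suggested. Smoothing keeps the model usable on text it has not seen.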

For input modeling, it uses Gaussian distributions, including linear Gaussians, to model interaction with the keyboard surface. As for data collection, it has partnered with a UK-based company.
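The idea behind Gaussian input modeling can be sketched as follows. Treat each key as the center of a two-dimensional Gaussian over tap positions, then pick the key whose Gaussian best explains a tap, optionally weighted by a language-model prior. The key coordinates, the spread `sigma` and the way priors are combined are all assumptions for illustration, not Swiftkey’s parameters.

```python
import math

# Hypothetical key centres (x, y) in millimetres for a three-key fragment of a keyboard.
KEY_CENTERS = {"q": (0.0, 0.0), "w": (10.0, 0.0), "e": (20.0, 0.0)}

def touch_log_likelihood(tap, center, sigma=3.0):
    """Log-density of a tap under an isotropic 2-D Gaussian centred on a key."""
    dx, dy = tap[0] - center[0], tap[1] - center[1]
    return -(dx * dx + dy * dy) / (2 * sigma * sigma) - math.log(2 * math.pi * sigma * sigma)

def decode_tap(tap, priors=None):
    """Return the key that maximises touch likelihood times prior probability.
    With no priors every key is equally likely; keys missing from priors get a tiny prior."""
    if priors is None:
        priors = {key: 1.0 / len(KEY_CENTERS) for key in KEY_CENTERS}

    def score(key):
        return touch_log_likelihood(tap, KEY_CENTERS[key]) + math.log(priors.get(key, 1e-6))

    return max(KEY_CENTERS, key=score)

print(decode_tap((5.5, 0.0)))                        # w: the closest key wins
print(decode_tap((5.5, 0.0), {"q": 0.9, "w": 0.1}))  # q: a strong prior can override
```

Combining the touch model with a language prior is what lets a keyboard correct a sloppy tap toward the word you most likely meant.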

CEO Paul Dix, for his part, presented how InfluxDB works.

InfluxDB is a time series, metrics, and analytics database. It’s written in Go and has no external dependencies. Once you install it, you don’t need to install Redis, ZooKeeper, HBase or anything else.

InfluxDB is targeted at use cases for DevOps, metrics, sensor data, and real-time analytics.

“It arose from our need for a database like this on more than a few previous products we’ve built,” Dix said.

Dix announced plans to launch the testing build of version 0.9.0 in a few months. New features will include support for tags, along with API changes. InfluxDB currently supports the following:

• SQL-like query language
• Storage of billions of data points
• Database-managed retention policies for data
• Built-in management interface
• Aggregation on the fly
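“Aggregation on the fly” and database-managed retention policies are easier to picture with a toy version. The pure-Python sketch below buckets (timestamp, value) points into fixed intervals and averages them, similar in spirit to InfluxQL’s GROUP BY time() clause. It also drops points that fall outside a retention window. It is a conceptual illustration only, not how InfluxDB stores or queries data.

```python
from collections import defaultdict

def downsample_mean(points, bucket_seconds):
    """Average (timestamp, value) points per fixed time bucket, computed at query time."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)   # key is the bucket start time
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

def apply_retention(points, now, retention_seconds):
    """Keep only points young enough to fall inside the retention window."""
    return [(ts, v) for ts, v in points if now - ts <= retention_seconds]

# Hypothetical sensor readings as (seconds, value).
points = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0), (130, 7.0)]
print(downsample_mean(points, 60))                             # {0: 2.0, 60: 15.0, 120: 7.0}
print(apply_retention(points, now=130, retention_seconds=60))  # [(90, 20.0), (130, 7.0)]
```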

“It [InfluxDB] is a discovery engine for what you are collecting,” Dix said.

CEO Ion Stoica sat down to discuss the history of Databricks, which was founded by the creators of Apache Spark.

Numberfire predicts sports winners through its analytics

By Dennis Clemente

Who wants to be a millionaire? Nik Bonaddio did: he won $100,000 on the TV game show and launched Numberfire. That’s one way to get funded without going the VC route.

It’s a great story that Numberfire COO Adam Kaplan liked telling his audience last September 29 at the New York Sports Tech Meetup sponsored by GameChanger in downtown Manhattan. He also took the opportunity to announce the release of its app.

Numberfire has since worked with the likes of ESPN and FIFA, taking in unstructured data and using mathematical modeling to mine it for insights that predict player and team performance.

It’s a long way from ex-jocks giving their own forecasts.

“It’s not based on emotion. It’s quantitative and based on rigorous mathematical modeling. Calculated and delivered on demand,” Kaplan said.

How does it all work? He said Numberfire combines live data ingestion with regression modeling.
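Kaplan did not detail Numberfire’s models. For readers new to the term, the simplest form of regression modeling is an ordinary least squares fit of a straight line, shown below in Python. The minutes-played and points figures are made up for illustration.

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the fitted line passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: minutes played vs. points scored for one player.
minutes = [20, 25, 30, 35]
points_scored = [10, 12, 14, 16]
intercept, slope = fit_simple_regression(minutes, points_scored)
print(intercept + slope * 40)  # projected points for a 40-minute game, about 18
```

Production sports models use many more variables, but the principle is the same: fit to past data, then project forward.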

An analogy Bonaddio has used in past interviews is the common cold: you know when you’re going to get a cold, and sports can work the same way.

The data Numberfire uses to make projections is reportedly a matter of public record.

Today, Numberfire offers analytics for the NFL, the NBA and other sports events such as the FIFA World Cup, where it applied its analytic capabilities in real time.

Numberfire’s monetization model is based on subscription content services and native display ads across various devices.

“We turn analytics into multiplatform products that deliver engagement, revenue and positive user experiences,” he said.

The meetup was also co-organized by Stainless Code, which uses advanced semantic technology to allow easy integration of its metadata logging tools into real-time video workflows. Current clients include Major League Baseball and Turner Sports.

Sponsors of the meetup were GameChanger and SportsData. GameChanger provides scorekeeping, stats, live GameStream and recap stories for thousands of amateur teams. SportsData, a subsidiary of Sportradar, provides real-time scores, stats, play-by-play and other sports info for 40+ sports, 800+ leagues and 200,000+ events.

Deepening consumer engagement with Tapad’s mobile solution


By Dennis Clemente

How can you make devices talk to each other so that they, and even your behavior across them, stay in sync?

Understanding consumer behavior across related screens, and reaching the right people on the right device at the right time, is not easy.

But Tapad and its proprietary technology may be the key. Dag Liodden, co-founder and CTO, and George Gemelos, SVP of Data Science, claim publishers and advertisers can now deepen consumer engagement with a more fluid experience while making campaigns more cost-effective.

Liodden and Gemelos were at the Data-Driven NYC meetup at the Bloomberg office last September 15 along with Tamr (data connection platform), Sumo Logic (turns machine data into smart decisions) and Panorama Education (data analytics to evaluate the effectiveness of education systems).

Tapad’s relevance these days is hard to downplay. Who doesn’t want their many devices in sync somehow? Its proprietary technology reportedly assimilates billions of data points to find the human relationships between smartphones, desktops, laptops, tablets, connected TVs and game consoles.

Organizations with a large and growing number of data sources stand to benefit from these companies, at least by the companies’ own account.

Ilyas of Tamr said most organizations use less than 10% of the relevant data available. The cost and complexity of connecting and preparing the massive variety of internal and external data required to power analytics and applications are unacceptably high. Tamr reportedly combines machine learning and advanced algorithms with human insight to identify data sources, understand relationships and curate siloed data at scale.

How did Tamr develop its technology? The concept and technology behind the Tamr platform reportedly began as a research project. After two years of product development, commercialization and deployment with customers, Tamr publicly announced the platform in the spring of 2014.

Tamr helps find, connect and enrich all of an organization’s data sources. The platform’s algorithms can reportedly analyze and determine which attributes to match, often handling over 90% of them automatically.
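Tamr’s matching combines machine learning with human experts, and its algorithms are not public. As a much simpler stand-in, this Python sketch proposes column matches by string similarity using the standard library’s difflib.SequenceMatcher. Low-confidence columns are routed to a human reviewer. The column names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(name):
    # Compare "cust_name" and "Customer-Name" on equal footing.
    return name.lower().replace("_", " ").replace("-", " ").strip()

def match_attributes(source_cols, target_cols, threshold=0.6):
    """Propose a target column for each source column by string similarity.
    Columns whose best score falls below the threshold go to a human reviewer."""
    matches, needs_review = {}, []
    for src in source_cols:
        best_score, best_target = max(
            (SequenceMatcher(None, normalize(src), normalize(tgt)).ratio(), tgt)
            for tgt in target_cols
        )
        if best_score >= threshold:
            matches[src] = best_target
        else:
            needs_review.append(src)
    return matches, needs_review

matches, needs_review = match_attributes(
    ["cust_name", "zip", "phone_no"],
    ["customer_name", "postal_code", "phone_number"],
)
print(matches)       # {'cust_name': 'customer_name', 'phone_no': 'phone_number'}
print(needs_review)  # ['zip']
```

“zip” and “postal_code” mean the same thing but look nothing alike, which is exactly the kind of case where human expertise (or a learned model) has to step in.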

How does Tamr handle data security? Ilyas said Tamr requires as complete a view as possible of the data. Tamr will propagate security-relevant metadata to its output, enabling an organization’s existing security mechanisms to protect the data as needed.

Sumo Logic was designed from the ground up as a cloud-based service, and CEO Vance Loiselle said it reduces the TCO (total cost of ownership) of log management and analytics through simplified provisioning. It frees enterprises from having to manage on-premises systems, costly network-attached storage and/or storage area networks, and puts an end to add-on hardware costs and software upgrades.

Sumo Logic handles all log data collection, processing, storage, forensics and analysis from a centralized and highly secure cloud-based platform.

The Sumo Logic service leverages the scalability, reliability, redundancy and durability of Amazon S3. This enables the Sumo Logic service to provide customers with quality-of-service for log data retention at an extremely competitive price point compared to customers building their own highly available, disaster-recoverable storage arrays.

Tobi Knaup, co-founder and CTO of Mesosphere, has Mesos to offer, which reportedly makes running complex distributed applications easier. Most applications now run on distributed systems, but connecting all of the distributed parts is often still a manual process. Mesos’ job is to abstract away these complexities and ensure that an application can treat the data center and all its nodes as a single computer. Instead of setting up separate server clusters for different parts of your application, Mesos creates a shared pool of servers where resources can be allocated dynamically as needed.
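How does Mesos decide who gets resources from the shared pool? To our understanding, its default allocator is based on Dominant Resource Fairness (DRF), which repeatedly offers resources to the framework whose largest share of any single resource is smallest. The Python sketch below implements that progressive-filling idea under simplifying assumptions: integer resource quantities, fixed per-task demands, and frameworks that request nothing are skipped. It is not Mesos code. The example cluster and demands mirror the two-framework example often used to explain DRF.

```python
from fractions import Fraction

def drf_allocate(capacity, demands):
    """Dominant Resource Fairness by progressive filling.

    capacity: total integer quantity per resource, e.g. {"cpu": 9, "mem": 18}
    demands:  per-task integer demand per framework, e.g. {"A": {"cpu": 1, "mem": 4}}
    Returns how many tasks each framework gets to launch.
    """
    used = {r: 0 for r in capacity}
    held = {f: {r: 0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}
    # Frameworks that request nothing would loop forever, so leave them out.
    active = [f for f in demands if any(demands[f].get(r, 0) > 0 for r in capacity)]

    def dominant_share(f):
        # A framework's largest fractional share of any single resource.
        return max(Fraction(held[f][r], capacity[r]) for r in capacity)

    def fits(f):
        return all(used[r] + demands[f].get(r, 0) <= capacity[r] for r in capacity)

    while True:
        candidates = [f for f in active if fits(f)]
        if not candidates:
            return tasks
        # Give the next task to the framework with the smallest dominant share.
        f = min(candidates, key=lambda name: (dominant_share(name), name))
        for r in capacity:
            used[r] += demands[f].get(r, 0)
            held[f][r] += demands[f].get(r, 0)
        tasks[f] += 1

allocation = drf_allocate(
    {"cpu": 9, "mem": 18},
    {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}},
)
print(allocation)  # {'A': 3, 'B': 2}
```

Here framework A is memory-heavy and B is CPU-heavy. DRF ends with each holding a two-thirds dominant share (A: 12 of 18 GB of memory, B: 6 of 9 CPUs), so neither can gain without the other losing.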

Current Mesos users include the likes of Airbnb, Vimeo, HubSpot and Twitter.

Aaron Feuer of Panorama Education is doing social good. The company helps America’s leading schools collect, analyze and act on feedback using student, teacher and parent surveys. It is reportedly working with districts and states, collecting feedback and analyzing data from millions of students.