It’s no secret that much of the demand for big data has come from the explosion in Internet technologies. Until 10-20 years ago, a public-facing application with more than a few million users was unheard of. Today, even a modest website can have millions of users and, if it’s active, can generate millions of data items every day. The irony is that the very infrastructure and systems that create big data can also work in reverse and provide some of the better ways to integrate and work with that data. Usefully, InfoSphere® BigInsights™ comes with support for managing and executing data jobs through a simple REST API, and through the Jaql interface we can run queries and get information directly from a Hadoop cluster. This article looks at how these systems work together to give you a rich basis for capturing data and an interface for getting the information back out again.
Read: Building flexible apps from big data sources
I’ve got a new article, the first in a new three-part series on moving data between SQL and Hadoop, covering both exporting to Hadoop and importing processed content back into an SQL store.
In this first part, we look at the basic mechanics and the considerations to address before you start migrating data, such as the data format, content, and export techniques.
Read: SQL to Hadoop and back again, Part 1: Basic data interchange techniques
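To give a flavour of the kind of format consideration involved, here is a minimal sketch of exporting rows from an SQL table as tab-delimited text, one record per line. It is not taken from the article: the `notes` table, its columns, and the `clean` and `export_rows` helpers are all hypothetical, and the choice to replace embedded tabs and newlines with spaces is just one assumed way of keeping each record on a single line.

```python
import io
import sqlite3

# Hypothetical source table, held in an in-memory SQLite database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO notes (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "first", "plain text"),
        (2, "second", "line one\nline two\twith tab"),
        (3, "third", None),
    ],
)


def clean(value):
    """Turn a column value into text that cannot break a tab-delimited record."""
    if value is None:
        return ""  # assumption: NULL is exported as an empty field
    text = str(value)
    # Assumption: embedded delimiters and line breaks are replaced with spaces.
    return text.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def export_rows(connection, out):
    """Write every row of the hypothetical notes table as one line; return the row count."""
    count = 0
    for row in connection.execute("SELECT id, title, body FROM notes ORDER BY id"):
        out.write("\t".join(clean(value) for value in row) + "\n")
        count += 1
    return count


buffer = io.StringIO()
exported = export_rows(conn, buffer)
lines = buffer.getvalue().splitlines()
```

Writing to a file object rather than building one large string means the same function could stream to a real file before it is copied into the cluster.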
As databases evolve, knowing how to get the best out of each of the different solutions is the key to extracting the data you need from your chosen data store. Document databases, like MongoDB, CouchDB, Couchbase Server, and many others, provide a completely different model and set of problems for interfacing and extracting data.
You need to understand your structure, how you can query the information, and how to perform different data mining techniques on what is a very different structure of information.
In this article, I take you through the basics of data mining when using a document database.
Read: Data mining in a document world
I have a new article on the basics of data mining techniques, so that you can better understand some of the key principles behind the different methods of data mining.
From the abstract:
Many different data mining, query model, processing model, and data collection techniques are available. Which one do you use to mine your data, and which one can you use in combination with your existing software and infrastructure? Examine different data mining and analytics techniques and solutions, and learn how to build them using existing software and installations. Explore the different data mining tools that are available, and learn how to determine whether the size and complexity of your information might result in processing and storage complexities, and what to do.
Read: Data Mining Techniques
My latest article on performing predictive modeling using document databases is now available on IBM developerWorks. The abstract:
Predictive analytics relies on processing, analyzing data from many different sources, collating, and then processing that through several stages into usable data. This involves recording and storing data in different formats, and may require translating information into PMML. Despite the complexities and structure of the information, and the sources often involving data from traditional RDBMS data sources, other solutions offer some advantages. We can use the recent range of document-based NoSQL databases to help collate the information in a structured format, while coping with the flexible structure of the individual data points. Many NoSQL environments also provide support for extensive map reduce type queries and processing that makes them ideal for processing large volumes of data into a summary format. In this article, we’ll look at the transfer, exchange, and formatting of information in NoSQL environments.
Read: Document databases in predictive modeling
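As a rough illustration of the map reduce style of summary the abstract mentions, here is a small sketch in plain Python. It does not use any particular NoSQL database's API; the sample documents, field names, and the `map_doc`, `reduce_values`, and `map_reduce` functions are all hypothetical. It is only meant to show how a map step can cope with documents that lack some fields while a reduce step collapses the values into a summary.

```python
from collections import defaultdict

# Hypothetical documents with a flexible structure: not every record has every field.
documents = [
    {"type": "sale", "region": "north", "amount": 120.0},
    {"type": "sale", "region": "south", "amount": 80.0, "coupon": "SPRING"},
    {"type": "sale", "region": "north", "amount": 30.5},
    {"type": "visit", "region": "north"},  # not a sale, no amount
    {"type": "sale", "amount": 10.0},      # no region recorded
]


def map_doc(doc):
    """Map step: emit (key, value) pairs, skipping documents that do not apply."""
    if doc.get("type") == "sale" and "amount" in doc:
        # Assumption: sales without a region are grouped under "unknown".
        yield doc.get("region", "unknown"), doc["amount"]


def reduce_values(values):
    """Reduce step: collapse all values emitted for one key into a summary."""
    return {"count": len(values), "total": sum(values)}


def map_reduce(docs):
    """Group the mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


summary = map_reduce(documents)
```

In a real document database the map and reduce functions would normally run inside the server across many nodes; the in-memory grouping here stands in for that step.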