Data Migration: Database Terms and Structures

In the previous post we looked at a number of different database types and solutions, and it should be clear that there is a huge range of different terms for the entities that make up a database structure. All of these entities fit into one of four categories, and the categories matter because when moving and migrating data you need to know the source and destination types, and whether you should be creating a database for every document (bad) or a document for every record (good). The components can be described as shown in Figure 1-6.


Figure 1-6: Database Terms and Structures

Most databases support the notion of four different components:

  • Field – generally the smallest piece of addressable data within any database. However, not all databases identify information down to the field level, and some don’t recognise fields at all.
  • Record – a group of fields; that is, a single block of identifiable information. For example, your contact information is a record made of the fields that define your name, your address, and your email address. Some databases only support the notion of a block of information and don’t care what it contains, whether that is fields or a binary string of data. A record may also consist of either a fixed set of fields or a variable group.
  • Table – a group of records. Some databases assign a specific group of fields to a specific table. Others just use a table to hold or identify a collection of records with largely similar information. Some database types, such as NoSQL, do not support a table, but immediately jump from record to database.
  • Database – a group of tables. Not all databases support this additional level of organisation, and in fact it tends to be those that have a significant structure at the lower levels (field, record) that do. The database level usually provides multi-tenancy; that is, the ability to store a collection of data related to a single application separately from others.
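
The four levels above can be sketched as plain nested objects; this is only an illustration (the names are invented), not a real engine, but it shows how addressing a single field means walking all four levels:

```javascript
// A database holds tables, a table holds records, a record holds fields.
const database = {
  name: 'crm',
  tables: {
    contacts: {
      records: [
        // Each record is a group of fields.
        { name: 'Jane Doe', address: '1 High St', email: 'jane@example.com' },
        { name: 'John Doe', address: '2 Low Rd', email: 'john@example.com' }
      ]
    }
  }
};

// Addressing a single field means naming all four levels.
const email = database.tables.contacts.records[0].email;
console.log(email); // 'jane@example.com'
```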

Of course, the problem is that different databases apply and support these terms differently, many use different terms, and some may blur the lines between each term to such an extent that it is impossible to tell where the different elements exist.

Let’s explain this a little further by providing some explicit examples:

  • MySQL, Oracle database, IBM DB2, Microsoft SQL Server, Microsoft Access, and other relational databases tend to support all four levels with a very rigid structure in place, as you would expect from a structured RDBMS.
  • Memcached knows only records (values) identified by a supplied key, and those records have no fields.
  • CouchDB, MongoDB and Couchbase support different databases, and within those databases you have documents, which are logically similar to records. These documents have fields, but there is no requirement for the fields within each document to be the same from document to document. MongoDB also supports collections, which are akin to tables.
  • Hadoop in its bare Hadoop Distributed File System (HDFS) native structure doesn’t understand anything, although you can place files into different directories to mimic a structure. If you use a system on top of HDFS, such as Hive, HBase or Impala, you are normally implying a typical 4-level data architecture.

In general, the ability to identify different components within the database depends on the database type, and a summary of these is provided in the table below.

| Database Type    | Fields | Records        | Tables | Databases |
|------------------|--------|----------------|--------|-----------|
| RDBMS            | Yes    | Yes            | Yes    | Yes       |
| NewSQL           | Yes    | Yes            | Yes    | Yes       |
| NoSQL            | Mostly | Documents/Rows | Maybe  | Yes       |
| Key/Value Stores | No     | Yes, by ID     | No     | Maybe     |
| Unstructured     | No     | No             | No     | Maybe     |

Now let’s have a look at the specific example database solutions, including the term used for the corresponding value:

| Database Type | Database    | Fields                   | Records                | Tables                   | Databases         |
|---------------|-------------|--------------------------|------------------------|--------------------------|-------------------|
| RDBMS         | Oracle      | Yes                      | Yes                    | Yes                      | Yes               |
| RDBMS         | MySQL       | Yes                      | Yes                    | Yes                      | Yes               |
| RDBMS         | PostgreSQL  | Yes                      | Yes                    | Yes                      | Yes               |
| NewSQL        | InfiniDB    | Yes                      | Yes                    | Yes                      | Yes               |
| NewSQL        | TokuDB      | Yes                      | Yes                    | Yes                      | Yes               |
| NoSQL         | CouchDB     | Yes, embedded in JSON    | Documents              | No                       | Yes               |
| NoSQL         | Couchbase   | Yes, embedded in JSON    | Documents              | No                       | Buckets           |
| NoSQL         | MongoDB     | Yes, embedded in BSON    | Documents              | Collections              | Yes               |
| NoSQL         | Cassandra   | Implied in column family | Yes, implied by key ID | Implied in column family | No                |
| NoSQL         | HBase       | Implied in columns       | Yes                    | Implied in column family | No                |
| Key/Value     | Memcached   | No                       | Yes, by key ID         | No                       | Maybe             |
| Key/Value     | Redis       | Yes                      | Yes, key/value pair    | No                       | No                |
| Key/Value     | Riak        | Yes                      | Yes                    | Schema                   | No                |
| Unstructured  | Hadoop/HDFS | No                       | No                     | No                       | By HDFS directory |
| Unstructured  | Hive        | Yes, if implied          | Yes, if implied        | Yes                      | Yes               |


Although it won’t be covered in this series to any significant degree, these different levels also tend to support one further distinction, and that is security. Different database solutions provide security at a variety of levels, and some allow you to restrict access down to the record level. Database systems where different databases are supported, with some level of security or protection between them, are called multi-tenant databases.

As we start moving the data between databases, understanding the importance of these elements is critical. For example, when moving data from an RDBMS to Hadoop, the distinction of table or database may disappear, and the significance of individual records may be deliberately removed entirely to enable the information to be processed effectively.

In contrast, moving data from MongoDB into MySQL is easier because we can identify specific elements such as a database and a table. Where we start to become unstuck is that although documents contain a collection of fields, they may not contain the same fields across each document.

Homogeneous vs. Heterogeneous

The primary issue with exchanging information is whether you are moving data between homogeneous or heterogeneous databases. Homogeneous databases are those of the same type; for example, moving data from Oracle to MySQL. Both are RDBMSs, both have databases, tables, records and fields, and therefore moving data between them is straightforward from a structural perspective. But the supported datatypes are not the same: what do you do about CLOB or RAW datatypes in Oracle when migrating to MySQL?

In a similar vein, the actual procedural process of moving data between database types is similarly affected. MongoDB and Couchbase, for example, support the same structure; JSON and BSON are largely identical, and although there are some differences, reading the data from MongoDB and writing it to Couchbase can be achieved with functions that are almost identical – get the document by its ID on MongoDB and set the document on Couchbase with the same ID.
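
The get-by-ID/set-by-ID copy can be sketched with stand-in stores; the two Maps below are toy substitutes for the MongoDB and Couchbase clients (the keys and document contents are invented), but the shape of the operation is the same:

```javascript
// Toy homogeneous document copy: read each document by its ID from the
// source and write it under the same ID on the target, with no
// restructuring required.
const source = new Map([
  ['user::1', { name: 'Jane', tags: ['admin'] }],
  ['user::2', { name: 'John', tags: [] }]
]);
const target = new Map();

for (const [id, doc] of source) {
  target.set(id, doc); // same ID, same document
}

console.log(target.size); // 2
```

With real clients the loop body becomes a fetch call on one connection and a store call on the other, but the logic does not change.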

Most RDBMSs can be accessed through SQL and front-ends like JDBC or ODBC, so opening two connections and reading/writing is easy to do. Most support SELECT INTO and LOAD DATA INFILE style SQL to export and import data in larger chunks. But in heterogeneous deployments the same tools are not always available. A quick, but not always accurate, summary of these issues across different database combinations is shown in this table.

| Issue          | Homogeneous | Heterogeneous |
|----------------|-------------|---------------|
| Data structure | No          | Yes           |
| Data types     | Yes         | Yes           |
| Data loading   | No          | Yes           |
| Data usability | Yes         | Yes           |

Defining the Problem

Now that we have a good grasp of the different databases, their abilities, and their differences, it is time to take a closer look at what we mean by moving and migrating data, the problems associated with this kind of operation, and how the process of exchanging data between different databases can be tackled and resolved.

All of the following aspects must be considered in their entirety before you start to exchange data, but think about it logically and holistically: you have to decide how the data will be formatted, how the data is going to look (structure), how the data is physically going to be transferred, and finally how it is going to be used.

Altering the Format

Not all data is created the same, or in the same format; furthermore, not all data is supported or acknowledged. Within NoSQL, for example, there may be no datatypes other than string, so you need to consider how you are going to move the data to the right type and the right format without (unnecessarily) losing data. The main considerations are:

Differences in supported types – you may have to choose between migrating to the next nearest or most appropriate type. NoSQL and Big Data targets tend not to have strong datatypes, whereas RDBMSs have very strong typing. You must choose a type that is able to handle the data in the way you want, and that can hold the size of the information being inserted. Large text data, for example, may be too long to fit in a CHAR or VARCHAR column, and may need to be inserted into a BLOB or RAW column.
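
A hypothetical helper for this kind of size-based choice is sketched below; the function name is invented, and the thresholds mirror common MySQL string-type limits (255 characters for a VARCHAR(255), 65,535 bytes for TEXT), so check the limits of your own target before relying on them:

```javascript
// Pick a MySQL-style column type for string data based on its length.
// Thresholds are illustrative; verify against your target's documentation.
function targetTypeForString(value) {
  if (value.length <= 255) return 'VARCHAR(255)';
  if (value.length <= 65535) return 'TEXT';
  return 'LONGTEXT'; // large text that would overflow TEXT
}

console.log(targetTypeForString('short'));           // 'VARCHAR(255)'
console.log(targetTypeForString('x'.repeat(70000))); // 'LONGTEXT'
```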

Differences in type definitions – databases have different definitions of the same types. For example, Amazon Redshift supports only 19 digits of precision for floating-point values, while MySQL supports up to 53. Dates and times are also typically represented differently, with some databases supporting only an explicit date type, a combined datetime, or a time with heavily restricted precision. All these differences mean that you may wish to store values outside the given range as a different type; for example, storing floating-point values and dates as strings so as not to lose data.

Differences in type interpretation – generally a difficult problem to resolve without extensive testing; some datatypes can be interpreted incorrectly when the data is moved into a target database. String encodings (for example, ASCII versus Unicode) or bit-specific fields can cause issues. Timestamps may also be reinterpreted during import as being subject to time differences; for example, if you exported on a server using Pacific Standard Time (PST) but imported on a different database using Central European Summer Time (CEST).
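
The timestamp problem is easy to demonstrate: the same wall-clock time tagged with different UTC offsets refers to two different instants. PST is UTC-8 and CEST is UTC+2, so the two readings below drift apart by ten hours:

```javascript
// The same local time, interpreted under two different offsets.
const exportedPST  = new Date('2016-02-21T12:00:00-08:00'); // UTC-8
const importedCEST = new Date('2016-02-21T12:00:00+02:00'); // UTC+2

// Subtracting Dates yields milliseconds; convert to hours.
const driftHours = (exportedPST - importedCEST) / 3600000;
console.log(driftHours); // 10
```

If the offset is not stored at all, each server silently applies its own local zone, which is how this error creeps into a migration unnoticed.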

These issues must be considered in entirety before you exchange data; getting it wrong could lead to incorrect, invalid, and even completely corrupt information.

Altering the Structure

It should be clear by now that there are differences in the structure of the different database types. What may not be clear is that there are more options available to you than a simple direct association from one type to another. You must make sure that the data is exchanged in a manner appropriate to the information that is being exchanged.

For certain combinations the structure may appear obvious, but there is always the possibility that the structure and information can be more effectively organised. For example, when moving from an RDBMS to a document store, the first inclination is simply to place the different tables and structure them as different documents within the target database. This works, but adds complications you may want to avoid when you come to use the data. Instead, merging the different tables into one larger document with nested components may simplify the use of the data in the target application.

The same can be true in reverse, exploding a single document into multiple, related, tables. Alternatively, you may want to take advantage of specific functionality in the RDBMS, such as XML fields, sets, enums or even convert the information to embedded JSON or serialised language variables if that makes sense to your application.
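
The table-merging direction can be sketched as follows; the table and field names (customers, orders, customer_id) are invented for illustration, but the fold itself, from two flat row sets into one nested document per customer, is the operation described above:

```javascript
// Two flat relational 'tables' as arrays of rows.
const customers = [{ id: 1, name: 'Jane' }, { id: 2, name: 'John' }];
const orders = [
  { id: 10, customer_id: 1, total: 25.0 },
  { id: 11, customer_id: 1, total: 14.5 },
  { id: 12, customer_id: 2, total: 99.0 }
];

// Fold the related orders into each customer document.
const documents = customers.map(c => ({
  _id: `customer::${c.id}`,
  name: c.name,
  orders: orders
    .filter(o => o.customer_id === c.id)
    .map(({ id, total }) => ({ id, total })) // drop the now-redundant key
}));

console.log(documents[0].orders.length); // 2
```

Exploding a document back into tables is the inverse: walk the nested arrays and emit one flat row per element, restoring the foreign key as you go.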

Loading the Information

Physically transferring the information seems like the most mundane of the processes in the entire scheme of exchanging data between systems, but in actual fact it is less clear-cut than you might think. We’ll look at this in more detail when examining specific examples and database exchange projects, but there are some upfront issues to consider:

Does the solution include a native bulk loading system? Some databases specifically support a method of importing data, whether large or small. For example, in MySQL the LOAD DATA INFILE SQL statement can do this for you. Cassandra supports a COPY command in CQL, and various Hadoop interfaces such as HBase and Hive enable you to access CSV files directly without explicitly importing them.
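
A minimal sketch of preparing rows for a bulk loader like MySQL's LOAD DATA INFILE: serialise the records to CSV (quoting fields and doubling embedded quotes), then load the whole file with a single statement rather than row-by-row INSERTs. The file path and table name in the comment are placeholders:

```javascript
// Serialise rows of values to quoted CSV, escaping embedded quotes by
// doubling them, as most bulk loaders expect.
function toCSV(rows) {
  return rows.map(row =>
    row.map(v => `"${String(v).replace(/"/g, '""')}"`).join(',')
  ).join('\n');
}

const csv = toCSV([[1, 'Jane "JD" Doe'], [2, 'John']]);
console.log(csv);
// "1","Jane ""JD"" Doe"
// "2","John"

// Written to disk, the file could then be loaded in one MySQL statement:
// LOAD DATA INFILE '/tmp/contacts.csv' INTO TABLE contacts
//   FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
```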

Custom loading may be required if no built-in solution exists. This can take many forms, including writing your own, or using specialised tools like Tungsten Replicator or Sqoop if they are available. The exact method is going to depend on the data exchange type, data size, and complexity of the load process.

Application loading can be used in those situations where the application is running and a different version or format of the information is used. For example, when caching with a NoSQL engine on top of an RDBMS, you might adapt your application to automatically generate the NoSQL record. Similarly, during a migration, you might configure your application to look in the new database, and if it doesn’t exist, load it from the old database and generate the new record.
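
The look-in-the-new-database-first pattern above can be sketched with two Maps standing in for the new and old stores (the key names are invented); on a miss the record is copied across, so the data migrates lazily as the application touches it:

```javascript
// Stand-ins for the legacy database and the new store.
const oldDb = new Map([['user:1', { name: 'Jane' }]]);
const newDb = new Map();

function fetchRecord(id) {
  let record = newDb.get(id);
  if (record === undefined) {
    record = oldDb.get(id);   // fall back to the legacy database
    if (record !== undefined) {
      newDb.set(id, record);  // generate the new record on first access
    }
  }
  return record;
}

fetchRecord('user:1');
console.log(newDb.has('user:1')); // true: migrated on demand
```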

Data sizes must be a consideration. It seems ridiculous in this age when disk sizes are so large, but database sizes can be huge too. A recent project I was involved in required migrating just under 150TB of information. Storing all of that data in one go would have required a mammoth-sized disk array before the data was loaded into a Hadoop/Hive database. There are solutions for moving and migrating such large volumes of data without it ever touching the disk and using up all that space.

Depending on your data exchange requirements, any, or all of these may be an issue you have to contend with.

Making the Data Usable

Exchanging data between systems is only any good if, once there, the data is usable. Nobody would consider releasing a physical book in the USA and a digital book in France without translating it. The same is true of data. Exchanging the data between databases requires you to take these issues into account during the movement of the data; it’s no good just blindly copying the data over and hoping it will be usable.

To make the data usable the following aspects must be considered:

  • Data accessibility – we’ve already talked about the key structural translation that needs to take place, but you also need to think about the effect on elements such as searching and indexing. Certain indexing methods are more complex (and therefore computationally expensive) than others. Some are more efficient. Some database environments support a limited number, quantity or complexity of indexing and querying that can only be addressed if the format and structure of the data is correct to begin with.
  • Data validity – if you change the structure of the data, does that change the ability to validate or otherwise ensure the quality of the information? For example, moving from an RDBMS to NoSQL you may lose the ability to single out duplicate entries for certain types and fragments of the dataset, because relational constraints are not enforced within non-relational databases. Data format differences may also present problems; in a NoSQL database, for example, the strict datatypes of an RDBMS, such as dates, times or numbers, do not exist. How do you prevent an invalid date, or worse, a non-date value, being inserted into a date column, when previously it would have been caught during the database write?
  • Application usability – if the data is moved, can you still access and update it in the same way? RDBMSs tend to be transactional, providing stability and support; NoSQL databases, as a rule, do not, particularly across multiple silos. If an invoice is updated, how do I guarantee that the customer’s account is also updated, especially if one operation, or the database itself, fails during the process?
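
Since a NoSQL store won't reject an invalid date for you, the application has to perform the check a date column would have done. One illustrative guard (the function name is invented) is to pattern-check the value and round-trip it through Date, rejecting anything that doesn't survive intact:

```javascript
// Validate a YYYY-MM-DD string before it is written to the database.
function isValidISODate(value) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false; // not a date shape
  const d = new Date(value + 'T00:00:00Z');
  // Reject unparseable dates, and dates that don't round-trip
  // (e.g. out-of-range days).
  return !Number.isNaN(d.getTime()) &&
         d.toISOString().slice(0, 10) === value;
}

console.log(isValidISODate('2016-02-21')); // true
console.log(isValidISODate('2016-02-31')); // false (no 31st of February)
console.log(isValidISODate('not a date')); // false
```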

These are some, but not all, of the issues you need to be aware of. Regardless of the actual method, though, you want to actually use the data at the end, so don’t forget how you might query or index the data once it’s moved. Keep in mind that not all data moves require heavy consideration. If you are exporting the data to be loaded for a mail merge, for example, the usability aspects may be minor compared to the format and quality of the information.

How to Buffer posts+hashtags from your Blog using Zapier

I try to automate as much of my life as possible, particularly when it comes to computers.

I’ve been using the automated ‘Social Sharing’ on my WordPress blogs for years. However, I’m also a keen Buffer user, and WordPress does not offer a Buffer connection. Because I also use Buffer to handle my Patreon posts, concentrating them all in one place would make things a lot easier.

What I wanted to do was quite straightforward: I wanted to turn a blog post entry into a post to Twitter (and others) that turned the list of tags I created on the post into #hashtags. This doesn’t seem like a particularly complex or uncommon request, but apparently it’s not a standard offering. What surprised me even more was that nobody else seemed to have done the same, which has me confused…

Now there are many options for doing this kind of automated posting. I could have used IFTTT, but while IFTTT is incredibly useful (I have about 60 recipes on there), it is also incredibly simplistic and your options are limited. That means I can’t post from WordPress to Buffer with the required hashtags.

Zapier is very similar to IFTTT, but it also has the option of running multistep Zaps that do more than one thing (IFTTT is limited to one target). Better than that, you can include a step that runs information through a JavaScript (or Python) script to do some additional processing.

And this is the key that enables me to do precisely what I need, take a blog post from one of my blogs, process the list of tags into a list of (de-duplicated) hashtags, and then post it into my Buffer queues.

So, here’s how to get Zapier to do what you need. There are five steps to this:

  1. Identify when a new post appears on a WordPress blog
  2. Run a short JavaScript program to turn the list of tags (actually Terms) from the blog post into a de-duplicated, hashtagged version
  3. Add it to my Twitter Buffer
  4. Add it to my Facebook Buffer
  5. Add it to my LinkedIn Buffer

Here’s how to get it set up. I’m going to assume you know Zapier and can follow the onscreen instructions; it’s not that complex.

Step 1

  • Register for a Zapier account, if you don’t already have one.
  • Connect your Zapier account to your WordPress blog
  • Connect your Zapier account to your Buffer account

Step 2

Create a new Zap on Zapier.

Select ‘WordPress’ as your trigger app.

Screenshot 2016-02-21 13.45.18.png

Now configure how you want the trigger to occur. I basically trigger on every post in every category, but if you want to add specific categories or other filtering, feel free.

Step 3

For the Action select ‘Code </>’

Screenshot 2016-02-21 13.45.28.png

Now select ‘JavaScript’

Screenshot 2016-02-21 13.45.34.png

When it gets to the Edit Template, you’ll need to specify the input variable to the JavaScript. In this case, create one called ‘tags’, then select the ‘Terms Name’ from WordPress Step 1, and you’ll be ready to go.

Screenshot 2016-02-21 13.45.40.png

The variables that you select here are placed into a hash (associative array) in the JavaScript context called ‘input’, so in this case we’ll have the item ‘input.tags’ to parse in our JavaScript code. The actual list of terms will come through as a comma-separated string.

The code itself is quite straightforward:

var hashlist = {};

input.tags.split(',').forEach(function(item) {
  var res = item.replace(/ /g,'');
  res = res.toLowerCase();
  res = '#' + res;
  hashlist[res] = 1;
});
return({'hashlist' : Object.keys(hashlist).join(' ')});

We iterate over the terms by using ‘split’ to separate on the comma, then we replace any spaces with nothing (so we turn things like ‘data migration’ into ‘datamigration’), convert it to lower case, add the ‘#’ prefix, and add it to a new associative array. The reason for the associative array is to get rid of duplicates, so even if we have ‘data migration’ and ‘datamigration’ in the input, we only get one in the output. This is particularly useful because the ‘Terms’ list from WordPress is actually composed of both the tags and the categories for each post.

Finally, we return all of that as a string with all the keys of the hash (ie. our nicely formatted tags) separated by a space. However, just like the input value, we return this as an Object with the string assigned to the field ‘hashlist’. We’ll need this when creating the Buffer post.
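
If you want to try the logic outside Zapier first, the same steps can be wrapped in a plain function (with the split over the comma-separated terms made explicit) and run against a sample ‘Terms’ string:

```javascript
// De-duplicate a comma-separated tag list into space-separated hashtags.
function hashtags(tags) {
  const hashlist = {};
  tags.split(',').forEach(function (item) {
    hashlist['#' + item.replace(/ /g, '').toLowerCase()] = 1;
  });
  return Object.keys(hashlist).join(' ');
}

console.log(hashtags('Data Migration,datamigration,MySQL'));
// '#datamigration #mysql'
```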

I recommend you test this thoroughly and make sure you check the output.

Step 4

Choose your target Buffer.

The Buffer API only allows you to post to one queue at a time, but brilliantly, Zapier lets us add multiple steps, so we can add one step for each Buffer queue, in my case three. The benefit of this is that I can customise and tune the text and format for each. So, for example, I could omit the tags on Facebook, or, as I do, give a nice intro to the message (‘Please read my new blog post on…’) on FB because I’m not character (or attention span) limited.

Now for each Buffer queue, create your message, and when it comes to choosing the output, make sure you select your JavaScript output (which will be Step 2) and the ‘hashlist’ value.

Step 5

That’s it! Test it, make sure your posts are appearing, and check your Buffer queue (deleting the entries if required so you don’t double-post items).

You can duplicate and use this as many times as you like, in fact I’ve done this across my two blogs and am now looking into where else I can use the same method.

Norway Arctic Cruise – Day 1: Boarding and Departure

When I get back to the Hurtigruten terminal, the same girl is on the reception desk.
She had mentioned when I originally checked in that I should return to the terminal before the boat left; I explained that I pretty much understood the principle that if I was not on the boat when it left, I would miss the cruise… She hands over my envelope, which contains tickets and my pass, and then I head up for the ‘safety briefing’.
There are a few other people waiting with me. The safety briefing has been highlighted multiple times, including the ominous ‘The safety briefing is mandatory and if you do not take it, you will not be allowed to get on the boat!’ so I’m expecting something significant.
In fact, it’s a video, which says, quite simply: if you hear the alarm, get to the lifeboats, put on a life jacket, and get off the boat. The video goes through this entire statement in about two minutes, then spends 90 seconds repeating the same basic information all over again, including another example of the alarm we should listen out for. I don’t know what I was expecting, but from the description and warnings I expected more than the common-sense info they provided.
Once the briefing is over, we are taken to the ship. There is a nice covered walkway, but the MS Lofoten is so small, that we have to go down the steps and take the fire escape and walk out to the boat, and are introduced to a very familiar process.
The envelope primarily contains your Hurtigruten key card; each key card is unique to you, the voyage, and the boat you are taking. The keycard is important for a number of reasons:
  • It’s your room key. Don’t forget it (although they will let you back in if you ask nicely!)
  • You can associate it, on board, with a credit or debit card. This means you can use the card each meal for drinks (if you haven’t prepaid for water or wine), at the cafe, and even the on-board souvenir shop.
  • You will need it every time you leave and enter the boat. You cannot leave without it, and you certainly won’t be let back on board. It’s how they track whether all the passengers on the ship are on board before they leave a port.
It’s a brilliant system, not only from a safety perspective, but also convenience. If I had a complaint, it’s that I often forgot it when rushing out of my room because of the northern lights or other announcements.
Once on the ship, I start to take a look around and get my bearings around the decks before ultimately finding my room. My suitcases have been placed outside of my room, and I get the bags in and then do some very brief unpacking so I can hang up my shirts and put stuff away as best I can. My room is compact, about what you’d expect, with two fold-out beds and a small desk, as well as a wet-room shower/toilet. I actually love tiny spaces like this, so I’m perfectly happy.
Afterwards, I head back out to look around the deck and take some photos, before heading down to the restaurant for the buffet evening meal and start to meet the other passengers, although I eat alone as do many others, all slightly nervous of the other people on the cruise. This also introduces yet another standard operating procedure – before entering the restaurant you must use the hand sanitiser to prevent spreading infection around the boat.
After dinner I go and check why I do not have the coffee mug that will provide me with unlimited coffee for the journey (and also acts as a souvenir), and then we get an info blast from our ‘tour organiser’ Aesgir.
The ‘bar’ area at the aft of the ship
We are introduced to the senior members of the crew and then we are given instructions about life on board, including the importance of your key, and how things will work. A few interesting pieces of information come from that:
  • The phone in your room also provides announcements. Press the F11 button (which was already pressed on mine) and you will hear all general announcements. These start at 7am and finish at 10pm and announce departure and arrivals at different ports, interesting views while out, excursion availability and many other things.
  • The F12 button, if pressed, will work only between 10pm and 7am and will provide notification of northern lights if they are seen so that you can wake up and go see them. Obviously they are easier to view at night, so if you came expecting to see them, the alarm is vital.
  • The hand sanitiser is crucial, not only in the restaurant, but also when leaving and joining the boat, even if you only step off for a minute.
  • The MS Lofoten was built in 1967 and although refitted recently, still keeps a very traditional feel. The Hurtigruten line has been running for many years and still does the same basic route stopping at the different ports.
  • We also learn what will turn out to be a significant piece of information. There are only 43 passengers on board a cruise ship that can take 150, so we are a small, very tight-knit group. Only 43 of us went all the way to Kirkenes, and just 23 made the entire northbound and southbound trip.
  • Finally, perhaps the most crucial piece of information of all, is that each night after dinner there will be an information sheet downstairs in the reception that will tell us all what is happening over the next 24 hours. It will contain a list of the stops, excursions, and activities, as well as any special sights we will see on the way past.
After the info dump, we still have two hours before the ship will leave Bergen. It’s dark outside, but the air is very crisp and I am loving the cold. It’s also surprisingly peaceful. I sit for a while out on one of the decks listening to some audiobooks and just enjoying the tranquility.
At 10:30, the last few people have arrived on the boat and we finally set sail. Although it is pitch black I do try my best to capture some photos of Bergen as we leave. Some of the decks are slippery. In fact, the main rear deck is entirely covered in an inch thick layer of ice. By 11pm I have had a long day, and decide to head to my cabin and bed.
My first problem is working out how to turn my sofa into a bed. I decide to sleep on the lower bunk and finally work out the catch and release mechanism and fall pretty much straight asleep.

Data Migration: Understanding the Challenges

Data migration – that is, the practice of sharing and distributing information between databases – requires some very careful consideration. Are you moving the data permanently or temporarily, or sharing it between applications? Do you want to share all of it, or some of it? Are you changing databases, or trying to move some data so that you can access or use it in a more efficient system?

Let’s start by looking at what we mean by a database, and at the myriad of different databases that are out there.


Walk up to any person at an IT conference or gathering twenty-five years ago and ask them to name a database, and most would probably have selected one of a couple of the tools available at the time. All of the databases would have been the same type: some kind of fixed-record database management system, along the lines of dBase III+ or Oracle.

These had some very specific layouts and formats – the record would have had a fixed size, based on fixed fields, often with fixed widths. The reasons for this were largely technical: the way to store data efficiently was in records of a fixed size. Each record was made up of fields, each with a fixed size. To read a record, you needed the definition and then just extracted the bytes, as shown in Figure 1-1.


Figure 1-1: Fixed Record and Field Sizes

To access a different record, you could ‘seek’ ahead in the file according to the size of the records and the number of the record you wanted. For example, to read record number 15 you would skip forward to position 14 x RECORDSIZE bytes in the file, read RECORDSIZE bytes, and then extract the field data using the known record structure. This meant that the records were treated as one big, long block of bytes, as shown here in Figure 1-2.
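
The seek-and-extract model can be sketched in a few lines; the field layout here (an 8-byte name and a 4-byte code) and the contents are invented for illustration, and a Buffer in memory stands in for the file:

```javascript
// Fixed-size records stored back-to-back in one byte stream.
// Record N starts at (N - 1) * RECORDSIZE.
const RECORDSIZE = 12;
const data = Buffer.from(
  'Jane    0001' +
  'John    0002' +
  'Alice   0003'
);

function readRecord(n) {
  const offset = (n - 1) * RECORDSIZE;            // the 'seek'
  const rec = data.subarray(offset, offset + RECORDSIZE);
  return {                                        // extract fields by position
    name: rec.subarray(0, 8).toString().trim(),   // 8-byte name field
    code: rec.subarray(8, 12).toString()          // 4-byte code field
  };
}

console.log(readRecord(2)); // { name: 'John', code: '0002' }
```

Against a real file the subarray call becomes a seek plus a read of RECORDSIZE bytes, but the offset arithmetic is identical.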


Figure 1-2: Fixed Records as a stream of data

In fact, this was a very simple data model that was (and still is) thoroughly practical – many young developers and programmers may well have created a database using this very model. It even works if you use indexes – you can point directly to a record using the same system.

It may surprise you to know that for some databases this is still the fundamental model at the lower levels, although there may be some additional complexities and features. But over those same 25 years some other things have changed, in two different directions: data formats and data diversity. Those two have led to a level of complexity in the database systems that manage them.

Although it may be useful to understand these low-level details of how the data is physically stored by the database, the focus of this series is one level higher. We want to consider how the data is structured (fields, records, documents), the formatting and character sets of the information, and finally how the entire database appears and is usable within your chosen database system. More importantly, we want to know how to move it all elsewhere. Before we get there, let’s look at the top level: database types.

Database Types

My earliest database – at age eight – was one that I built to catalogue my book collection using my Sinclair ZX81, with the software written entirely in BASIC. By the time I was 13 I had started to build custom applications using dBase III+ to manage my father’s accounts. When I left college, my first job was to move data, first from an old Digital Unix system to a new Sun Solaris 2 system using the same database, and then from that database engine, BRS/Search, to Oracle. BRS/Search was a completely free-form database.

The aim of this process was to move that free-form store into a structured format – Oracle, an RDBMS – and to access it using a front-end built with a Macintosh-specific RDBMS engine called 4th Dimension. In the background, we also started putting different classes of data into the then-brand-new Macintosh-specific database called FileMaker.

Since those early days I’ve worked with (and on) PostgreSQL, MySQL, Oracle, Microsoft SQL Server, Microsoft Access, CouchDB, Berkeley DB, SQLite, Couchbase, MongoDB, Cassandra, DB2, and most recently Hadoop, to name just a few. They all have different characteristics – this is the primary reason they exist at all, in fact – and capturing the essential essence of each group of databases is our first step on the road to understanding how to move data between these databases.

The point here is not to catalogue my experience (although hopefully that helps explain the reasoning and experience behind the content here), but to demonstrate that there is a huge array of choice out there today. They all have different parameters, different methods of storing data, different supported formats, and a huge array of methods for reading, querying and extracting the information.

But what exactly turns a collection of data from just that – a string of bytes – into a database? And how does that affect how we move data between them? Let’s look at some basic database principles. This will not be new information, but these are vital concepts to understand so that we can translate and refer to these elements through the rest of the series.

Database Principles

What is a database?

That is not an innocent question; the answer depends entirely on the database system, type, and individual solution in question.

However, it can be summed up in two sentences:

A database enables individual, addressable blocks of information to be stored efficiently. These blocks can then be retrieved, and potentially searched and indexed, so that the information can be effectively accessed.

Whenever you look at a database and how to store, retrieve and update the information, you need to consider how the information within the database is accessed.

All databases share the same basic principles when it comes to working with the information itself: they must all provide the following functionality, referred to as CRUD – Create, Read, Update, Delete:

  • Create – data must be able to be created within the database, either on a record or block basis, or in a batch mode where data is created in bulk.
  • Read – data must be able to be read back out. By their very nature, all databases must be able to do this on a selective basis, either by record, or by a group of records. More complex databases enable you to achieve this more selectively, for example, by selecting all of the cars that are blue, or all the invoices raised for Acme Inc.
  • Update – data must be able to be updated. Again, as with reading, this must be possible on a record by record basis. Updates may also involve bulk modification of multiple records and even multiple fields simultaneously.
  • Delete – data must be deletable or removable on a record by record basis, involving either single or multiple records simultaneously.
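The four CRUD operations can be made concrete with a minimal sketch using Python’s built-in sqlite3 module. The `contacts` table and its fields are invented for illustration, but the same four operations map onto any database interface:

```python
import sqlite3

# In-memory SQLite database; the table and field names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Create - a single record, then a batch insert
conn.execute("INSERT INTO contacts (name, email) VALUES (?, ?)",
             ("MC Brown", "mc@example.com"))
conn.executemany("INSERT INTO contacts (name, email) VALUES (?, ?)",
                 [("Alice", "alice@example.com"), ("Bob", "bob@example.com")])

# Read - selectively, by criteria
rows = conn.execute("SELECT name FROM contacts WHERE name = ?",
                    ("MC Brown",)).fetchall()

# Update - record by record (bulk updates simply match more rows)
conn.execute("UPDATE contacts SET email = ? WHERE name = ?",
             ("mcb@example.com", "MC Brown"))

# Delete - single or multiple records
conn.execute("DELETE FROM contacts WHERE name = ?", ("Bob",))
```

Every database covered in this series supports some version of these four operations, even if the interface is an API call rather than an SQL statement.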

Understanding the significance of these different operations within different databases is important to getting the movement and migration of information correct. Some databases can, by design, only support certain levels of these operations. Some provide implicit and explicit deletion of records, and others may deliberately not support update operations.

To further complicate matters, performance should always be a consideration for certain types of data migration. Most analytical and data warehouse platforms benefit from large, batched, or combined updates. Hadoop, for example, works badly with a large number of small files, because these cannot easily be distributed across the cluster. Hadoop is also, by design, an append-only system, which means updates are more complex to handle.

Contrast this with Memcached, where bulk writes or updates are supported, but where for reasons of cache efficiency you do not want large batches of data to be updated simultaneously as it would invalidate large portions of the cache.

Data Formats

Different databases store and structure information differently. Some use records, some use fields, some use documents. Some expect data to be highly structured, where a single ‘database’ may consist of tens, hundreds or even thousands of different tables for different pieces and types of information. At the opposite end of the scale, some just have a record with no further classification or identification.

These principles and how to migrate between them will be discussed throughout the series, but some general principles about the different structures and how to move between them will be examined in closer detail in a future post, when we look at Data Mapping and Transformations.


Depending on the database in use, specific datatypes may be used or enforced on the data that is stored. For example, there may be both character (string) and numeric datatypes. Although it is possible to store numeric information in a string column, there are often benefits to a true numeric datatype, including more efficient storage (and therefore faster operation), and the ability to perform specific operations, such as a SUM() or AVERAGE() function, on a numeric column without having to translate each individual string into an integer or floating-point value.
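The difference is easy to demonstrate with sqlite3; the `sales` table here is invented, storing the same values once as an INTEGER and once as TEXT. Aggregates work naturally on the numeric column, while the string column sorts lexically rather than numerically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the same values stored as INTEGER and as TEXT
conn.execute("CREATE TABLE sales (amount_num INTEGER, amount_str TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(10, "10"), (25, "25"), (7, "7")])

# SUM() operates directly on the numeric column
total = conn.execute("SELECT SUM(amount_num) FROM sales").fetchone()[0]

# Sorting shows the difference: numeric order vs lexical (string) order
num_order = [r[0] for r in conn.execute("SELECT amount_num FROM sales ORDER BY amount_num")]
str_order = [r[0] for r in conn.execute("SELECT amount_str FROM sales ORDER BY amount_str")]
print(total, num_order, str_order)  # 42 [7, 10, 25] ['10', '25', '7']
```

When migrating, this is exactly why datatype identification matters: move a numeric column into a string field and sorts, sums, and range queries silently change meaning.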

Datatypes and their identification and translation are a major focus of a future post on  Data Mapping and Transformations.


All databases are predicated on the need to access the information within them very quickly. Consider a simple contact database with just 20 records in it. To look for the record with the name ‘MC Brown’ in it requires us to look at every record until we find the matching one. Of course, there may be more than one such record, so even if we find that the first record matches, we still have to iterate over 20 records to find all the matching entries.

With 20 records this isn’t a problem; with 20,000,000 records it is hugely inefficient. Indexes bridge the gap by allowing the data to be addressed more efficiently. There are different algorithms for creating indexes that are beyond the scope of this text, but in all cases the role of the index is to provide quicker access to information than could be achieved through a sequential scan.
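The principle can be sketched in a few lines of Python; the records are invented, and a plain dict stands in for whatever index structure a real database would use:

```python
# Sequential scan vs index lookup; the records are invented examples.
records = [
    {"id": 1, "name": "MC Brown"},
    {"id": 2, "name": "Alice"},
    {"id": 3, "name": "MC Brown"},
]

# Sequential scan: every record must be examined
scan_hits = [r for r in records if r["name"] == "MC Brown"]

# Build an index once: field value -> list of record positions
index = {}
for pos, r in enumerate(records):
    index.setdefault(r["name"], []).append(pos)

# Indexed lookup: jump straight to the matching records
index_hits = [records[pos] for pos in index.get("MC Brown", [])]
print(index_hits == scan_hits)  # True - same result, without the scan
```

A real index (B-tree, hash, or otherwise) is more sophisticated, but the trade-off is the same: extra structure maintained at write time in exchange for faster reads.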

Database Types

There are a myriad of different ways in which you can identify and classify databases, and the classification depends on which aspect of the database you are looking at. For example, SQL was for a long time associated exclusively with structured RDBMS engines, but has now become a data interface standard of its own and is used in both RDBMS and non-RDBMS environments. For the purposes of our understanding, we’ll examine them according to how they organise and classify their data.

Through the rest of this series, we concentrate on three major types: RDBMS, NoSQL, and Big Data.

Structured and Relational Database Management Systems (RDBMS)

Examples: Oracle, MySQL, PostgreSQL, Microsoft SQL Server, Microsoft Access, Filemaker Pro

Most structured database systems tend to have a relational database core (RDBMS) and are most often, but not always, accessed through the Structured Query Language (SQL). When talking to people about databases, an RDBMS with SQL is what they will think of first, because it matches the idea of a strict database with strict types. The highly structured nature requires a rigid method of storing and retrieving information, and it places corresponding limitations on your database types and structure. A simple layout is shown in Figure 1-3.

Figure 1-3.png

Figure 1-3: A structured RDBMS table diagram

Structured databases have a few specific characteristics:

  • Strict data structure – data is stored within fixed named silos (databases), within named tables, and with each table having a fixed number of named columns. Every single record within each table has the same number of fields (columns), and each column is used for a specific purpose or piece of information.
  • Strict data types – for example, an RDBMS will store integers and floats differently, and may have additional data types designed to provide fast access to specific information, for example, the SET and ENUM types within MySQL.
  • Data Definition Language (DDL) – related to the elements above, the DDL within any database is important because it provides a reference structure which can be used to replicate that structure in other databases. Depending on the database system, the DDL may be implicit in the way the data is accessed or stored, or in the API and interfaces provided, or it may be more explicit, as in the dialects of SQL and similar statement-based interfaces.
  • Data Manipulation Language (DML) – typically, but not always, SQL. The DML enables you to perform the CRUD operations needed to manage the information. Like DDL, the exact interface is very database specific. Some databases and systems rely entirely on a statement-based language like SQL, which has its own dialects and structures for performing the updates. Others rely entirely on the API that interfaces between client applications and the database storage.
  • Relational capability – because the data is in a fixed format with fixed types, it is possible to create specific relations between a field in one table and a field in another. This enables the data to be JOINed together to provide a unified output. For example, if you have orders and invoices, it’s possible to link the order and the invoice by a unique ID, and the database can either use or explicitly enforce the relationship. Joins are further characterised by their type, enabling many-to-one relationships (for example, multiple invoices relating to one client), one-to-many relationships (one invoice number referring to multiple invoice lines) and one-to-one (invoice to payment received).
  • Constraints and Indexes – constraints enable data to be created within a limited subset, or to identify rows uniquely. For example, a primary key constraint can force the table to create new records only with a new unique identifier. Indexes are used to create efficient methods for looking up and identifying data according to criteria. Within an RDBMS indexes are generally used to speed up access on a specific column, or multiple columns, to improve the speed of access during specific queries. Without an index, the RDBMS will default to performing a full table scan.
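These characteristics can be seen together in one short sqlite3 sketch; the clients/invoices schema is invented, but it shows DDL with constraints, an explicit index, DML, and a relational JOIN in one place:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: fixed named tables, typed columns, a primary-key constraint,
# a relation between invoices and clients, and an explicit index.
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoices (id INTEGER PRIMARY KEY,
                       client_id INTEGER REFERENCES clients(id),
                       total REAL);
CREATE INDEX idx_invoices_client ON invoices(client_id);
""")

# DML: statement-based CRUD
conn.execute("INSERT INTO clients VALUES (1, 'Acme Inc')")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                 [(100, 1, 250.0), (101, 1, 99.5)])

# Relational capability: many invoices JOINed to one client
rows = conn.execute("""
    SELECT clients.name, invoices.id, invoices.total
    FROM invoices JOIN clients ON invoices.client_id = clients.id
""").fetchall()
print(rows)  # [('Acme Inc', 100, 250.0), ('Acme Inc', 101, 99.5)]
```

The DDL here is also exactly the “reference structure” mentioned above: those CREATE statements are what you would translate when replicating the structure in a destination database.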

Structured/RDBMS solutions provide some of the easiest methods for exchanging data – it is generally easier to move data from a structured store to elsewhere. However, most destination databases do not support the same range of indexes. Conversely, moving data from unstructured databases of any kind into a Structured/RDBMS is harder, because you have to decide what goes where.

NewSQL Databases

Examples: Clustrix, VoltDB, InfiniDB, TokuDB

Traditional RDBMS and SQL databases are designed to run on a single machine, which brings performance and hardware limitations. There is only so much memory and hard disk space that can be installed in a single machine, and if your database size or performance requirements are high enough, a single server is not the solution. There are strategies, such as sharding the database (splitting it up by an identifiable key, such as ID, name or geographical location), or otherwise dividing the database across machines, but these place a different load on your application layer, and are beyond the scope of this series.

NewSQL databases are a modification of the Structured/RDBMS that use multiple machines in a cluster to support the database requirements. Unlike the sharding and other methods, NewSQL solutions automatically distribute the load across the machines and handle the interface, indexing and querying required to access the data.

The main elements of the database and structure, such as databases, records and fields, and all other data migration considerations are the same as for traditional RDBMS environments.

NoSQL/Document Databases

Examples: Couchbase, CouchDB, MongoDB, Cassandra, HBase

NoSQL databases actually span a wide range of different databases, originally classified by their rejection of SQL as the DDL and DML language of choice, usually relying instead on a direct API for accessing information. There was a resurgence of these different solutions in the early 2000s as people sought alternatives that were faster and simpler than the transactional RDBMS for web applications and websites.

Most NoSQL databases rely on simpler methods for accessing the information, for example by using a single document ID to retrieve a record of information. This document ID could be derived from the user’s email address, so when a user logs in or registers on a website, the document associated with that email address is accessed directly, rather than ‘looking up’ the record in a larger table of user records.

NoSQL databases of this type can be roughly split into two groups: the columnar/tabular databases, and the document databases. The columnar/tabular type includes Cassandra, Apache HBase (part of Hadoop), and Google’s BigTable. Data is organised through an identifiable row ID, and a collection of associated column IDs that classify the data structure. They can look, and even act and operate, in a similar fashion to the structured RDBMS table/row/column structure. A sample column-style database (in this case Cassandra) looks roughly like that in Figure 1-4.

Figure 1-4.png

Figure 1-4: A columnar (Cassandra) database structure

Document databases are completely different. Instead of a table structure, data is organised into a document, usually using JSON or a JSON-like structure. Unlike a table row, a document often combines different fragments of information together – for example, a contact record may store all the phone numbers, email addresses and other components within the single document for a given person. Documents, especially JSON-based documents, are also very flexible and can contain nested fields, such as an array of phone numbers, or even entire nested structures, such as the individual rows (qty, product id, description, price) for an invoice or order, all encapsulated into a single document. A simple document database structure can be seen in Figure 1-5.

Figure 1-5.png

Figure 1-5: Document Databases

Perhaps most importantly, documents in a document database do not need to be identical. In a structured RDBMS environment, every record contains every field, even if the field is not actually used for that record. In a document database, different documents within the same database or group may have only one field, or may have 20. This variable nature makes them appealing for that very reason, but it represents an area of complexity when migrating information.
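Both properties – nesting and variable fields – can be sketched with plain JSON. The documents below are invented, and a dict stands in for the document store:

```python
import json

# Two hypothetical documents in the same 'contacts' collection: unlike
# RDBMS rows, they need not share the same fields, and fields can nest.
doc_a = {
    "name": "MC Brown",
    "phones": ["555-0100", "555-0199"],           # nested array
    "address": {"city": "York", "country": "UK"}  # nested structure
}
doc_b = {"name": "Alice"}                         # far fewer fields - still valid

collection = {"contact::1": json.dumps(doc_a),
              "contact::2": json.dumps(doc_b)}

# A migration has to cope with fields that may or may not be present
for key, raw in sorted(collection.items()):
    doc = json.loads(raw)
    print(key, doc["name"], len(doc.get("phones", [])))
```

That final loop hints at the migration problem: any mapping to a fixed-schema destination must decide what to do with fields that exist in some documents and not others.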

Most NoSQL systems have no notion of an explicit relation or join – this is often one of the aspects that makes the system faster. However, the lack of this element means that different techniques are required to store and interact with complex data.

Depending on the NoSQL solution, you may or may not have access to an index or quicker method of accessing the data. In CouchDB and Couchbase, for example, the fields of a document can be used to generate an index that provides quick searching and retrieval of information.

NoSQL databases can be easy to migrate data to and from, provided the presence (or absence) of a strict schema is accounted for. For example, moving from an RDBMS to a document-based NoSQL database can be a case of converting the table records into documents identified by the primary key. It can also pay off in the long term to perform a more concerted conversion and translation of the source tables into unified documents.
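The simple row-to-document case can be sketched as follows; the table, fields, and the `contacts::1` key convention are all invented for illustration, with a dict standing in for the destination document store:

```python
import json
import sqlite3

# Source: an invented RDBMS table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO contacts VALUES (1, 'MC Brown', 'mc@example.com')")

# Destination: each row becomes a JSON document keyed by its primary key
documents = {}
cursor = conn.execute("SELECT * FROM contacts")
columns = [d[0] for d in cursor.description]   # column names drive the field names
for row in cursor:
    record = dict(zip(columns, row))
    doc_id = f"contacts::{record['id']}"       # primary key becomes the document ID
    documents[doc_id] = json.dumps(record)

print(documents["contacts::1"])
```

The more concerted conversion mentioned above would go further, for example joining related tables first so that each document carries its nested detail rows rather than bare foreign keys.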

Key/value (KV) Stores

Examples: Memcached, Redis, Riak

In most general classifications, key/value stores are treated as NoSQL, but I’ve split them out here because they have some interesting attributes that affect data exchange. A key/value store is exactly what it sounds like. A single blob of data (the value) is stored against a given key identifier. You store the information by giving the key, and retrieve the information by giving the same key. In most cases, the information can only be retrieved if you know the key. Iteration over the stored data, or indexes, are generally not available.

The roots of the key/value store go back to the attempt to speed up access to data where a given identifier is known, such as a user ID or email address. The best known key/value store is probably memcached, which was originally developed to make use of the spare RAM of the machines supporting a website (LiveJournal, a blogging platform) and enable fast access to blog entries. Since the ID of the blog could be derived from the URL being accessed, the entry could easily be looked up in memcached. If it didn’t exist, it was looked up from a MySQL database, and the formatted/retrieved version placed into the cache with the identifying URL.

Most document databases are really a modification of the key/value store. The value portion can be any data you like, from a simple string, through to a serialised object from C, Java or other languages, or a JSON document. In fact, some databases actually support both, and the only distinction between a key/value store and a document database is whether the database engine itself can identify and interact with the embedded structure. MongoDB and Couchbase, for example, have this distinction; MongoDB enables the database engine to update fields within the BSON (JSON-like) values, while Couchbase supports indexing of the JSON fields.
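The distinction can be illustrated with a plain dict standing in for the key/value engine; the key format and profile fields are invented. The point is that the “engine” sees only an opaque value, so the application must decode, modify, and rewrite the whole blob:

```python
import json

# A dict stands in for a key/value store: to the engine, values are opaque.
kv = {}
kv["user::mc@example.com"] = json.dumps(
    {"name": "MC Brown", "phones": ["555-0100"]}
)

# The engine cannot update a field inside the value; the client must
# read the whole blob, decode it, change it, and write it all back.
profile = json.loads(kv["user::mc@example.com"])
profile["phones"].append("555-0199")
kv["user::mc@example.com"] = json.dumps(profile)

print(json.loads(kv["user::mc@example.com"])["phones"])
```

A document database differs precisely in removing that round trip: because the engine understands the embedded structure, it can update or index individual fields server-side.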

Key/Value stores are some of the harder databases to migrate and move data between. The lack of a structure, or the custom nature (for example a serialised language object), and the requirement to identify the record by a specific ID make exchanging data more complex.

Big Data (aka Unstructured, Semi-structured and Implied Structure Databases)

Examples: Hadoop, Apache Solr, ElasticSearch, Lucene

BRS/Search was, for the time and technology, relatively ground-breaking in that it was a full-text retrieval system. Today we would probably classify it as a ‘document’ based database – that is, one with a structured format – although the power behind BRS/Search was its ability to perform a free-text search across an entire collection.

Today, we generally refer to these types of database as unstructured; that is, there is no discernible format or structure to the information. Although there are many different examples of this, probably the best known today is Hadoop. Without getting into the functionality or history of Hadoop, its power comes from its ability to distribute the raw data and to process and extract usable information from unstructured data.

Within Hadoop, the normal workflow is to load Hadoop with raw data, for example, the text from tweets or web pages, and then use that information to build an index or data structure around the information so that it can be analysed or searched. Solutions such as Solr, Lucene and ElasticSearch work in similar ways, accessing the raw text and either indexing it so that the data can be searched, or using whatever structure is available to provide searching and indexing on a more specific area.

This is an example of where ‘semi-structured’ data applies. Twitter data, for example, consists of the Twitter name, the tweet itself, and any tags or Twitter users the tweet was directed to. The fixed fields and the free-form tweet together make it semi-structured, as it consists of both structured and free-form information.

Implied structure databases are those where the structure of the data is implied by the database, even though the underlying data may only be partially structured and described. Apache Hive, part of Hadoop, is an example of this. Hive can natively read text files and interpret them with a specific structure, converting CSV files into columns so that they can be queried by HiveQL, a simplified form of SQL. Hive can also parse more complex data, including CSV that embeds JSON and serialised data structures, all so they can be queried through a familiar interface.
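A rough sketch of the implied-structure idea, using Python’s csv and json modules rather than Hive itself: the raw file is just text, and the structure (columns, plus an embedded JSON field) is interpreted at read time. The data and field names are invented:

```python
import csv
import io
import json

# Raw text: a CSV file where the 'attrs' column embeds a JSON document.
raw = ('id,name,attrs\n'
       '1,MC Brown,"{""city"": ""York""}"\n'
       '2,Alice,"{""city"": ""Bergen""}"\n')

# The structure is interpreted on read, Hive-style; nothing is converted
# into a stored, indexed format first.
rows = []
for rec in csv.DictReader(io.StringIO(raw)):
    rec["attrs"] = json.loads(rec["attrs"])   # interpret the embedded JSON
    rows.append(rec)

# A HiveQL-like query over the interpreted structure
print([r["name"] for r in rows if r["attrs"]["city"] == "Bergen"])
```

As with Hive, the cost of this flexibility is that the interpretation happens on every read; nothing is stored in a pre-parsed or indexed form.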

However, unlike a true RDBMS, Hive only interprets the underlying format, and it performs this interpretation every time the data is accessed. At no time does the data have to be translated into Hive format (nor, really, is there one), and no indexes are created to enable quick access to the data.

All of these individual types are wrapped up into what I’ve classed as ‘Big Data’. This is not to say that the data needs to be of specific size or complexity, only that it may consist of structured, unstructured, or all variants in between.

Moving data to and from unstructured, semi-structured, and implied structure databases entirely depends on what the information is, what structure is available, and how that structure can be used (or ignored) accordingly.

Norway Arctic Cruise – Day 1: London to Bergen

Sunday morning, 6am, and I’m leaving the hotel in a taxi. The ground outside is heavily frosted, and the taxi driver mentions how cold it is and hopes I’m going somewhere warm. I think he’s shocked to realise I’m going to the arctic circle.
I booked the flight through Hurtigruten, and it’s a scheduled British Airways flight that is surprisingly busy. The plane journey is routine, but we get to see some lovely little islands and rocks as we come in to land in Bergen.
The airport has that small-airport feel as you get off, and they check your passport almost immediately inside the door. I get my bearings after collecting my luggage and then head outside to the coach for the transfer to the Hurtigruten terminal. I’m the first to the coach, with a very friendly driver.
In fact, I’m the only one on the coach at all. So I spend the 30 minute journey from the airport to the terminal talking to the bus driver about economics, immigration and Thailand. Amazing how deep and detailed you can get in the few minutes of travel.
Once at the terminal we say our goodbyes, and then there is a short wait while another cruise is assembling and the Hurtigruten desk opens. A lovely lady takes my details and checks in my bags so I am able to go for a walk. The boat does not leave until 10pm, so I have a full day to spend taking in the sights.
The terminal location is somewhat industrial, so I start walking towards what I hope is the old town. Everything is shut. And I mean really shut. At first I walk up and over a hill that overlooks the terminal and sit down near another woman who is obviously doing her Sunday morning ritual and enjoying the view, while a girl in the distance plays with a dog. I can see a boat at the terminal, but it’s obviously not the MS Lofoten.
I realise, after checking my location, that I’ve got a way to walk, this time down a steep hill until I reach the quayside opposite Bryggen, the old medieval quays of Bergen. The old buildings are colourful and very recognisable and are much better viewed from the other side of the port so that you can take in the full vista. This is not where I’m headed, I want to get up to the top of the nearby mountain.
I take a walk around the port, passing the fish market and an impressive array of fish and shellfish. As I round the corner I’m tempted by a few of the restaurants, but I know there is one at the top of Fløibanen, one of the seven mountains that surround Bergen. The funicular starts at the bottom of the hill close to the main port – literally a few hundred yards from both Bryggen and the fish market. The roundtrip ticket (up and back down again) is 90NOK, so about £7 or $10.
The hill is steep – the funicular itself is stepped inside so that you have a variety of different levels to stand on, which not only makes sense because of the physical layout of the carriage, but also means every single level gets the same unrestricted view. The journey is quiet and takes a few minutes to get up the side of the mountain, passing the downhill carriage in the process. The views quickly become amazing, although a good chunk of the route that still lies within the city is walled and covered, I suspect to provide some barrier for the people who live and work in the buildings. The funicular runs every 15 minutes and, although it’s almost silent, is probably quite a distraction – not to mention the tourists looking into gardens and through windows.
At the top, as you exit, you are presented with a massive stepped viewing area that looks out directly over Bergen and the fjords beyond, and from where you can see the other mountains and the surrounding landscape. To say the views are magnificent is an understatement. It is a beautiful, crisp, blue-skied autumn day, which makes the view and atmosphere perfect. It takes a while to sink in just how majestic the landscape is.
I head for some lunch in the cafe at the top – it’s self service and the food is excellent, traditional fare, although it takes me some time to get used to the currency as I’m trying not to end up with so many coins. It is very busy, and this is obviously the place to be for locals and tourists. I have some trouble finding a table and eventually have to share with a group of Norwegians having a deep conversation I obviously cannot follow.
After lunch I decide to go for a walk with the hordes of locals – entire families, friends – out for a similar walk in the afternoon sun. There are people running, hiking or just out for a stroll. The top of Fløibanen is covered in pathways that you can follow without needing to be too fit, and also off-track routes.
There are also a number of lakes and ponds, most with picnic areas and even barbecues if you’ve come prepared.
After years of walking in the lake district on paths and crags that are very similar I take a route off the beaten track and switch into what I can only call ‘mountain goat’ mode, stepping quickly from rock to rock and crag to crag as I go both up and down the hills and terrain. This is what I’ve missed – it’s been a long time since I was able to go on a hike that felt so wild and rough and I’m loving it.
Still the views are magnificent and I find myself stopping frequently just to take in the views and breathe in the very fresh and clean air.
There are various wooden carvings around the forest at the top, and near one I spy a family and their children walking out over the ice. Although I wouldn’t like to try it, I’ve already seen that the ice in places is several inches thick so I’m not worried.
After trekking for a few hours, both on and off track, I head back to the steps and start to look out at the setting sun. It will set at 15:55 and I make it back to the view just after 3pm. I watch as the sun slowly sets – it takes a long time, due to the combination of the northern location (we are 60º north here), the time of year, and the elevation, but the view is too magnificent to miss.
I stay until just before the sun goes down, and then take the funicular down the mountain and walk back to the Hurtigruten terminal in the twilight.

Keeping the Fiction Flowing

As I announced in my last blog post, I’m starting to publish all of my fiction through Patreon.

There are two books I’m actively publishing right now:

  • NAPE – a sci-fi story featuring a missing artificial intelligence.
  • Kings Courier – a fantasy featuring a boy who is a courier and gets pulled into a deeper role than he ever expected.

In case you’ve already missed the previous instalments of either, you can head to Patreon and sign up for the regular updates over there.


Norway Arctic Cruise – Day 0: Preparation

I cannot remember precisely at what point I decided I wanted to go on a cruise to the arctic, but it was in my teens, and for a while I obsessed about it. I actually remember walking into a travel agent in Northampton – where I went to college – and asking for some catalogues. By this time I’d already been wanting to go for some years.
There are probably a number of different factors around this:
  • I like the cold. I mean, I really really like the cold. My family calls me toastie. As we’ll see on the boat, I walked around outside in just a T-Shirt for nearly the entire trip.
  • I’m a huge Douglas Adams fan, and feel privileged to have actually met and spoken to the man himself. For those that do not know, Slartibartfast (from The Hitchhiker’s Guide to the Galaxy), designed the Fjords of Norway.
  • I also love Monty Python, and although Michael Palin sang about Finland, not Norway, I think the countryside, Fjords and other beautiful descriptions of the landscape suit Norway just as easily.
  • As a fantasy fan, having read Tolkien, Feist, Howard, Moorcock and many others, the rough and barren terrain holds some kind of draw for me.
  • I do like being on my own, and I fully expected to spend a lot of time on the boat, not necessarily in my cabin, but out on the deck with only my thoughts to listen to while I marvelled at the landscape. Being out on a ship – if not in the middle of the ocean, then at least distant enough not to be in constant contact – had always appealed. Things didn’t work out like that, but I wasn’t annoyed by it; in fact, quite the opposite.
I’m sure there are other reasons not immediately obvious, but I think those are the main ones. When I finally got the opportunity, and more importantly, when I was reminded of my desire when talking to a friend about boats, I decided to take the leap and book the cruise.
The cruise was picked on the basis of two very specific requirements:
  • I wanted a small ship – I had already decided that a big ship would mean just a floating hotel that could be anywhere, and I would lose the feeling of being out on the ocean.
  • I wanted one that stopped at multiple places – being on a ship that just goes from one point to another did not appeal. I wanted something that stopped and visited different places; same or different countries didn’t matter, I just wanted to see more country than sea.
This ultimately led me to book Hurtigruten, who use working ships on their route up and down the coast of Norway, and the MS Lofoten, which is the smallest of their ships.
I actually booked the cruise while on another trip. But I decided that I didn’t just want to do the cruise, I wanted to do as many excursions as possible so that I could experience many different things. So that meant picking things like the husky sledding, the aquarium, and trips to different locations and points. If I was going to the arctic, I wanted to go to places that made it clear it was the arctic.
But it didn’t feel real until I’d stopped travelling for work and then gotten the tickets in the mail.
Once picked, now it was time to prepare for the journey. I have a lot of walking gear, but not a lot that would help me in potentially sub-zero temperatures, a lot of water, or a lot of snow.
Now, I have been a big fan of Rohan for years; in fact, I’ve kept to three basic rules for my equipment for some time: Salomons on my feet, a Suunto on my wrist, and Rohan for everything else. So a trip to my local Rohan shop in York was in order.
I must say that Charlie, Max and Gary were ever so helpful and patient, although I’m pretty sure they do not get many people coming in with a list and the statement that they are going on a trek to Norway! Charlie in particular was so patient, and very kindly placed all of the clothing into our already-owned Rohan bags.
So for all of the effort, what did the piles contain?
  • Some new walking trousers that are more waterproof than my normal ones.
  • Some comfortable trousers for when I’m not on an excursion
  • Some waterproof over trousers
  • A nice super-warm padded jacket
  • Some thermals
  • Gloves (both thin and super insulating), hat
  • A few more bags and waterproof phone/iPad holder
All absolutely vital, and for this trip, by far the best selection of travel gear I’ve ever owned. We’ll see how practical it all proved on the cruise in due course.
What else did I prepare for?
  • I got a GoPro Hero 4, so I could record both the views (it’s got a fantastic wide-angle lens) and the husky trip (with the assistance of a chest mount)
  • Charged up both my other cameras – I knew in the cold they would need help, so I also packed and charged plenty of spare batteries
  • Prepared a list of things I wanted to do on the cruise, like go through my reading list, sort out my photos, etc. It’s not that I thought I might be bored; I just like to make sure I’m prepared to do things I like while I’m away.
  • Packed my iPad to the gills with books, Spotify lists of music I haven’t listened to yet, and a few movies in case I was super-bored.
That was it, I was ready!
Packing proved to be problematic – I had just too much photo gear to fit into my usual carry on bag, so I ended up taking a suitcase in the cabin and checking a suitcase into the hold to carry it all.
Because of my early flight to Bergen on the Sunday morning (7am) I decided to stay at a local hotel on the Saturday night, which gave me the opportunity to settle in and relax a little before what would turn out to be a busy day.

2015 is that way, 2016 is this way

The last year has been something of a change in direction in my life. Not only was it a year of a large number of ‘firsts’ for me, in all sorts of ways, but I also changed a lot of what I was doing to better suit me. Actually, that’s really important.

2015 turned out to be a really significant year for me, not because of any huge life changes, but because so many different and interesting things happened to me.

What did I change?

‘Official’ Studying – I have for many years been doing a degree in Psychology with the Open University. I was actually in my last year – well, 20 months, as it was part time. I had my final two modules to go, and although I was hugely enjoying the course, it was a major drain on my personal time, what little I have of it after work and other obligations (see below). I also reached a crunch point; due to the way the course worked, changes in the rules, and the duration of the work (I started studying back in 2007), I had to finish the course by June 2016, and that meant there were no opportunities for retakes or for doing the entire course all over again. I either had to get it right, first time, for each remaining module, or I would have to start again. That put massive pressure on me to get good marks while holding down a very busy day job, and it got harder and harder to dedicate the required time. In the end I decided that having the piece of paper was less important than having a personal interest in the topic. And that was the other part of the problem: I’d already stopped reading, stopped playing games, stopped going out, all to complete a course. I realised that my interest in Psychology won’t disappear just because I stop studying. I can still read the books, magazines and articles that interest me without feeling pressured to do so.

Book/Article Writing – Given the above, and the lack of activity on here, it won’t surprise you that writing books and articles was something else I stopped. I deliberately changed my focus to the Psychology degree, but I also stopped doing anything outside work in any of the areas I’m interested in, despite some offers. I was working on a book, actually two books, but ultimately dropped them due to other pressures. Hopefully I’ll be converting some of that material into posts here over the course of the year.

Working Hours – I have very strange sleep patterns; I sleep very little, and have done since the day I was born. That means I normally get up very early (2am is not unusual) having gone to bed at 10 or 11pm the previous night. However, last year I spent even more time up late in meetings and phone calls with people in California. That made for a long day, so I switched my day entirely: I now start working later and finish later, doing most of my personal stuff in the early morning. It’s nice and quiet then, too.

2015 Firsts

  • First time staying in a B&B – I know, this seems like an odd one, but I have honestly never stayed in a B&B before. But I did, three times, while on a wonderful touring holiday of the North of Scotland, taking in Inverness, Skye, Loch Ness and many other places.
  • First touring holiday (road trip) – See above. For the first time ever, I didn’t go to one place, stay there, and travel around the area. We drove miles. In fact, I did about 2,800 miles over the course of a week.
  • First time to the very north of Scotland – Part of the same road trip. I’ve done Dunbar, North Berwick, the Borders and Edinburgh before, but never that far north.
  • First music concert (in ages) – I went to two, in fact. One in Malaga and one in San Francisco about two weeks later. Enjoyed both. Want to do more.
  • First time driving in the US – I’ve been regularly going to the US since 2003, when I first started working with Microsoft, and even when working for companies in Silicon Valley, I’ve always taken rides from friends, or taxis. In April, I hired a car and drove around. A lot. I did about 600 miles over the course of two weeks.
  • First Spanish train journey – I flew to Madrid on business, and then took the train from there down to see a friend in Malaga. The AVE train is lovely, and a beautiful way to travel, especially at 302 km/h.
  • First Cruise – I’ve wanted to go on a cruise to see the Fjords of Norway since I was a teenager. I love the cold, I love the idea of being relatively isolated on a boat with lots of time to myself. In the end, I spent way more time interacting with other people than I expected, and did so little on my own, but I wouldn’t have changed it for the world. I went from Bergen to Kirkenes in the Arctic circle and back on the Hurtigruten and it was one of the most amazing trips of my life.
  • First time travelling on my own not for business – I travel so much for work (I did 16 journeys in 2015, most to California) that it made a nice, if weird, change to do a full trip on my own. I enjoyed it immensely and recommend it to everybody.

What’s planned for 2016?

I’m starting to publish my fictional work on Patreon with the express intention of getting book content that I’ve been working on for many, many years out there in front of other people. I’ve got detailed notes and outlines on about nine different fictional titles, crossing a range of different genres. I’ve started with two of my larger ‘worlds’ – NAPE and Kings Courier – and will be following up with regular chapters and content over the coming months.

I’ve also created a new blog to capture all of my travel. Not the work stuff, but things like the Scotland tour and the Norwegian cruise, plus whatever else comes up this year and beyond. Current thoughts are Antarctica, Alaska or Iceland, work and personal commitments permitting. Plus I’m in Spain in August with my family and friends.

Converting my unfinished technical books to blog posts. I’ve worked on a number of books, some of which contain fresh, brand-new material I’d like to share with other people, including the book content I was working on last year. I’m still trying to reformat it for the blog so that it looks good, but I will get there.

Process home monitoring data using the Time Series Database in Bluemix

I keep a lot of information about my house – I have had sensors and recording units in various parts of my house for years, recording information through a variety of different devices.

Over the years I’ve built a number of different solutions for storing and displaying the information, and when the opportunity came up to write about a database built specifically for recording this kind of information, I jumped at the chance, and this is what I came up with:

As home automation increases, so does the number of sensors recording the statistics and information needed to feed it. Using the Time Series Database in Bluemix makes it easy to record the time-logged data and to query and report on it. In this tutorial, we’ll examine how to create, store, and, ultimately, report on information by using the Time Series Database. We’ll also use the database to correlate data points across multiple sensors to track the effectiveness of heating systems in a multi-zone house.
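As a rough illustration of the pattern the tutorial covers – logging time-stamped sensor readings and rolling them up per zone – here is a minimal sketch that uses an in-memory SQLite database in place of the Time Series Database; the table layout, sensor names and readings are all invented for the example, not the Bluemix schema:

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite stands in for the Time Series Database here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        sensor  TEXT NOT NULL,   -- zone/sensor identifier
        ts      TEXT NOT NULL,   -- ISO-8601 timestamp
        temp_c  REAL NOT NULL    -- temperature reading, Celsius
    )
""")

# Simulate two zone thermostats logging a reading every 15 minutes.
start = datetime(2016, 1, 10, 6, 0)
for i in range(8):
    ts = (start + timedelta(minutes=15 * i)).isoformat()
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)",
                 ("lounge", ts, 18.0 + 0.5 * i))
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)",
                 ("bedroom", ts, 16.0 + 0.25 * i))

# Report: average temperature per zone over the logged window – the
# kind of per-sensor rollup used to compare heating across zones.
rows = conn.execute("""
    SELECT sensor, AVG(temp_c) AS avg_c
    FROM readings
    GROUP BY sensor
    ORDER BY sensor
""").fetchall()
for sensor, avg_c in rows:
    print(sensor, avg_c)
```

The same grouping query, extended with a time-bucket column, is how readings from multiple sensors get correlated over matching intervals.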

You can read the full article here

Office 365 Activation Won’t Accept Password

So today I signed up for Office 365, since it seemed the easiest way to get hold of Office; although I already have a license and subscription, I also have more machines to cover.

To say I was frustrated when I tried to activate Office 365 would be an understatement. Each time I went through the process, it would reject the password, saying there was a problem with my account.

I could log in with my email and password online, but through the activation, no dice. Some internet searches, including with the ludicrously bad Windows support search, didn’t elicit anything useful.

Then it hit me. Office 2011 for Mac through an Office 365 subscription probably doesn’t know about secondary authentication.

Sure enough, I created an application-specific password, logged in with that, and yay, I now have a running Office 365 subscription.

If you are experiencing the same problem, using an application-specific password might just help you out.