Webinar Wednesday: Moving data in real-time into Elasticsearch

Elasticsearch provides a quick and easy way to aggregate data, whether you want to use it to simplify searching across multiple data stores and databases, or as part of your analytics stack. You can get the data from your transactional engines into Elasticsearch within your application layer, but that brings all of the associated development and maintenance costs. Instead, offload the operation and simplify your deployment by using direct data replication to handle the inserts, updates and deletes.

In this webinar, we will examine:

  • Basic replication model
  • How to concentrate data from multiple sources
  • How the data is represented within Elasticsearch
  • Customizations and configurations available to tailor the data format
  • Filters and data modifications available

Register now!

Analytical Replication Performance from MySQL to Vertica on MemCloud

I’ve recently been trying to improve the performance of the Vertica replicator, particularly for the new single schema replication. We’ve done a lot in the new Tungsten Replicator 5.3.0 release to improve (and ultimately support) the new single schema model.

As part of that, I’ve also been personally looking at Kodiak MemCloud as a deployment platform. The people at Kodiak have been really helpful (disclaimer: I’ve worked with some of them in the past). MemCloud is a high-performance cloud platform based on hardware with high-speed (and high-volume) RAM, SSDs and fast Ethernet connections. This means that even without any adjustment or tuning, you’ve got a fast platform to work on.

However, if you are willing to put in some extra time, you can tune things further. Once you have a super quick environment, you find you can tweak the settings a little more, because you have more options available. Ultimately, you can then use that faster environment to stretch things a little bit further. And that’s exactly what I did when trying to determine how quickly I could get data into Vertica from MySQL.

In fact, the first time I ran my high-load test suite on MemCloud infrastructure, replicating data from MySQL into Vertica, I made this comment to my friend at Kodiak:

The whole thing went so quick I thought it hadn’t executed at all!

macOS High Sierra Disk Space

Having updated to macOS High Sierra, I’ve mostly been impressed, except for one thing: disk space usage.

The new APFS file system creates snapshots during local Time Machine backups. This can mean, especially if you’ve been dealing with some large files, that when you delete them and empty the Trash, you don’t get your disk space back.

I noticed this because I’ve been on vacation and created some rather large files from all the photos I’ve been taking. I freed up 200-400GB of disk space, only to realize that the space was no longer available. On a 2TB disk, 400GB is a lot to lose, even more so when you think you’ve deleted the files that *were* using up the space.

The key is to look at your local backups, and the easiest method for that is the tmutil command. In particular, check the list of local snapshots:

$ tmutil listlocalsnapshots /

These are normally managed for you in the background, using a combination of the date/age of each backup and the space it is using compared to how much disk space you need.

But if you’ve just done some house cleaning, or you’ve temporarily used a lot of disk space and want to free it up, you’ll need to get rid of those old snapshots. You can delete them individually using:

$ tmutil deletelocalsnapshots <snapshot_date>
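If you want to clear out every local snapshot rather than picking dates one at a time, a small loop over the `listlocalsnapshots` output can do it. This is a minimal sketch, assuming snapshot names of the form `com.apple.TimeMachine.2017-10-11-083316`, where the date stamp is the part `deletelocalsnapshots` expects. The function names `snapshot_dates` and `delete_all_local_snapshots` are hypothetical, and depending on your macOS version the delete may need to be run with `sudo`.

```shell
#!/bin/sh
# Hypothetical helper: read `tmutil listlocalsnapshots` output on stdin and
# print only the YYYY-MM-DD-HHMMSS date stamp from each snapshot name.
# Assumes names like com.apple.TimeMachine.2017-10-11-083316; header lines
# and any trailing suffix (such as .local) are ignored by the pattern.
snapshot_dates() {
  grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{6}'
}

# Hypothetical helper: delete every local snapshot listed for the given
# volume (defaults to /), one date stamp at a time.
delete_all_local_snapshots() {
  volume="${1:-/}"
  tmutil listlocalsnapshots "$volume" | snapshot_dates | while read -r d; do
    echo "Deleting local snapshot $d"
    tmutil deletelocalsnapshots "$d"
  done
}
```

After sourcing the file, run `delete_all_local_snapshots /`. Bear in mind this removes all local snapshots, so there is nothing local to restore from until Time Machine takes new ones.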

But it’s easier to thin the snapshots and specify how much space you want to get rid of. That will leave you with some recent snapshots but still recover some disk space. For that, use this command:

$ tmutil thinlocalsnapshots <mount_point> [purgeamount] [urgency]

That [purgeamount] is how much space (in bytes) you want to recover in the process, and [urgency] is a number (1-4) indicating how quickly you want the space recovered. Both are optional.
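Because the purge amount is a raw byte count, it is easy to get the number of zeros wrong. A minimal sketch, using a hypothetical `gib_to_bytes` helper, that works out the value for roughly 20GB (20 × 1024³ bytes) and passes it along with urgency 4, the top of the 1-4 range:

```shell
#!/bin/sh
# Hypothetical helper: convert a whole number of gibibytes (1024^3 bytes)
# into the byte count used for thinlocalsnapshots' [purgeamount].
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example (not run automatically): try to reclaim about 20GiB from /
# with urgency 4.
#   tmutil thinlocalsnapshots / "$(gib_to_bytes 20)" 4
```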

For me, I just ran the thinning and that left me with:

$ tmutil thinlocalsnapshots / 
Thinned local snapshots:

For me, the difference was that I went from using 1.2TB of a 2TB disk, thanks to all those photos and the snapshots that went with them, to having 1.43TB free!

Keynote and Session at Percona Live Dublin 2017

On Sunday I will travel over to Dublin for Percona Live 2017.

I have two sessions. One is a keynote on Wednesday morning, where I’ll be talking about all the fun new stuff we have planned at Continuent and some new directions we’re working on.

The other is a more detailed session on Tuesday morning, covering our new appliers for Kafka, Elasticsearch and Cassandra.

If you haven’t already booked to come along, feel free to use the discount code SeeMeSpeakPLE17, which will get you 15% off!