Nicolas Tobias has written an awesome guide to setting up replication from MariaDB to Hadoop/HDFS using Tungsten Replicator, in Spanish! He’s planning more of these so if you like what you see, please let him know!
Easter week, and here I am with new battles to recount.
I found myself at work, wondering how to spend the calm that comes with those holiday days we can freely choose to work, and I thought: wouldn't it be good to finish that synchronization between the MariaDB servers and Hive?
I had already looked for some information on this in January, and I even had a PoC set up on some VMs, which I powered back on. But everything had rotted: it wouldn't start, it didn't work, I couldn't even remember how I had done it, and the shell history was gibberish. I decided that if I redid everything from scratch I could write it down in a playbook and, on top of that, learn it and automate it to the point of being able to deploy it automatically with Ansible.
I’m sewing some quilts for charity this year for the Fawcett Society, who champion equal rights in the name of the Suffragette movement. This year (2018) is 100 years since the Vote for Women movement.
The first quilt is finished and will be going up for auction soon!
You can follow the progress on the main Fawcett page:
Or the Facebook group:
Continuent have been a long-term sponsor of the Percona Live conference, and of the MySQL conference as it was before that. We have attended as a Diamond sponsor, with members of our staff attending and presenting our products and experience.
The nature of these conferences always changes over time, and we have seen over the last few years how the Percona Live conference has moved from being a pure MySQL conference to an open source database conference. Although Continuent continue to provide open source software and integrate with many open source databases, our core operation still revolves around MySQL clustering and replication for MySQL and Oracle.
Continuent is also evolving and changing and we are increasingly deploying and moving towards pure cloud-based environments, building and developing products that are used on the cloud or explicitly leverage cloud computing technology. We have a number of new products and initiatives specifically targeting these areas.
Over the course of the next year we will be releasing cloud editions of our clustering, replication and new backup and proxy services both directly and through our partners.
As such, this year we have made the difficult decision not to sponsor or attend the Percona Live conference, directing our energies to other conferences, webinars and meetups. We will be attending the AWS conference, for example, and we fully intend to be at some other select conferences this year dealing with analytics, in-memory computing, and cloud-based deployments.
Elasticsearch provides a quick and easy method to aggregate data, whether you want to use it for simplifying your search across multiple depots and databases, or as part of your analytics stack. Getting the data from your transactional engines into Elasticsearch is something that can be achieved within your application layer with all of the associated development and maintenance costs. Instead, offload the operation and simplify your deployment by using direct data replication to handle the insert, update and delete processes.
In this webinar, we will examine:
- Basic replication model
- How to concentrate data from multiple sources
- How the data is represented within Elasticsearch
- Customizations and configurations available to tailor the data format
- Filters and data modifications available
I’ve recently been trying to improve the performance of the Vertica replicator, particularly in the form of the new single schema replication. We’ve done a lot in the new Tungsten Replicator 5.3.0 release to improve (and ultimately support) the new single schema model.
As part of that, I’ve also been personally looking at Kodiak MemCloud as a deployment platform. The people at Kodiak have been really helpful (disclaimer: I’ve worked with some of them in the past). MemCloud is a high-performance cloud platform that is based on hardware with high speed (and volume) RAM, SSD and fast Ethernet connections. This means that even without any adjustment and tuning you’ve got a fast platform to work on.
However, if you are willing to put in some extra time, you can tune things further. Once you have a super quick environment, you find you can tweak and update the settings a little more because you have more options available. Ultimately you can then make use of that faster environment to stretch things a little bit further. And that’s exactly what I did when trying to determine how quickly I could get data into Vertica from MySQL.
In fact, the first time I ran my high-load test suite on MemCloud infrastructure, replicating data from MySQL into Vertica, I made this comment to my friend at Kodiak:
The whole thing went so quick I thought it hadn’t executed at all!
Having updated to macOS High Sierra I’ve mostly been impressed, except for one thing: disk space usage.
The new APFS creates snapshots during local Time Machine backups. This can mean, especially if you’ve been dealing with some large files, that when you delete them and empty the wastebasket, you don’t get your disk space back.
I noticed this because I’ve been on vacation and created some rather large files from all the photos I’ve been taking. I freed up 200-400GB of disk space, and then realized that the space still wasn’t available. On a 2TB disk, 400GB is a lot to lose. Even more so when you think you’ve deleted the files that *were* using up space.
The key is to look at your local backups, and the easiest method for that is the tmutil command. In particular, check the list of local snapshots:
$ tmutil listlocalsnapshots /
com.apple.TimeMachine.2017-10-06-163649
com.apple.TimeMachine.2017-10-07-065814
com.apple.TimeMachine.2017-10-11-165349
com.apple.TimeMachine.2017-10-11-19345
com.apple.TimeMachine.2017-10-11-203645
com.apple.TimeMachine.2017-10-12-003803
com.apple.TimeMachine.2017-10-12-124712
These are normally managed automatically using a combination of the date/age of the backup and the space they are using compared to how much disk space you need. All this happens automatically in the background for you.
But, if you’ve just done some house cleaning, or you’ve come back from using a lot of disk space and want to free it up, you’ll need to get rid of those old snapshots. You can do them individually using:
$ tmutil deletelocalsnapshots <snapshot_date>
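If there are several snapshots to remove, a small shell helper can turn the listing into one delete command per snapshot. This is only a sketch: the function names `snapshot_dates` and `delete_commands` are made up for illustration, and it assumes snapshot names look exactly like the ones in the listing above (`com.apple.TimeMachine.<date>`). It prints the commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of tmutil.

# Read snapshot names on stdin and print just the date stamp of each.
# Assumes names of the form com.apple.TimeMachine.2017-10-06-163649;
# any line that does not match that form is ignored.
snapshot_dates() {
    sed -n 's/^com\.apple\.TimeMachine\.\([0-9-]*\)$/\1/p'
}

# Read date stamps on stdin and print (not run) one delete command each.
delete_commands() {
    while read -r stamp; do
        printf 'tmutil deletelocalsnapshots %s\n' "$stamp"
    done
}

# Example use on a Mac (inspect the output before running any of it):
#   tmutil listlocalsnapshots / | snapshot_dates | delete_commands
```

Printing the commands instead of executing them keeps the step reversible: nothing is deleted until you copy a line and run it yourself.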
But it’s easier to just thin the snapshots and specify how much space you want to get back. That will leave you with some recent snapshots but still recover some disk space. For that, use this command:
$ tmutil thinlocalsnapshots <mount_point> [purgeamount] [urgency]
That [purgeamount] is how much space you want to recover in the process, and the [urgency] is a number (1-4) of how quickly you want the space recovered. Both are optional.
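As a rough illustration of filling in those two optional arguments, the sketch below builds a thinning command from a size in gigabytes. It rests on an assumption that is not stated above: that [purgeamount] is given in bytes (check `man tmutil` on your own system). The function name `thin_command` is hypothetical, and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper, not part of tmutil.
# Prints (does not run) a thinning command for the root volume.
#   $1 = space to recover, in gigabytes (decimal, 10^9 bytes)
#   $2 = urgency, 1-4
# Assumption: tmutil expects [purgeamount] as a byte count.
thin_command() {
    gb=$1
    urgency=$2
    printf 'tmutil thinlocalsnapshots / %s %s\n' "$((gb * 1000000000))" "$urgency"
}

# Ask for about 400GB back at the highest urgency.
thin_command 400 4
```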
For me, I just ran the thinning and that left me with:
$ tmutil thinlocalsnapshots /
Thinned local snapshots:
2017-10-06-163649
2017-10-07-065814
For me the difference was that I went from using 1.2TB on a 2TB disk from all those photos and the snapshots that went with them, to having 1.43TB free!
On Sunday I will travel over to Dublin for Percona Live 2017.
I have two sessions. The first is a keynote on the Wednesday morning, where I’ll be talking about all the fun new stuff we have planned at Continuent and some new directions we’re working on.
I also have a more detailed session on our new appliers for Kafka, Elasticsearch and Cassandra, that’s Tuesday morning.
If you haven’t already booked to come along, feel free to use the discount code SeeMeSpeakPLE17 which will get you 15% off!