Replicating multiple masters to one slave

As standard, MySQL allows replication from one master to multiple slaves, and that is a common scale-out scenario, but there have been a few comments recently, and some longer standing queries about having a setup that works the other way round, that is, multiple slaves replicating into a single master. This a common enough scenario in data logging systems, where the data is collected locally and then distributed up to a central database, or in EPOS (Electronic Point of Sale) systems where you want the transactions logs from the tills logged up to the database at head office. There are many other situations where you want that merging of information. Although MySQL doesn’t support what is called ‘multiple master, single slave’ solution, you can simulate the general approach by using a combination of replication and federated tables. Replication allows for different table types on the master (the source of the data) and the slave. There are many advantages to this, for example, using InnoDB on the master to take advantage of transactions, while using MyISAM on the slave for read performance. Federation allows you to access the tables of a remote server as if it were a local table. You can set up a federated table to access a remote table from as many machines as you like. That means that you can have two, or more, MySQL instances set up to use the remote table using the federated engine. You can execute any queries you like on the remote table, but you need to take care when using multiple hosts to access the remote table. Particularly when doing INSERT from multiple hosts, using InnoDB, Falcon, Maria or another table that supports multiple writers can be a good idea, although I’ll cover some workarounds for that later. Using federated gives us the ability to write to the same table from multiple hosts, but you dont want to read and write from the same remote table all the time, especially if on your local machine (your till, or data collector) you want to be able to run your own queries. This is where the replication fits in, if you set up replication from the master to another instance of MySQL, let’s call it ‘Fed Slave’ (which works both ways). On the Fed Slave, you configure the table or tables that you want to merge on the final ‘Slave’ machine to be federated tables. What happens is that data is replicated from the master to the Fed Slave, and on Fed Slave the queries are sent to the Merge Slave via federation. You can probably see this more clearly in the figure below.

Media_httpcoalfacemcs_eiprj

To re-iterate:

  1. INSERT on Master 1 is replicated to Fed Slave 1
  2. Fed Slave 1 executes the INSERT on a Federated table which points to Merge Slave
  3. Merge Slave executes the federated statement on its local table

Each Fed Slave is relatively lightweight – all it’s doing is executing a statement and sending the statement over the network to the Merge Slave, so you could run it on the same machine as Master 1. There are few problems with this design:

  1. Updating the same federated table from multiple hosts can get messy. There are a few ways you can get get round this, one is to stop the query execution on the slaves and only allow them to run during a set period of time. For example, let Fed Slave 1 execute the queries in the log from 1am to 2am, and Fed Slave 2 from 2am to 3am, and so on.
  2. Federation doesn’t get round the problems of duplicate IDs – if you try to run a statement on a federated table that inserts a duplicate ID it will fail just as will locally. You can get round this by making sure that the tables that hold the merge data on your Merge Slave dont have unique ID constraints, and that your Masters and all the table definitions contain a field to identify the source of the data in each case.
  3. Load can be an issue. One of the reasons I suggested InnoDB/Falcon/Maria is to help get round the multiple-insert and locking that is normally applied, but the very nature of the system means that locks and delays might still occur. You can’t eliminate it, but you can ease it.

I’ve tried and used this method in a number of situations, actually not for the reasons given above, but for performance logging from multiple hosts onto one. I’ll be honest and say that I’ve never seen a problem, but, at the same time, the type of data that I am collecting means that I would have been unlikely to notice a missing data point or two.

MySQL University – quick survey

MySQL University has been running for the last 18 months, and we’ve covered a wide range of topics, from the internals of MySQL right up to Amazon’s EC2, using MySQL in the Solaris/OpenSolaris Webstack and a description of the forthcoming MySQL Online Backup. Personally, I think they’re great. Obviously many times I am scribe and am there for the sessions, but I listen to lots of the sessions anyway, and I’m yet to be disappointed by the content. What’s really great is that in all the cases the person you are listening to is probably the person that either developed, or helped drive development of the particular function, or, in the case of some of the external tools (EC2, for example), these guys are expert in it. The experience is not quite as thrilling as attending the MySQL User Conference, but the content is just the same. The problem is that despite all the work we do to get the presenters, interesting topics, and promotion of the upcoming sessions, we don’t always get as many attendees as we want or expect. So, I’m wondering why this should be the case. We know that the current presentation system is not ideal (and we’re working on that), but I’m interested to hear people’s opinions on MySQL University. If you want to help shape the future of MySQL University, then comment here, and either answer the questions below, or make up your own.

  • Have you attended any MySQL University sessions. How many?
  • How would you rate the sessions generally? A simple good or bad will do
  • If you haven’t attended any sessions, or don’t regularly attend them, why not?
  • Have you ever looked at/listened to the past sessions that provide on MySQL Forge?

Please, I’m interested to hear.

MySQL on i5/OS

i5/OS doesn’t immediately strike you as the most natural environment for running MySQL, but in fact, there some advantages and benefits of making use of the hardware and i5/OS environment. The System i environment used with i5/OS is scalable, and the i5/OS itself provides lots of benefits over the control and separate of work. Obviously another key advantage is that if you are already using i5/OS for your application, then being able to plug in MySQL into that equation on the same machine makes a big difference. For those companies and organizations that already have a business application on their server, you can use MySQL in combination with ODBC or more direct interfaces such as PHP to provide a web interface to your business application all in the same box. MySQL works through PASE (Portable Application Solutions Environment) which allows AIX applications to run directly on i5/OS through a direct application binary interface. As a supported platform for MySQL 5.0 we obviously have instructions for installing MySQL into your i5/OS environment. Once installed, MySQL on i5/OS works just like any other MySQL installation. However, if you want a more complete view of the support, environment, and deployment of MySQL on i5/OS and more detailed instructions for setting PASE and your system to accept MySQL, then check out the IBM Redbook Discovering MySQL on IBM i5/OS.

Comparing 32-bit/64-bit MySQL on OpenSolaris

I’ve been working with the folks working on OpenSolaris for a few months now providing advice and input on getting MySQL and the connectors (C/ODBC and C/J) installed as a standard component. Having got the basics in, the team are now looking at adding both 32-bit and 64-bit packages. The question raised at the end of last week was whether OpenSolaris should enable 64-bit builds by default in 64-bit installations, and whether there was a noticeable performance difference that would make this worthwhile. I did some initial tests on Friday which showed that there was a small increase (10-15%) of the packaged 64-bit installations over 32-bit under x86 using snv_81. Tests were executed using the included sql-bench tool, and this was a single execution run of each package for 5.0.56. Transactions are missing because I hadn’t enabled transactions in the tests.

Test (x86, binary packages) 32-bit 64-bit +/-
ATIS 20 17 17.65%
alter-table 18 15 20.00%
big-tables 14 11 27.27%
connect 134 121 10.74%
create 348 348 0.00%
insert 1038 885 17.29%
select 399 257 55.25%
transactions
wisconsin 10 8 25.00%

There are some significant differences there (like the 55% increase on SELECT speeds, for example), but a single execution is never a good test. Also, it’s unclear whether the differences are between the compilations, the platform or just pure coincidence. This requires further investigation. As a coincidence, Krish Shankar posted these notes on using SunStudio 11 and SunStudio 12 and the right compiler flags to get the best optimization. I decided to do 10-pass iterations of sql-bench and compare both 32-bit and 64-bit standard builds, the 32-bit standard builds against Krish’s optimizations, and finally 32-bit and 64-bit optimized builds.

Some notes on all the tests:

  • All builds are 5.0.56
  • All tests are run on SunOS 5.11, snv_81
  • Tests are executed on the same OS and machine running in 64-bit. The SPARC tests are on an UltraSPARC IIIi@1.28GHz Workstation with 1GB RAM; x86 are on a Dell T105, Opteron 1212 with 4GB RAM. Of course we’re not comparing machine speed, just 32-bit binaries over 64-bit.
  • All results are in seconds; lower values mean faster performance.
  • In all tests I’m using the built-in defaults (i.e. no my.cnf anywhere) so as to simulate a standardized installation.

Let’s first look at x86 and the 32-bit standard and 32-bit optimized builds:

Test (x86, 32-bit) 32-bit (standard) 32-bit (optimized) +/-
ATIS 15.4 21 -26.67%
alter-table 15 16.3 -7.98%
big-tables 13.7 12.5 9.60%
connect 77.6 133 -41.65%
create 343.7 350.6 -1.97%
insert 760.3 1043.8 -27.16%
select 394.8 384.2 2.76%
transactions 10.8 18.6 -41.94%
wisconsin 6.6 10.1 -34.65%

The standard build uses gcc instead of SunStudio, but I don’t get the same performance increases that Krish saw – in fact, I see reductions in performance, not improvements at all. I’m going to rebuild and retest, because I’m convinced there’s a problem here with the builds that I’m not otherwise seeing. I certainly don’t expect to get results that show a 27% reduction in insert speed. That said, a 10% big-table increase is interesting. I’ll redo these builds and find out if the slow down is as marked as it here.Here’s the comparison for standard builds between 32-bit and 64-bit standard builds on x86:

Test (x86, standard) 32-bit 64-bit +/-
ATIS 15.4 13.5 14.07%
alter-table 15 10.6 41.51%
big-tables 13.7 10.6 29.25%
connect 77.6 76.4 1.57%
create 343.7 346 -0.66%
insert 760.3 681.6 11.55%
select 394.8 254.8 54.95%
transactions 10.8 10.7 0.00%
wisconsin 6.6 5.8 13.79%

There are some incredible differences here – more than 50% increase in SELECT, and 30% for the big-tables test show that there is some advantage to having the 64-bit builds on x86 enabled.Unfortunately I’ve had problems with the 64-bit optimized builds on my machine, so I haven’t completed optimized test comparisons.On SPARC, Sun Studio is used as the default compiler, and the standard 32-bit and 64-bit show little difference:

Test (SPARC, standard) 32-bit 64-bit +/-
ATIS 28.6 27.5 4.00%
alter-table 27 26.7 1.12%
big-tables 26.9 29.4 -8.50%
connect 166.3 173.6 -4.21%
create 155 143.1 8.32%
insert 1577.3 1572.3 0.32%
select 807.4 761.6 6.01%
transactions 19.5 18.75 4.00%
wisconsin 11.1 11.4 -2.63%

Overall, a pretty insignificant difference here. Now let’s compare the standard and optimized builds using Krish’s flags on SPARC:

Test (SPARC) 32-bit (standard) 32-bit (optimized) +/-
ATIS 28.6 27.75 3.06%
alter-table 27 26.25 2.86%
big-tables 26.9 25 7.60%
connect 166.3 162.5 2.34%
create 155 145.25 6.71%
insert 1577.3 1551.5 1.66%
select 807.4 769.625 4.91%
transactions 19.5 16.875 15.561%
wisconsin 11.1 10.875 2.07%

The tests here show little significant difference between the standard and the optimized builds, although 6-7% would probably be enough to prefer an optimized build if you wanted to build your own. Now let’s compare the optimized, Sun Studio 12 builds running in 32-bit and 64-bit:

Test (SPARC, optimized) 32-bit 64-bit +/-
ATIS 27.75 27.3 1.65%
alter-table 26.25 26.6 -1.32%
big-tables 25 25 0.00%
connect 162.5 162 0.31%
create 145.25 154.3 -5.87%
insert 1551.5 1535.1 1.07%
select 769.625 771.2 -0.20%
transactions 16.875 19.1 -11.65%
wisconsin 10.875 10.7 1.64%

The differences are virtually non-existent, and taking the reductions and increases in performance overall, there’s probably little
difference.The overall impression is that on x86 the improvement of 64-bit over 32-bit is significant enough that it’s probably a good idea to make 64-bit the default. On SPARC, the difference in the optimized builds is so slight that for compatibility reasons alone, 32-bit would probably make a better default.I’ll probably be re-running these tests over the next week or so (particularly the x86 so I can get a true comparison of the 64-bit optimized improvements), and I’ll try the T1000 which I upgraded to snv_81 over the weekend, but I think indications are good enough to make a reasonable recommendation of 64-bit over 32-bit.

Getting Best out of MySQL on Solaris

I’m still working up some good tips and advice on MySQL on Solaris (particularly the T1000, the new x86 based servers like the X4150 and ZFS, amongst other things), but until then I found Getting Best out of MySQL on Solaris while doing some research.With the latest OpenSolaris builds (b79, from memory) we now have MySQL built-in, and I worked with the folks on the OpenSolaris Database team to get some reasonable configurations and defaults into the system. MySQL 5.1 and 64-bit support is currently going through the process and will be a in future build. I’ve also been working with the DTrace people to improve the DTrace support we have in MySQL (documentation will go live this week, I hope). MySQL 6.0.4 will have some basic DTrace probes built-in, but I’ve proposed a patch to extend and improve on that significantly. We’re in the process of updating the documentation and advice on Solaris (and OpenSolaris) installations and layout too, which is itself part of a much larger overhaul of the installation and setup instructions for all platforms.

MySQL Documentation and Debian/Ubuntu

We’ve got a lot of queries recently on the MySQL docs team address about the documentation (particularly man pages) for MySQL on Debian/Ubuntu. The source of the original problem was reported as a Debian bug. The assumption from the reading of the license in this instance is that you are not allowed to distribute MySQL documentation unless you’ve asked first, and that the documentation is not released under the GPL license. The original license was misunderstood in this respect. In fact, the license as originally quoted in that bug does allow you to provide the documentation if you are providing the MySQL software. In addition, regardless of how you interpret the license, all of our documentation, including installable man pages, has been available on http://dev.mysql.com/doc. You can find online HTML, offline HTML, PDF, CHM and the man pages for all of our reference manuals (on a version by version basis), along with the main HTML/PDF formats for all of the remaining documentation. We have never tried to limit the availability of the documentation (that’s why we provide it in so many formats).However, as soon as this issue was reported on to us by the folks at Debian we agreed with our legal department to put the man pages under a GPL license. This affects only the man pages, but gets round the misunderstanding above by allowing the man pages to be distributed under the same GPL license as the software. Why did we only change our man page license?MySQL documentation is updated and released very often, in fact as oftenas ten times per day. Allowing anyone to create static copies of anarbitrary documentation release would lead to many outdated copies onthe ‘Net. This is bad for users doing Google searches for MySQLdocumentation, and bad for us (we’ve seen complaints about “our” 5.0.0Manual being badly outdated when MySQL 5.1.20 was out). We appreciateanyone mirroring the MySQL Dev Zone which contains all MySQLdocumentation.So where does that leave the man pages in Debian/Ubuntu? I’m pleased to say that the new 5.0.51-1 package for MySQL that has gone into the latest Debian release (actually, in December). That means that MySQL and the corresponding man pages should appear already in the latest Debian “unstable” branch, and the next major Debian release should include everything you need. Thanks to Christian Hammers (Debian) and Norbert Tretkowski (Debian) for their help on getting this all sorted!

MySQL and DBD::mysql on Mac OS X

I’ve been investigating some recent issues with installations of MySQL on Mac OS X and the Perl DBD::mysql module for accessing MySQL from within Perl through DBI. The problem exists only with binary distributions of MySQL and is related to the installation location of the libraries for the MySQL client that DBD::mysql uses. By default these are installed into /usr/local/mysql/lib, but the dynamic libraries are configured to be located within /usr/local/mysql/lib/mysql. It’s possible for DBD::mysql to build and link correctly, but trying to use the library will fail because it can’t find the library in the latter directory, even though it linked to the library in the former location. To get round this, the easiest method is to create a link within the directory that points to the parent. For example:

$ cd /usr/local/mysql/lib$ ln -s . mysql

That should fix the problem whether you run the commands before the DBD::mysql build or after it.

Bad database design at Oracle

I was clearing out my email this morning and closing down some of the mailing lists and other elements that I don’t use, and removing others. I’ve been an Oracle OTN member for years, but no longer get as much out of it as I used to. I almost thought about cancelling, but instead decided to try and change the email address on the account so that it would go into my ‘mailing lists’ account rather than an even older account that I will one day shut down for good. But I can’t. Why not? Because Oracle’s software doesn’t let me. It says so quite clearly in the FAQ. Does the inability for Oracle to manage my data in an effective way worry you? It worries me. Changing an email address is surely something that many people must have to do as they change companies or ISPs, but apparently Oracle didn’t think about this when designing the system. The solution? Unsubscribe from the newsletters for your existing account and create a new one. I did the first half of that, but I really can’t be bothered to do the second half.

Major rewrite of C/ODBC completed

One of my first major tasks at MySQL has just been completed – a major rewrite of the Connector/ODBC (C/ODBC) documentation. There were three major focuses for the rewrite:

  1. Bring the documentation up to date. We had a mix of information on the latest release (currently 3.51, but 5.0 is currently in development), but many of the sections didn’t reflect that new version. There is also new information on how to install the driver on Mac OS X.
  2. Restructure the information. This is something I’m doing across the board on the Connectors docs, as I try to re-organize them all into a more coherent, and compatible, structure. For example, I’ve collated all of the tips about using C/ODBC with different applications into their own section, organized by application. I’ve also extended the information; for example we now have a step by step guide to importing data from MySQL into Microsoft Word and Excel through Microsoft Query.
  3. Setting up the document so that I can more easily add and extend the information in there with tips from the community, bug fixes, and of course new releases.

I’ll now be continuing the work with the other Connectors, like Connector/J and Connector/NET.

One worker in one country

A recent article at CNN talks about how MySQL operates. As one of the MySQL team, I can attest that works, but it requires a significant amount of coordination, and lots of online communication through email, IRC, Skype and other methods to keep everbody talking and all the projects working together. The flip side to that process is that we all get involved in different areas, and you tend to be much more aware of what is going on company wide. There is also better cooperation – because we can all get involved we can all provide our experience and expertise to a wide range of problems and projects. Also, because we come from such a wide range of backgrounds and environments, we have a much wider perspective.So not only does remote, and earth-wide staffing work, but it provides us with a level of cooperation that might be more difficult if we all worked in group offices in the same building.