Over the past year I've chipped away at setting up new servers for apestaart and managing the deployment in puppet, replacing a years-old, manually configured single server that would be hard to replicate if the drives failed (one of which did recently, making this more urgent).
It's been a while since I felt good enough at puppet to love and hate it in equal parts, while mostly managing to control a deployment of around ten servers at a previous job.
Things progressed an hour or two at a time, here and there, and accelerated when a friend in our collective was launching a new business for which I wanted to make sure he had a decent redundancy setup.
I was saving the hardest part for last - setting up Nagios monitoring with Matthias Saou's puppet-nagios module, which needs Exported Resources and storeconfigs working.
Even on the previous server setup based on CentOS 6, that was a pain to set up - needing MySQL and ruby's ActiveRecord. But it sorta worked.
It seems that for newer puppet setups, you're now supposed to use something called PuppetDB, which is not in fact a database on its own as the name suggests, but requires another database. Of course, it chose to need a different one - Postgres. Oh, and PuppetDB itself is in Java - now you get the cost of two runtimes when you use puppet!
So, to add useful Nagios monitoring to my puppet deploys, which without it are quite happy to be simple puppet apply runs from a local git checkout on each server, I now need storeconfigs, which needs puppetdb, which pulls in Java and Postgres. And that's just so a system that handles distributed configuration can actually be told about the results of that distributed configuration and create a useful feedback cycle allowing it to do useful things to the observed result.
Since I test these deployments on local vagrant/VirtualBox machines, I had to double their RAM because of this - even just the puppetdb java server starts with 192MB reserved out of the box.
But enough complaining about these expensive changes - at least there was a working puppetdb module that managed to set things up well enough.
It was easy enough to get the first host monitored, and apart from some minor changes (like updating the default Nagios config template from 3.x to 4.x), I had a familiar Nagios view working showing results from the server running Nagios itself. Success!
But all runs from the other VMs did not trigger adding any exported resources, and I couldn't find anything wrong in the logs. In fact, I could not find /var/log/puppetdb/puppetdb.log at all...
fun with utf-8
After a long night of experimenting and head scratching, I chased down a first clue in /var/log/messages saying puppet-master[17702]: Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB
I traced that down to puppetdb/char_encoding.rb, and with my limited ruby skills, I got a dump of the offending byte sequence by adding this code:
Puppet.warning "Ignoring invalid UTF-8 byte sequences in data to be sent to PuppetDB"
File.open('/tmp/ruby', 'w') { |file| file.write(str) }
Puppet.warning "THOMAS: is here"
(I tend to use my name in debugging to have something easy to grep for, and I wanted some verification that the File dump wasn't triggering any errors)
It took a little time at 3AM to remember where these /tmp files end up thanks to systemd, but once found, I saw it was a json blob with a command to "replace catalog". That could explain why my puppetdb didn't have any catalogs for other hosts. But file told me this was a plain ASCII file, so that didn't help me narrow it down.
I brute forced it by just checking my whole puppet tree:
find . -type f -exec file {} \; > /tmp/puppetfile
grep -v ASCII /tmp/puppetfile | grep -v git
This turned up a few UTF-8 candidates. Googling around, I was reminded about how terrible utf-8 handling was in ruby 1.8, and saw information that puppet recommended using ASCII only in most of the manifests and files to avoid issues.
It turned out to be a config from a webalizer module:
webalizer/templates/webalizer.conf.erb: UTF-8 Unicode text
While it was written by a Jesús with a unicode name, the file itself didn't have his name in it, and I couldn't obviously find where the UTF-8 chars were hiding. One StackOverflow post later, I had nailed it down - UTF-8 spaces!
00004ba0 2e 0a 23 c2 a0 4e 6f 74 65 20 66 6f 72 20 74 68 |..#..Note for th|
00004bb0 69 73 20 74 6f 20 77 6f 72 6b 20 79 6f 75 20 6e |is to work you n|
The offending character is c2 a0 - the UTF-8 encoding of U+00A0, the non-breaking space.
I have no idea how that slipped into a comment in a config file, but I changed the spaces and got rid of the error.
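The hexdump hunt above could be shortened with a few lines of Ruby that report where the non-ASCII bytes sit. This is only a minimal sketch, assuming Ruby 2.0 or newer (for String#b); non_ascii_positions is a made-up helper name, not something puppet or puppetdb provides.

```ruby
# Hypothetical helper: list every non-ASCII byte in a text as
# [line number, byte column, hex value]. Not part of puppet.
def non_ascii_positions(text)
  hits = []
  # Work on a binary copy so invalid UTF-8 sequences cannot raise.
  text.b.each_line.with_index(1) do |line, lineno|
    line.each_byte.with_index(1) do |byte, col|
      next if byte < 0x80 # plain ASCII, nothing to report
      hits << [lineno, col, format('%02x', byte)]
    end
  end
  hits
end
```

Run against the template from the file output, for example non_ascii_positions(File.binread('webalizer/templates/webalizer.conf.erb')), it would list each c2 and a0 byte with its line and column.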
Puppet's error was vague, did not provide any context whatsoever (Where do the bytes come from? Dump the part that is parseable? Dump the hex representation? Tell me the position in it where the problem is?), did not give any indication of the potential impact, and in a sea of spurious puppet warnings that you simply have to live with, is easy to miss. One down.
However, still no catalogs on the server, so still only one host being monitored. What next?
users, groups, and permissions
My next lead turned out to be my own fault. After turning off SELinux temporarily and checking all permissions on all puppetdb files to make sure that they were group-owned by puppetdb and writable for puppet, I took the last step of switching to that user role and trying to write the log file myself. And it failed. Huh? And then id told me why - while /var/log/puppetdb/ was group-writeable and owned by the puppetdb group, my puppetdb user was actually in the www-data group.
It turns out that I had tried to move some uids and gids around after the automatic assignment puppet does gave different results on two hosts (a problem I still don't have a satisfying answer for, as I don't want to hard-code uids/gids for system accounts in other people's modules), and clearly I did one of them wrong.
I think a server that for whatever reason cannot log should simply not start, as this is a critical error if you want a defensive system.
After fixing that properly, I now had a puppetdb log file.
resource titles
Now I was staring at an actual exception:
2016-10-09 14:39:33,957 ERROR [c.p.p.command] [85bae55f-671c-43cf-9a54-c149cede
c659] [replace catalog] Fatal error on attempt 0
java.lang.IllegalArgumentException: Resource '{:type "File", :title "/var/lib/p
uppet/concat/thomas_vimrc/fragments/75_thomas_vimrc-\" allow adding additional
config through .vimrc.local_if filereadable(glob(\"~_.vimrc.local\"))_\tsource
~_.vimrc.local_endif_"}' has an invalid tag 'thomas:vimrc-" allow adding additi
onal config through .vimrc.local
if filereadable(glob("~/.vimrc.local"))
source ~/.vimrc.local
endif
'. Tags must match the pattern /\A[a-z0-9_][a-z0-9_:\-.]*\Z/.
at com.puppetlabs.puppetdb.catalogs$validate_resources.invoke(catalogs.
clj:331) ~[na:na]
Given the name of the command (replace catalog), I felt certain this was going to be the problem standing between me and multiple hosts being monitored.
The problem was a few levels deep, but essentially I had code creating fragments of vimrc files using the concat module, and was naming the resources with file content as part of the title. That's not a great idea, admittedly, but no other part of puppet had ever complained about it before. Even the files on my file system that store the fragments, which get their filenames from these titles, were happily stored with a double quote in their names.
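The tag pattern from the exception can be checked locally before anything reaches PuppetDB. The sketch below copies the regular expression from the log message; valid_tag? and fragment_title are hypothetical helpers, and naming fragments after a short digest of their content is just one possible way to keep titles tag-safe, not necessarily the fix used here.

```ruby
require 'digest'

# Pattern copied from the PuppetDB error message above.
TAG_PATTERN = /\A[a-z0-9_][a-z0-9_:\-.]*\Z/

# Hypothetical check: would this string be accepted as a tag?
def valid_tag?(tag)
  !(tag =~ TAG_PATTERN).nil?
end

# Hypothetical alternative: build the title from a short digest of
# the content instead of embedding the content itself.
def fragment_title(prefix, content)
  "#{prefix}-#{Digest::SHA1.hexdigest(content)[0, 12]}"
end
```

A title built this way contains only lowercase letters, digits, an underscore and a hyphen, so any tag derived from it should match the pattern, while a title containing quotes, spaces or newlines from a vimrc snippet does not.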
So yet again, puppet's lax approach to specifying types of variables at any of its layers (hiera, puppet code, ruby code, ruby templates, puppetdb) in any of its data formats (yaml, json, bytes for strings without encoding information) triggers errors somewhere in the stack without informing whatever triggered that error (i.e., the agent run on the client didn't complain or fail).
Once again, puppet has given me plenty of reasons to hate it with a passion, tipping the balance.
I couldn't imagine doing server management without a tool like puppet. But you love it when you don't have to tweak it much, and you hate it when you're actually making extensive changes. Hopefully after today I can get back to the loving it part.