Cricalix.Net

Going sane since 1978


Photography, for me, is a hobby verging on a passion, with most of my pleasure coming from shooting at gigs. I get to hear good music, and I get to hone my skills – not a bad life really. I’m also a system administrator, with the ability to write code in a variety of languages, and for over a year now, I’ve been designing and poking at a dream gallery system that would give the subjects of my photos a measure of control over the distribution of those photos.

This entry isn’t to announce that I’ve managed to do that; it’s the complete opposite.

I’ve found that Zenfolio has come a long way since I looked at it a year ago. The Zenfolio service offers custom theming, order fulfillment, password protection, digital licensing and more. I could, in time, write my own code to do this, but I’m coming to the realisation that I just don’t have the energy, or the skill, to write my dream system. I could probably get my programming up to the skill level required to do it properly, but the time I’d spend doing that could be spent listening to live music or processing photos from a gig so that I can attempt to earn a little pocket money from them.

I think I’ll take the live music and pocket money. I get plenty of chances to exercise my coding streak at work.

I’m doing some final tuning work on the Puppet recipes for our Glassfish installation, and Java has reared one of its ugly heads again. In this case, it’s the whole management of the command line arguments for the JVM.

The majority of the arguments we need to configure take the form
-D$variable=$value
-XX:$variable=$value

The problem is, Java also has arguments that look like
-X$variable$value

This means my nice simple recipe for tweaking the Glassfish JVM options doesn’t actually handle all the cases, so I now need to either write a more complex one (and imbue it with knowledge of which variables don’t have equals signs), or write a second recipe with a different name to handle these special snowflake options.
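To make the "imbue it with knowledge" option concrete, here is a rough shell sketch of the splitting logic such a recipe would need. It is not the Puppet recipe itself: the function name, the variable name, and the short prefix list are all made up for illustration, and a real list would need many more entries.

```shell
#!/bin/bash
# Sketch only: KNOWN_X_PREFIXES and split_jvm_option are illustrative names,
# and the prefix list is deliberately incomplete.
KNOWN_X_PREFIXES="-Xmx -Xms -Xmn -Xss"

# Print "<name> <value>" for an option, or return 1 if the shape is unknown.
split_jvm_option() {
    local opt prefix
    opt=$1
    case "$opt" in
        -D*=*|-XX:*=*)
            # -Dname=value and -XX:name=value split cleanly on the first "=".
            printf '%s %s\n' "${opt%%=*}" "${opt#*=}"
            return 0
            ;;
    esac
    # -X options have no separator, so the name has to come from a known list.
    for prefix in $KNOWN_X_PREFIXES; do
        case "$opt" in
            "$prefix"*)
                printf '%s %s\n' "$prefix" "${opt#"$prefix"}"
                return 0
                ;;
        esac
    done
    return 1
}

split_jvm_option -Dcom.example.mode=test   # prints: -Dcom.example.mode test
split_jvm_option -XX:MaxPermSize=256m      # prints: -XX:MaxPermSize 256m
split_jvm_option -Xmx512m                  # prints: -Xmx 512m
```

Anything that matches neither shape (a bare flag such as -server, say) makes the function return 1, which is where a second, simpler recipe would still earn its keep.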

Feh.

Doing more testing at work today, and decided to pick up the latest compiled output from the build server.

Exception occured in J2EEC Phase
java.lang.IllegalArgumentException: Unknown ContainerTransaction type [Requires]
com.sun.enterprise.deployment.backend.IASDeploymentException: Error loading deployment descriptors for module [EJB FILE] -- Unknown ContainerTransaction type [Requires]
at com.sun.enterprise.deployment.backend.Deployer.loadDescriptors(Deployer.java:390)

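Pinged one of the developers about that, and apparently it means that the transaction-type in ejb-jar.xml is wrong: going by the message, “Requires” isn’t a value the container recognises (the EJB transaction attribute is spelled “Required”). Yay. For a reason I cannot fathom, Google had no results for this error either.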

$work uses OpenNMS to monitor our various devices (servers, switches, routers, printers etc.), mostly via SNMP. Today, while looking at the various events that had been recorded, I noticed that a relatively simple search was taking more than 2 minutes to process through ~250,000 event rows (plus associated rows in other tables). I turned on query logging (log_statement = 'all' in postgresql.conf, then service postgresql reload), and re-ran the search from the web interface.

Lo and behold, the culprit was revealed – the search went something like

UPPER(eventlogmsg) LIKE '%value%'

Even if that field were indexed, the leading '%' would prevent the index from being used. I threw the query into pgAdmin, and discovered that the query plan that PostgreSQL chose was a pair of nested loop joins – unpleasant to say the least. A quick gander at the docs, and a few SQL statements later, and I had a full-text index on the eventlogmsg field. Several test queries convinced me that it was much faster, so I threw the new query into pgAdmin and asked for the pretty query plan. Two hash joins and a sort, and a query time of 31 milliseconds, or more than 5000 times faster.

So, I’ve filed a Bugzilla entry for this with the OpenNMS team. Unfortunately the fix needs PostgreSQL 8.3, where full-text search became part of the core server, but that’s something that can probably be checked at run-time and install-time. Hopefully they agree that it’s a worthwhile performance change, since it isn’t a drop-in fix: full-text indexes won’t help the existing LIKE usage, so the code will have to change to generate new SQL statements.
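For anyone wanting to try the same thing, the index and the rewritten search might look roughly like the SQL printed by the sketch below. This is a reconstruction, not the statements actually used: the index name, the 'english' text search configuration, and the selected columns are assumptions, and the events table name should be checked against your own OpenNMS schema. The shell function only prints SQL; nothing is run unless you pipe it into psql yourself.

```shell
#!/bin/bash
# Sketch only: prints PostgreSQL 8.3+ full-text SQL for a search term.
# The index name, the 'english' configuration and the selected columns are
# assumptions; verify the table and column names against your own schema.
fulltext_sql() {
    local term
    # Double any single quotes so the term is a valid SQL string literal.
    term=$(printf '%s' "$1" | sed "s/'/''/g")
    cat <<EOF
-- Expression index, so the tsvector is not rebuilt for every row per query.
CREATE INDEX events_eventlogmsg_fts
    ON events USING gin (to_tsvector('english', eventlogmsg));

-- Replacement for: UPPER(eventlogmsg) LIKE '%...%'
SELECT eventid, eventlogmsg
  FROM events
 WHERE to_tsvector('english', eventlogmsg) @@ plainto_tsquery('english', '$term');
EOF
}

fulltext_sql "link down"
```

One caveat worth knowing before swapping it in: full-text search matches whole words (after stemming), not arbitrary substrings, so it will not return exactly the same rows as a LIKE with wildcards on both ends.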

I’ve been working with Glassfish recently, from the system administration point of view.  First task, after getting a good build with Maven (doing it with basic rpm methods netted me a massive dependency list, including things like Firefox!), was to write an init script so that Glassfish can be integrated into the CentOS boot sequence.

Because we might have multiple domains set up inside of Glassfish, I opted for a setup similar to the Tomcat5 init script – check the basename of $0, and use that to determine which domain to boot up.  The fiddling in start() gets around the fact that Glassfish doesn’t seem to write a PID file out where we need one.

So, just in case anyone else needs to do this:

#!/bin/bash
# chkconfig: 2345 85 15
# description: GlassFish is a Java Application Server.
# processname: glassfish
# pidfile: /var/run/glassfish.pid
 
# source function library
. /etc/init.d/functions
 
RETVAL=0
GLASSFISH_BIN="/var/lib/glassfish/bin"
 
# Basename works with symbolic links.
NAME="$(basename "$0")"
unset ISBOOT
# Trim off the Sxx/Kxx prefix
if [ "${NAME:0:1}" = "S" -o "${NAME:0:1}" = "K" ]; then
    NAME="${NAME:3}"
    ISBOOT="1"
fi
# Trim off the glassfish- prefix
NAME=${NAME:10}
 
# /etc/init.d/glassfish should never be called directly.
if [ -z "$NAME" ]; then
        echo -n $"Cannot start Glassfish without specifying a domain."
        failure
        echo
        exit 1
fi
 
start() {
        echo -n $"Starting Glassfish V2 domain $NAME: "
        daemon --user glassfish --pidfile /var/run/glassfish-$NAME.pid "$GLASSFISH_BIN/asadmin start-domain $NAME >/dev/null 2>&1"
        RETVAL=$?
        if [ $RETVAL -eq 0 ]; then
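                # Glassfish doesn't write a PID file where we need one, so look the PID up.
                # Assumes only one glassfish-owned process mentions this domain name.
                PID=$(ps U glassfish | grep -- "$NAME" | awk '{ print $1 }')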
                echo $PID > /var/run/glassfish-$NAME.pid
                touch /var/lock/subsys/glassfish-$NAME
        fi
        echo
}
stop() {
        echo -n $"Shutting down Glassfish V2 domain $NAME: "
        $GLASSFISH_BIN/asadmin stop-domain $NAME >/dev/null 2>&1
        RETVAL=$?
        [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/glassfish-$NAME && rm -f /var/run/glassfish-$NAME.pid && success || failure
        echo
}
 
case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  restart|reload)
        stop
        start
        ;;
  condrestart)
        if [ -f /var/lock/subsys/glassfish-$NAME ]; then
            stop
            start
        fi
        ;;
  status)
        status glassfish-$NAME
        RETVAL=$?
        ;;
  *)
        echo $"Usage: $0 {start|stop|restart|condrestart|status}"
        exit 1
esac
 
exit $RETVAL
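
The name handling at the top of the script is the part most likely to trip people up, so here it is pulled out into a standalone function that can be tested without touching a real init directory. This is a variation rather than a copy: it uses pattern stripping instead of fixed character offsets, and the function name is made up for the example.

```shell
#!/bin/bash
# Sketch only: domain_from_script_name is a hypothetical helper that mirrors
# the prefix trimming in the init script using pattern stripping.
domain_from_script_name() {
    local name
    name=$(basename "$1")
    # Drop a leading Sxx/Kxx added by the rc?.d symlinks, if present.
    name=${name#[SK][0-9][0-9]}
    case "$name" in
        glassfish-?*) printf '%s\n' "${name#glassfish-}" ;;
        *) return 1 ;;   # called as plain "glassfish": no domain to act on
    esac
}

domain_from_script_name /etc/rc3.d/S85glassfish-domain1   # prints: domain1
domain_from_script_name /etc/init.d/glassfish-domain1     # prints: domain1
domain_from_script_name /etc/init.d/glassfish || echo "no domain in name"
```

In practice that means one symlink per domain, for example /etc/init.d/glassfish-domain1 pointing at /etc/init.d/glassfish, with chkconfig run against the symlink name.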

The alternative is to define a /etc/sysconfig/glassfish file, and insert a variable with the list of domains to boot, in sequence.  This is a little harder to manage automatically in Puppet, but might be a better solution if precise boot sequences are required (this method will boot in sequence based on the S numbers in the base script, and then the alphabetical ordering of the names).
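
A minimal sketch of that alternative, for comparison. The GLASSFISH_DOMAINS variable name and the one-line file format are assumptions made for the example, not anything Glassfish defines, and the function covers only the start path.

```shell
#!/bin/bash
# Sketch only: assumes /etc/sysconfig/glassfish contains a line such as
#   GLASSFISH_DOMAINS="domain1 reporting"
# GLASSFISH_DOMAINS is a made-up variable name for this example.
ASADMIN="${ASADMIN:-/var/lib/glassfish/bin/asadmin}"
SYSCONFIG="${SYSCONFIG:-/etc/sysconfig/glassfish}"

# Start every listed domain, in the order given, stopping at the first failure.
start_domains() {
    local domain
    if [ -r "$SYSCONFIG" ]; then
        . "$SYSCONFIG"
    fi
    for domain in $GLASSFISH_DOMAINS; do
        echo "Starting Glassfish domain $domain"
        "$ASADMIN" start-domain "$domain" || return 1
    done
}
```

The boot order is then simply the order of the names in the file, which is the main thing this approach buys over the symlink-per-domain script.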

Over the past few months, the Avatar code has been having a few crashes that leave no recognisable/usable stack for GDB to read.  It’s also been having a few hangs, with strace indicating futex_wait (in an application that doesn’t use threads), and gdb of the core (after killing the process) indicating __kernel_vsyscall.  Unfortunately, I’m not really a programmer/coder, so my efforts to track the cause down have probably been a bit haphazard.

The most annoying part so far is that yesterday we encountered the hang situation 4 times, so I enabled a strace against the binary, and channeled the output across the ‘net to my PC where I’ve got a rolling 40,000 line buffer.  24 hours later, at a constant 2 Mbit/s, and we still haven’t hung.

I call Heisenbug.

One of my long-term tickets at work is to provide LDAP or Kerberos integration for our servers at a minimum, and all Linux workstations and laptops if possible.  I poked it a bit today, and made a disappointing discovery.  Unlike Windows, a CentOS machine running LDAP as the primary authentication method is unable to cache the password hash.  So, if I enable it on a laptop, then disconnect that laptop from the network, I am unable to log in as any user that has previously logged in with LDAP credentials.

Bummer.

I doubt Kerberos is going to solve this either, as the caching is performed by nscd, and it’s nscd that doesn’t cache the hashes. I suppose I could work on a custom PAM module that hooks into pam_ldap, and on successful authentication, stores a new MD5 password hash for the account in /etc/shadow. This way, a machine going off-line would have accurate local password hashes, and authentication would still work.

That sounds like way too much work though, and outside the scope of my job.

© 2012 Cricalix.Net