Encountered a rather annoying event yesterday. One of the guys in my department mentioned that he couldn’t upload data into our purchase order area on Alfresco. We’ve had a few interesting glitches with Alfresco before (primarily oriented around duplicate filenames in automation routines), but nothing that serious.
A review of the logs showed something about an invalid transaction state of MARKED_ROLLBACK, so off to the Great Google I went, hoping the Oracle could provide me with an answer or three. The only reference I found initially was something about 32K directories, and strangely enough the Lucene index area had 32,000 subdirectories in a single directory. Now, 32,000 is not 32K to a computer (that would be 32,768), but it was close enough for me to blow away the indexes and restart Alfresco so that it would re-index the data.
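For anyone wanting to check the same thing on their own box, here is a minimal sketch that counts the immediate subdirectories of a directory. The function name is made up for illustration, and which directory to point it at depends on your own install. Some file systems do cap subdirectories per directory (ext3 tops out at roughly 32,000), though whether that was the actual culprit here is only a guess.

```python
import os


def count_subdirectories(path):
    """Return the number of immediate subdirectories of `path`.

    Symbolic links are not followed, so a link to a directory is not counted.
    """
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))
```

Call it as `count_subdirectories("/path/to/lucene-indexes")`, where the path is a placeholder for wherever your index area lives.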
Alfresco wouldn’t restart, claiming index.recover.mode wasn’t FULL, and that the index for 1 store was missing. Quite correct, it was missing. So, edit repository.properties (mistake number 1) and try again. Same error. Scratch head, disable the strict checks, reload Tomcat again (due to wonderful jmxrmi errors if Alfresco has tried to start and failed). Boots all the way, but the index doesn’t rebuild (mistake number 2).
Log in. ‘Umm.’ There was no data. Panic slightly, go digging in the file-system-based store. Yep, files. Yep, that’s a Word document. Go digging in the PostgreSQL back-end. Yep, records. So, I’ve got data. I’ve got SQL records with meta-data about the data. I haven’t got a working Alfresco installation. Somehow, I found a posting on the Alfresco forums where someone had corrupted his indexes and had to do the rebuild process. So I started following the steps he took (it’s custom-repository.properties!), and Alfresco duly reported that it was indexing the data.
30 minutes later, Alfresco indicates that it’s finished the rebuild of the indexes, but Tomcat isn’t responding. Our trouble-ticketing system is out of action too while all of this is going on, which isn’t a good thing. Tell Tomcat to shut down instead. Mistake 3.
Boot Tomcat. Alfresco now complains that the webscripts are missing. This is a sign of a missing Data Dictionary, and it’s a Bad Thing(tm). There’s a way to make Alfresco boot without them though, and I do that. Still no data. The Lucene index directory is 72 KB; that can’t be right. Run the rebuild again, this time watching top. Despite Alfresco saying ‘Yes, the index rebuild is done’, java is chewing 99.9% of CPU. Left it alone for another 30 minutes, and suddenly there’s a log message about being out of memory, but the indexes are back, and it’s now soffice.bin that’s chewing 99% of CPU.
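The size check is the quickest sanity test of whether a rebuild has really happened. A rough sketch of it, assuming nothing about Alfresco itself: it simply totals the file sizes under a directory. The function name is hypothetical, and the result is apparent file size, which can differ a little from the on-disk usage that du reports.

```python
import os


def directory_size_bytes(path):
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so a dangling link cannot raise an error.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

A result in the tens of kilobytes for a repository full of documents, as happened here, is a strong hint that the index is effectively empty.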
Log in, holding breath. Data. It all works. Well, almost. 3 user accounts are scuppered a bit (mine, my boss’ and the admin account), but the system has re-created them with the appropriate permissions. That’s today’s challenge.
