Mastodon

Multiple "waiting for Changelog Lock" = Liquibase locked your Database

Some time ago, an application of mine that is deployed in the Pivotal Cloudfoundry began throwing errors and crashed:

2019-09-29T06:42:54.76+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:54.761  INFO 10 --- [           main] o.a.c.c.C.[.[localhost].[/ithubbs]       : Initializing Spring embedded WebApplicationContext
2019-09-29T06:42:54.76+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:54.763  INFO 10 --- [           main] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 4397 ms
2019-09-29T06:42:55.00+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:55.001  INFO 10 --- [           main] o.c.metrics.CloudFoundryTagsMeterFilter  : Adding CloudFoundry Micrometer tags if not otherwise specified: cf.account=https://donotuseapi.run.pivotal.io, cf.application=ithubbs_backend, cf.cluster=ithubbs_backend, cf.instance.index=0, cf.organization=ithubbs, cf.space=prod
2019-09-29T06:42:55.87+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:55.877  INFO 10 --- [           main] o.s.c.s.r.PooledDataSourceCreator        : Found HikariCP on the classpath. Using it for DataSource connection pooling.
2019-09-29T06:42:56.63+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:56.631  INFO 10 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2019-09-29T06:42:56.89+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:56.898  INFO 10 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2019-09-29T06:42:58.51+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.518  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT COUNT(*) FROM public.databasechangeloglock
2019-09-29T06:42:58.53+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.538  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT COUNT(*) FROM public.databasechangeloglock
2019-09-29T06:42:58.55+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.544  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:42:58.55+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:42:58.558  INFO 10 --- [           main] l.lockservice.StandardLockService        : Waiting for changelog lock....
2019-09-29T06:43:08.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:08.559  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:08.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:08.561  INFO 10 --- [           main] l.lockservice.StandardLockService        : Waiting for changelog lock....
2019-09-29T06:43:18.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:18.563  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:18.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:18.568  INFO 10 --- [           main] l.lockservice.StandardLockService        : Waiting for changelog lock....
2019-09-29T06:43:28.56+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:28.569  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:28.57+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:28.571  INFO 10 --- [           main] l.lockservice.StandardLockService        : Waiting for changelog lock....
2019-09-29T06:43:38.57+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:38.572  INFO 10 --- [           main] liquibase.executor.jvm.JdbcExecutor      : SELECT LOCKED FROM public.databasechangeloglock WHERE ID=1
2019-09-29T06:43:38.57+0200 [APP/PROC/WEB/0] OUT 2019-09-29 04:43:38.575  INFO 10 --- [           main] l.lockservice.StandardLockService        : Waiting for changelog lock....
2019-09-29T06:43:45.77+0200 [HEALTH/0] ERR Failed to make TCP connection to port 8080: connection refused
2019-09-29T06:43:45.77+0200 [CELL/0] ERR Timed out after 1m0s: health check never passed.
2019-09-29T06:43:45.78+0200 [CELL/SSHD/0] OUT Exit status 0
2019-09-29T06:43:45.87+0200 [APP/PROC/WEB/0] OUT Exit status 143
2019-09-29T06:43:51.43+0200 [CELL/0] OUT Cell 76dba435-3477-41a6-bea1-9bd134040065 stopping instance f480f9d0-d3d7-4842-598c-8926
2019-09-29T06:43:51.44+0200 [API/4] OUT Process has crashed with type: "web"
2019-09-29T06:43:51.43+0200 [CELL/0] OUT Cell 76dba435-3477-41a6-bea1-9bd134040065 destroying container for instance f480f9d0-d3d7-4842-598c-8926
2019-09-29T06:43:51.47+0200 [API/4] OUT App instance exited with guid 740b4dfc-3200-4234-bc8c-d0efc7f38e67 payload: {"instance"=>"f480f9d0-d3d7-4842-598c-8926", "index"=>0, "cell_id"=>"76dba435-3477-41a6-bea1-9bd134040065", "reason"=>"CRASHED", "exit_description"=>"Instance never healthy after 1m0s: Failed to make TCP connection to port 8080: connection refused", "crash_count"=>1, "crash_timestamp"=>1569732231402656049, "version"=>"d2b95a45-ad59-433c-b780-08c2e5ba31d4"}

There have been no prior changes on the application or the setup.

Because the error occurred on a live application, I tried restarting it without looking too much into the cause of the problem. It was important to have this application live again as soon as possible. However, restarting failed with the same error.

In step two, I suspected Spring to not have enough resources, so I raised both “memory allocated” and “disk allowed”, without effect.

At that time, I began looking into the log in more detail. The failed connection / “connection refused” pointed to the database because it is the only service for that application. I thought that the health-check prevented startup of the application because the connection to the database could not be established fast enough. Raising the health-check timeout to 10 minutes also did not work.

At that point, it is worth mentioning that my application uses Liquibase. Googling for the multiple “Waiting for changelog lock” brought me to this article which suggested that Liquibase locked the database. Indeed, a

SELECT * FROM DATABASECHANGELOGLOCK;

returned

1 true2019-09-28 01:42:13 +0000 86b4b40a-826e-47f0-66f6-38f1 (10.243.232.98)

Unlocking it with

UPDATE DATABASECHANGELOGLOCK
SET locked=0, lockgranted=null, lockedby=null
WHERE id=1

brought the application back up.

Researching for the reason did not bring a specific and reproducible result. I found hints to the fact that when the application is stopped, Liquibase sometimes keeps the lock. However, that would not explain why the app was restarted in the first place.

I decided not to go into further research because this incident was the first problem with Liquibase and nothing really bad happened. Sadly, there are no build-in notifications for crashes like this in Cloudfoundry.

TL;DR

If you see “Waiting for changelog lock….” multiple times during startup of your Liquibase-using application, check if your database is unlocked.