Beta server deployment and data read failures caused by the disappearance of authorized public keys (Sprint 87: May 27, 2024)
What Happened:
On May 27, 2024, at 10:11 am PT, Sayaka noticed that deployments to beta had been failing.
The team also noticed that their Postgres read permissions for beta data had stopped working.
The authorized_keys files in all .ssh folders were unexpectedly empty.
How We Resolved:
We created a new key pair on the server for GitHub Actions to use and set the new secrets in this PR: https://github.com/LiteFarmOrg/LiteFarm/pull/3214
We added the public keys back to the readonly user.
Lessons Learned:
Root Cause Analysis: Unknown.
Impact Assessment: The incident was limited to beta, but the same failure could lock out CI/CD integrations or researchers accessing data.
Additional Insights:
Possible unconfirmed causes:
Duncan may have exited the file incorrectly while restoring the server during the last AAR issue, either through a vim exit problem or by exiting while in edit mode. However, this does not explain why all .ssh folders were affected.
Restarting the droplet may have affected the files.
Other
Action Items:
Consider making a snapshot of the droplet:
Consider copying the authorized_keys file to authorized_keys_copy at regular intervals:
- Preserved here:
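The periodic copy suggested above could be sketched as follows. This is only an illustration, not a script the team has adopted: the function name backup_authorized_keys is hypothetical, and it assumes the job runs as the user who owns the .ssh folder (for example from a scheduled task). It deliberately skips an empty source file, so that a wiped authorized_keys cannot overwrite a good authorized_keys_copy.

```python
import shutil
from pathlib import Path


def backup_authorized_keys(ssh_dir: Path) -> bool:
    """Copy authorized_keys to authorized_keys_copy; return True if a copy was made."""
    source = ssh_dir / "authorized_keys"
    backup = ssh_dir / "authorized_keys_copy"

    # Skip a missing or empty source so a wiped file never replaces a good backup.
    if not source.is_file() or source.stat().st_size == 0:
        return False

    # copy2 keeps the modification time and permission bits of the source file.
    shutil.copy2(source, backup)
    return True
```

A scheduled job could call backup_authorized_keys(Path.home() / ".ssh") for each account that needs protection. A False return means the source was missing or empty and the existing copy was left untouched, which is also a useful signal that the keys have disappeared again.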