GCP VM SSH problem and solution
source link: https://allsyed.com/posts/gcp-vm-ssh-problem-and-solution/
The Start
It was just another day at work, and I was working on just another routine task. I needed to log in to a VM (let's call it $INSTANCE throughout this post) and update a few configs. I logged into the Google Cloud console, selected the project from the project selector, navigated to Compute Engine and clicked on SSH. Normally this opens a pop-up window and drops you into the familiar bash shell. Not today. Instead, it kept on loading.
The Denial
I was confused; this had never happened before. I double-checked my internet connection, tried a different browser and used an alternate internet connection. Every attempt ended the same way: the pop-up window kept loading.
Attempt #1 : gcloud command
gcloud beta compute ssh $INSTANCE --zone $ZONE --project $PROJECT
Attempt #2 : gcloud command with username
gcloud compute ssh $USR@$INSTANCE --zone $ZONE --project $PROJECT
Attempt #3 : gcloud command with verbose flag
gcloud compute ssh --zone $ZONE $INSTANCE --project $PROJECT --ssh-flag="-vvvvv"
Attempt #4 : gcloud command with the Compute Engine default key and my newly generated ssh key pair
gcloud compute ssh --zone $ZONE $INSTANCE --project $PROJECT --ssh-key-file=$HOME/.ssh/google_compute_engine --ssh-flag="-vvv" # compute engine default
gcloud compute ssh --zone $ZONE $INSTANCE --project $PROJECT --ssh-key-file=$HOME/.ssh/new-ssh-key --ssh-flag="-vvv"
Attempt #5 : Reconfiguring gcloud ssh
rm $HOME/.ssh/google_compute_engine $HOME/.ssh/google_compute_engine.pub # removing default key pair
gcloud compute config-ssh
After this step, I went through all of the above steps once again. All yielded the same result.
Attempt #6 : ssh command with default and new keys
ssh -i $HOME/.ssh/new_key $USR@$INSTANCE_IP
ssh -i $HOME/.ssh/google_compute_engine $USR@$INSTANCE_IP
The result was always the same:
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255]
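The 255 in that message is worth a second look. ssh exits with the status of the remote command, or with 255 when ssh itself hit an error, so a 255 here usually means the connection never got as far as running anything on the VM. A minimal sketch of how a wrapper script could tell the cases apart; the function name explain_ssh_exit is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the exit status of an ssh invocation.
# ssh returns the remote command's exit status, or 255 if ssh itself failed
# (a remote command could in principle also return 255, so treat this as a hint).
explain_ssh_exit() {
  code="$1"
  case "$code" in
    0)   echo "ok" ;;
    255) echo "ssh-level failure" ;;
    *)   echo "remote command exited with status $code" ;;
  esac
}
```

After a failed attempt, calling explain_ssh_exit $? would print "ssh-level failure" for the error above, which points at the connection or the VM rather than at the command being run.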
The Hint
I discussed the problem with my project manager, who asked me to get help from a member of our cloud team. During our conversation, he suggested that enabling the serial port would help with debugging the problem. He also mentioned something called startup-script, which does what it says: it runs a script on VM startup. With these new-found hints I started to dig deeper.
Analysing serial port log
gcloud compute connect-to-serial-port $INSTANCE --zone=$ZONE --project=$PROJECT
This step revealed that the VM had run out of storage.
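For a non-interactive look at the same log, the serial console output can also be dumped and searched. A minimal sketch, assuming the log contains the usual "No space left on device" wording; the helper name find_disk_full_lines is hypothetical and the messages on your VM may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read serial console text on stdin and print the lines
# that commonly indicate a full disk. The patterns are assumptions, not an
# exhaustive list.
find_disk_full_lines() {
  # grep exits non-zero when nothing matches; "|| true" keeps the helper quiet.
  grep -i -E 'no space left on device|disk full' || true
}
```

It could be fed from the serial port dump like this:
gcloud compute instances get-serial-port-output $INSTANCE --zone $ZONE --project $PROJECT | find_disk_full_lines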
Solution #1 : startup-script
I added a startup-script metadata entry with the content below. I also tried the script with sudo, making sure I left no stone unturned. After three or four rounds of trial and error and extensive log analysis, I concluded that the startup-script was not triggering.
#!/usr/bin/env bash
find /home/user/ -name "*.log" -delete
Solution #2 : shutdown-script
shutdown-script is, likewise, a script that is executed before the machine is switched off; its content was the same as the startup-script. Neither script was triggering, since there was not enough storage on the VM.
Solution #3 : Resizing disk
If the VM ran out of storage, simply adding more storage to its boot disk should fix the problem. So I decided to resize the boot disk after switching off the VM. I must say the resize command completed almost instantly.
gcloud compute disks resize $INSTANCE --zone $ZONE --size <int> --project $PROJECT
I started the VM, thinking the issue was resolved, but I was wrong. It greeted me with the same error message when I tried connecting to it.
Solution #4 : Final Solution
While I was skimming through the documentation, I read that you can detach and re-attach boot disks. I got an idea: I remembered that there was a snapshot of this VM, taken when things were green. Here are my steps to the solution.
- Switch off the VM
- Create a disk from the snapshot
- Detach the current boot disk
- Attach the disk created from the snapshot as the boot disk
- Switch the VM back on and hope it works
gcloud compute disks create $NEW_DISK --source-snapshot $SNAPSHOT --project=$PROJECT --size <int> --zone $ZONE
gcloud beta compute instances detach-disk $INSTANCE --disk $OLD_DISK --project=$PROJECT --zone $ZONE
gcloud beta compute instances attach-disk $INSTANCE --disk $NEW_DISK --boot --project=$PROJECT --zone $ZONE
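The five steps above, including the stop and start that the commands leave out, can be sketched as one script. This is a sketch under assumptions, not necessarily the exact sequence that was run: the swap_boot_disk function and the DRY_RUN switch are made up for illustration, it uses the GA gcloud compute commands instead of beta, and it omits --size when creating the disk. With DRY_RUN=1 it only prints what it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the boot disk swap described above.
# Expects INSTANCE, ZONE, PROJECT, SNAPSHOT, OLD_DISK and NEW_DISK to be set.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

swap_boot_disk() {
  # 1. Switch off the VM.
  run gcloud compute instances stop "$INSTANCE" --zone "$ZONE" --project "$PROJECT" || return 1
  # 2. Create a disk from the known-good snapshot.
  run gcloud compute disks create "$NEW_DISK" --source-snapshot "$SNAPSHOT" --zone "$ZONE" --project "$PROJECT" || return 1
  # 3. Detach the current (full) boot disk.
  run gcloud compute instances detach-disk "$INSTANCE" --disk "$OLD_DISK" --zone "$ZONE" --project "$PROJECT" || return 1
  # 4. Attach the new disk as the boot disk.
  run gcloud compute instances attach-disk "$INSTANCE" --disk "$NEW_DISK" --boot --zone "$ZONE" --project "$PROJECT" || return 1
  # 5. Switch the VM back on.
  run gcloud compute instances start "$INSTANCE" --zone "$ZONE" --project "$PROJECT"
}
```

Running DRY_RUN=1 swap_boot_disk first is a cheap way to review the five commands before touching a production VM. Each step stops the sequence on failure, so a failed detach never leads to an attach attempt.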
Conclusion
Voilà! I was able to access the machine. Someone might ask: why go through all the hassle when you could have just created a new VM from the snapshot? I couldn't do that because I didn't want to lose the VM metadata and, more importantly, the VM IP. This server is used by many of our customers, and they connect to it via its IP.
What I have learned
- GCP provides a serial port on Compute Engine instances.
- startup-script and shutdown-script
- You can detach and re-attach a boot disk, although this might not work in exactly the same way for a Windows VM
Clean up
I have some cleaning up to do: deleting the old boot disk, removing extra ssh keys from metadata, and updating my code so that it removes old log files. These log files were the very reason this problem existed.
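The log cleanup could look something like the sketch below. It is an assumption about what "old" means, not the production code: the function name purge_old_logs and the 7-day default are made up, and it relies on find supporting -mtime and -delete, as GNU find does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete *.log files under a directory that were last
# modified more than a given number of days ago (default 7).
purge_old_logs() {
  dir="$1"
  days="${2:-7}"
  find "$dir" -type f -name '*.log' -mtime +"$days" -delete
}
```

Unlike the startup-script above, which removes every .log file, this keeps recent logs around for debugging. Called as purge_old_logs /home/user 7 from a daily cron job, it would stop the disk from filling up again.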