Linux and NoMachine Rebuild

Problem

The current Linux/NoMachine setup has two main issues. A rise in VSCode sessions has increased total resource consumption on the Linux nodes, leading to partial or complete system outages.

Any solution other than rebuilding the current system as-is will involve parallelizing the setup, so that a single failed node has less impact on other students using the cluster.

Most potential solutions suggest separating the NoMachine and SSH services: the SSH systems are easier (and cheaper) to parallelize, whereas NoMachine is licensed per server and per CPU core.

Solutions

Current setup plus user limits

Per-user memory and CPU limits can be enforced using cgroups/systemd.
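As a rough illustration of what such limits could look like, the sketch below renders a systemd drop-in for the `user-.slice` template, which systemd (version 239 and later) applies to every `user-<UID>.slice`. The values (4G, 200%, 512 tasks), the file name `50-limits.conf`, and the function name are assumptions for illustration only, not measured or agreed limits. `MemoryMax=` also assumes the nodes use the unified cgroup v2 hierarchy.

```python
from pathlib import PurePosixPath

# Assumed drop-in location; the file name "50-limits.conf" is arbitrary.
DROPIN_PATH = PurePosixPath("/etc/systemd/system/user-.slice.d/50-limits.conf")


def render_user_slice_limits(memory_max: str, cpu_quota_percent: int, tasks_max: int) -> str:
    """Return the text of a drop-in that caps each logged-in user's slice."""
    if cpu_quota_percent <= 0 or tasks_max <= 0:
        raise ValueError("limits must be positive")
    lines = [
        "[Slice]",
        f"MemoryMax={memory_max}",         # hard memory cap per user (cgroup v2)
        f"CPUQuota={cpu_quota_percent}%",  # 100% equals one full CPU core
        f"TasksMax={tasks_max}",           # cap on processes and threads per user
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; real limits would need to be chosen from usage data.
    print(f"# {DROPIN_PATH}")
    print(render_user_slice_limits("4G", 200, 512), end="")
```

After placing a file like this, `systemctl daemon-reload` would normally be needed for systemd to pick it up. Limits set this way apply to everything a user runs on that node, including VSCode remote sessions.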

Benefits
Drawbacks
Notes

New SSH VMs

Build new KVM-based VMs in Proxmox.

Benefits
Drawbacks

Kubernetes-Based Shared Containers

Build a SoCS Linux Docker container and deploy it to Kubernetes. These containers would be shared, similar to the current environment, but Kubernetes makes it practical to run many more nodes in parallel than could be effectively managed in a VM-based setup.
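A minimal sketch of how the shared containers might be described to Kubernetes is shown below. It builds a standard `apps/v1` Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The image name `registry.example/socs-linux:latest`, the deployment name, the replica count, and the assumption that each container runs an SSH daemon on port 22 are all placeholders, not part of the proposal.

```python
import json


def build_shared_shell_deployment(image: str, replicas: int, name: str = "socs-linux") -> dict:
    """Build a Deployment that runs `replicas` identical shared shell containers."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "shell",
                            "image": image,
                            # Assumes sshd listens on port 22 inside the container.
                            "ports": [{"containerPort": 22}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and replica count, for illustration only.
    manifest = build_shared_shell_deployment("registry.example/socs-linux:latest", replicas=6)
    print(json.dumps(manifest, indent=2))
```

Scaling out would then be a matter of changing the replica count rather than building additional VMs. A Service or load balancer in front of the pods, and shared home directories, would still need to be designed separately.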

Benefits
Drawbacks
Notes

Container SSH

Use ContainerSSH to provide one Kubernetes container per student. Students would SSH to the cluster as they currently do, but each would be routed to their own dynamically provisioned container.
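To make the per-student model more concrete, the sketch below shows the kind of Pod a student session could map to, with CPU and memory limits attached to that student alone. This is not ContainerSSH's API or configuration format. ContainerSSH creates pods from its own configuration, and this code only illustrates the shape of the result. The image name, the `shell-` name prefix, the `example.org/username` annotation key, and the default limits are hypothetical.

```python
import re


def pod_name_for(username: str) -> str:
    """Map a login name to a DNS-1123 label usable as a pod name."""
    slug = re.sub(r"[^a-z0-9-]+", "-", username.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot derive a pod name from {username!r}")
    # Kubernetes limits DNS labels to 63 characters.
    return f"shell-{slug}"[:63].rstrip("-")


def build_student_pod(username: str, image: str, cpu: str = "2", memory: str = "4Gi") -> dict:
    """Build a Pod manifest for one student's isolated shell container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name_for(username),
            "labels": {"app": "student-shell"},
            # Hypothetical annotation recording the original login name.
            "annotations": {"example.org/username": username},
        },
        "spec": {
            "containers": [
                {
                    "name": "shell",
                    "image": image,
                    # Limits apply to this student only, so a runaway
                    # session cannot starve other students' containers.
                    "resources": {"limits": {"cpu": cpu, "memory": memory}},
                }
            ]
        },
    }


if __name__ == "__main__":
    pod = build_student_pod("J.Smith_42", "registry.example/socs-linux:latest")
    print(pod["metadata"]["name"])
```

Because each student gets a separate container with its own limits, a heavy session affects only its owner, which addresses the outage problem more directly than shared nodes do.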

Benefits
Drawbacks