Linux and NoMachine Rebuild
Problem
The current Linux/NoMachine setup has two main issues. An increase in VSCode sessions has raised total resource consumption on the Linux nodes.
Any solution other than rebuilding the current system as is will involve parallelizing the setup to reduce the impact that any one failed node has on other students using the cluster.
Most potential solutions suggest separating the NoMachine and SSH services, because the SSH systems are easier (and cheaper) to parallelize, whereas NoMachine is licensed per server and CPU core.
Solutions
Current setup plus user limits
Per-user memory and CPU limits can be enforced using cgroups, managed through systemd.
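As a rough illustration of what such limits could look like, the sketch below renders a systemd drop-in for the templated user slice, so that the same caps apply to every logged-in user. `MemoryMax=`, `CPUQuota=` and `TasksMax=` are standard systemd resource-control settings. The helper name, the drop-in file name, and the example values (8G, 200%, 512 tasks) are assumptions chosen for illustration, not values agreed for this cluster.

```python
# Minimal sketch: generate a systemd drop-in that caps each user's slice.
# The path and the limit values below are illustrative assumptions.

# Drop-ins in user-.slice.d apply to every user-<UID>.slice (systemd 239+).
DROPIN_PATH = "/etc/systemd/system/user-.slice.d/50-limits.conf"


def render_user_slice_dropin(memory_max: str, cpu_quota_percent: int, tasks_max: int) -> str:
    """Return the text of a [Slice] drop-in with per-user resource limits."""
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    if tasks_max <= 0:
        raise ValueError("tasks_max must be positive")
    lines = [
        "[Slice]",
        f"MemoryMax={memory_max}",          # hard memory cap per user
        f"CPUQuota={cpu_quota_percent}%",   # 100% equals one full CPU core
        f"TasksMax={tasks_max}",            # cap on processes and threads
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents for review instead of writing to /etc directly.
    print(f"# {DROPIN_PATH}")
    print(render_user_slice_dropin("8G", 200, 512), end="")
```

After the rendered file is installed at the drop-in path, `systemctl daemon-reload` is needed before the limits take effect. Because `MemoryMax=` is a hard cap, it is also the source of the bursty-workload drawback noted for this option.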
Benefits
- Easier to set up
- Fewer nodes to manage
- Should keep one user from taking down the entire cluster
Drawbacks
- Does not limit the impact of any one node becoming unavailable
- Limits the ability of bursty workloads to use extra resources
New SSH VMs
Build new KVM-based VMs in Proxmox.
Container SSH
Use https://containerssh.io/ to give each student their own Kubernetes container.
Benefits
- Completely removes the impact of one student on another user's environment
Drawbacks
- Under relatively inactive development - new and potentially unstable
- Complex setup for authentication server