Linux and NoMachine Rebuild

Problem

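The current Linux/NoMachine setup has two main issues. First, a growing number of VSCode sessions has increased total resource consumption on the Linux nodes, leading to partial or complete system outages. Second, because the setup is not parallelized, a single failed node can disrupt many students at once.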

Any solution other than rebuilding the current system as is will involve parallelizing the setup to reduce the impact that any one failed node has on other students using the cluster.

Most potential solutions split the NoMachine and SSH services apart, because the SSH systems are easier (and cheaper) to parallelize, whereas NoMachine is licensed per server and CPU core.

Solutions

Current setup plus user limits

Per-user memory and CPU limits can be enforced with cgroups through systemd.
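A minimal sketch of how such limits could be applied, assuming systemd v239 or newer (where a drop-in directory named `user-.slice.d` applies to every `user-<UID>.slice`) and the unified cgroup v2 hierarchy. The script only renders, and optionally writes, the drop-in file. The file name `50-limits.conf` and the limit values (8G of memory, four cores' worth of CPU, 2048 tasks) are hypothetical placeholders, not tuned recommendations.

```python
"""Render a systemd drop-in that caps resources for every user's login slice.

Assumptions: systemd v239+ (prefix drop-ins such as user-.slice.d) and
cgroup v2. The limit values are hypothetical placeholders.
"""
from pathlib import Path

# Hypothetical file name; any *.conf file inside user-.slice.d is read.
DROPIN_PATH = Path("/etc/systemd/system/user-.slice.d/50-limits.conf")


def render_user_slice_limits(memory_max: str = "8G",
                             cpu_quota_percent: int = 400,
                             tasks_max: int = 2048) -> str:
    """Return the text of a [Slice] drop-in applying per-user cgroup limits."""
    return (
        "[Slice]\n"
        "# Hard memory cap for all of one user's processes combined.\n"
        f"MemoryMax={memory_max}\n"
        "# 100% equals one full CPU core, so 400% allows four cores' worth.\n"
        f"CPUQuota={cpu_quota_percent}%\n"
        "# Maximum number of tasks (processes and threads) per user.\n"
        f"TasksMax={tasks_max}\n"
    )


def install(path: Path = DROPIN_PATH) -> None:
    """Write the drop-in (needs root); run `systemctl daemon-reload` afterwards."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_user_slice_limits())


if __name__ == "__main__":
    # Print instead of installing so the output can be reviewed first.
    print(render_user_slice_limits(), end="")
```

After writing the file as root, `systemctl daemon-reload` is needed for systemd to pick up the new drop-in, and `systemctl show user-1000.slice -p MemoryMax` can be used to check the value for one user (UID 1000 is only an example).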

Benefits
  • Easier to set up
  • Fewer nodes to manage
  • Should keep one user from taking down the entire cluster
Drawbacks
  • Does not limit the impact of any one node becoming unavailable
  • Limits the extra resources available to bursty workloads
  • NoMachine head node remains particularly vulnerable to outages (an outage would take down all NX sessions)
Notes
  • More thorough use of Ansible is recommended to manage updates effectively

New SSH VMs

Build new KVM-based VMs in Proxmox to serve SSH (and VSCode) connections.
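One possible way to create several identical SSH VMs is to clone them from a prepared Proxmox template with the `qm` CLI. This is a sketch under hypothetical assumptions: a template with VMID 9000, new VMIDs starting at 201, and names `ssh-01`, `ssh-02`, and so on. It is meant to run as root on a Proxmox node, and it only prints the commands unless `dry_run=False` is passed.

```python
"""Clone SSH VMs from a Proxmox template with `qm clone` (sketch).

Hypothetical values: template VMID 9000, first new VMID 201, names ssh-NN.
"""
import subprocess

TEMPLATE_VMID = 9000   # hypothetical template VMID
FIRST_VMID = 201       # hypothetical first VMID for the new SSH nodes


def clone_commands(count: int,
                   template: int = TEMPLATE_VMID,
                   first_vmid: int = FIRST_VMID) -> list[list[str]]:
    """Build `qm clone` and `qm start` commands for `count` SSH VMs."""
    commands = []
    for i in range(count):
        vmid = first_vmid + i
        name = f"ssh-{i + 1:02d}"
        # "--full 1" makes an independent full clone rather than a linked clone.
        commands.append(["qm", "clone", str(template), str(vmid),
                         "--name", name, "--full", "1"])
        commands.append(["qm", "start", str(vmid)])
    return commands


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(clone_commands(3))
```

Full clones cost more disk than linked clones but do not depend on the template staying in place, which keeps each SSH VM independently manageable.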

Benefits
  • Setup is closer to the current system and would involve fewer unknowns
  • SSH/VSCode connections would no longer impact NoMachine
Drawbacks
  • Limited ability to parallelize SSH before managing the nodes becomes more difficult

Kubernetes Based Shared Containers

Benefits
  • Potential to auto-scale the cluster to meet load more responsively
  • A Docker container-based setup could also be distributed to students
Drawbacks
  • Potentially more complex setup with more unknowns
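As an illustration of the auto-scaling idea, the sketch below builds a Kubernetes `autoscaling/v2` HorizontalPodAutoscaler manifest as JSON. The Deployment name `shared-shell`, the namespace `student-shells`, and the 2 to 10 replica range at a 70% CPU target are hypothetical. CPU-utilization scaling also assumes metrics-server is installed and that the containers declare CPU requests. Note that an HPA scales pods; adding worker nodes would additionally need a cluster autoscaler, which is not shown.

```python
"""Build a HorizontalPodAutoscaler manifest for shared student containers.

Hypothetical names: Deployment "shared-shell" in namespace "student-shells".
"""
import json


def hpa_manifest(deployment: str = "shared-shell",
                 namespace: str = "student-shells",
                 min_replicas: int = 2,
                 max_replicas: int = 10,
                 cpu_target_percent: int = 70) -> dict:
    """Return an autoscaling/v2 HPA manifest as a plain dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Scale on average CPU use across pods, measured as a
            # percentage of each container's CPU request.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization",
                               "averageUtilization": cpu_target_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. `python hpa.py | kubectl apply -f -`
    print(json.dumps(hpa_manifest(), indent=2))
```

One caveat worth checking: interactive SSH sessions are long-lived, so new replicas only absorb new connections, and scaling down deletes pods along with any sessions running in them.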

Container SSH

Use ContainerSSH to give each student their own Kubernetes container

Benefits
  • Completely removes the impact of one student on another user's environment
Drawbacks
  • Under relatively inactive development; new and potentially unstable
  • Complex setup for the authentication server
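To give a sense of what the authentication server involves, here is a minimal sketch of a password-only webhook using Python's standard `http.server`. It assumes ContainerSSH is configured to POST password logins to `/password` with a JSON body containing `username` and `passwordBase64`, and that it expects a JSON reply with a boolean `success` field; these field names should be checked against the ContainerSSH version actually deployed. `verify_password` is a hypothetical hook where the real credential check (for example against LDAP) would go; as written it rejects everyone.

```python
"""Minimal ContainerSSH authentication webhook (password method only).

Assumed protocol: POST /password with {"username", "passwordBase64", ...},
reply {"success": bool}. verify_password is a hypothetical placeholder.
"""
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def verify_password(username: str, password: str) -> bool:
    """Hypothetical placeholder: replace with a real credential check."""
    return False


def handle_password(body: dict,
                    verifier: Callable[[str, str], bool] = verify_password) -> dict:
    """Decode a password-auth request and build the JSON reply."""
    try:
        username = body["username"]
        password = base64.b64decode(body["passwordBase64"]).decode("utf-8")
    except (KeyError, TypeError, ValueError):
        # Missing fields, wrong types, or bad base64/UTF-8: deny the login.
        return {"success": False}
    return {"success": bool(verifier(username, password))}


class AuthHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
        except ValueError:
            body = {}
        if self.path == "/password":
            reply = handle_password(body)
        else:
            # Other methods (e.g. public keys) are denied in this sketch.
            reply = {"success": False}
        data = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Listen on localhost only; port 8080 is an arbitrary choice.
    HTTPServer(("127.0.0.1", 8080), AuthHandler).serve_forever()
```

Keeping the decision logic in `handle_password`, separate from the HTTP plumbing, makes it possible to test the credential check without running ContainerSSH.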
Last modified: 2023/04/12 19:00 by kjohns23