I have begun the process of building a lab for my team of HPC consultants, and I’m trying to make some plans. I would like this to be as flexible as I can make it. I live 3½ hours away from the site, so the fewer trips down there to recable and/or move stuff around, the better! Most of this hardware has various older InfiniBand connectivity, along with multiport LOM & OCP cards at either 1Gb or 10Gb. Most also have the option to run the BMC on either a dedicated or a shared port. We have 2 dedicated IPs (so far) that I’m currently using for the head node’s BMC & SSH access. This will be all Linux, though we will be accessing web interfaces when testing various products. My initial thoughts:

  • Identify what we want to keep and what we want to excess. There’s some _very_ old hardware in there! There’s also some old OmniPath hardware in there. We don’t see much OPA, but some team members seem to think that may change. Still, this stuff is old.
  • Carve out a management/provisioning network. Ideally, this will allow us to switch between dedicated and shared BMC ports at will (a couple of example commands are sketched after this list). We use this for customer knowledge transfer when we demo our cluster management software. The shared ports are usually onboard port 1, which is usually 1Gb, so this is easy enough. We can probably cable all of that up to one switch.
  • Identify a subset of nodes to cable up with access to the campus network. These systems are behind the company VPN, and we will be controlling login access ourselves. While I’m not worried about someone on the team doing something nefarious on the company network, I don’t want everything to have this capability. Still, having the option on some will give us some flexibility, and we have a handful of systems with more Ethernet ports than we would otherwise need (campus LAN access is 1Gb).
  • Head node will run Proxmox to give us the flexibility to spin up temporary test heads for team member projects. The idea here is we can partition the network using VLANs to isolate what a group is doing with some systems from what anybody else is doing (see the bridge config sketch after this list). The current head node has sufficient space to host shared home directories. We will also have a small IBM ESS that will be added to these racks next time I’m there.
  • I had thought about running some containers, either in a VM on the head node or as LXCs. Right now the only thing I’m thinking about on that front is NetBox.
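
For the BMC port switching, my understanding is the NIC selection can usually be flipped remotely from the host OS rather than at the rack, though the exact command is vendor-specific. A rough sketch (the Dell racadm and Supermicro raw commands below are assumptions to verify against each box’s docs):

    # Show the BMC's current LAN settings (channel 1 is typical, but varies by vendor)
    ipmitool lan print 1

    # Dell iDRAC: pick the dedicated BMC port, or share onboard LOM1
    racadm set iDRAC.NIC.Selection Dedicated
    racadm set iDRAC.NIC.Selection LOM1

    # Supermicro (reported to work on many boards; verify first): 0=dedicated, 1=onboard/shared, 2=failover
    ipmitool raw 0x30 0x70 0x0c 1 0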
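
For the VLAN partitioning, I’m picturing a single VLAN-aware bridge on the Proxmox head node with each project getting its own tag. A minimal sketch of /etc/network/interfaces, assuming one trunked NIC named eno1 and made-up addresses:

    auto lo
    iface lo inet loopback

    iface eno1 inet manual

    # One VLAN-aware bridge; each VM/CT gets a per-project VLAN tag so one
    # group's test head and nodes can't see another group's traffic.
    auto vmbr0
    iface vmbr0 inet static
        address 10.10.0.2/24
        gateway 10.10.0.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

A guest then just needs something like qm set <vmid> --net0 virtio,bridge=vmbr0,tag=110 to land on VLAN 110, and the used 1Gb switch only has to carry the matching tagged VLANs on its trunk port.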

This is what I have off the top of my head. If there’s any useful software or procedures, or if I’m on the wrong path entirely, I’d appreciate your help. We have a modest budget, but we did convince our management to at least buy us a used 1Gb switch that is at least similar to hardware we would see “in the wild.” We’re hoping we can use the lab to show value and get them to approve some other, still modest, requests in the future!

  • ClownStatue@piefed.social (OP) · 1 day ago

    We actually use Redmine on another server that doesn’t require the VPN (still requires login though). Figured that would probably be a decent place for that stuff. Won’t be posting any passwords there! Initial access to the cluster will be key-based.

    • moonpiedumplings@programming.dev · 1 day ago

      I (plus friends who do something similar) have been using centralized auth systems for this stuff. Proxmox supports OIDC, so if you are using Authentik or something similar you can just use one password.

      And then Authentik supports 2FA, so you can use TOTP with that, or use passwords only.
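
      For Proxmox that ends up being a single OpenID Connect realm pointed at Authentik. A rough sketch with pveum (the realm name, issuer URL, and client ID are just examples; the client secret comes from the provider you create in Authentik):

        # Add an OpenID Connect realm backed by Authentik, auto-creating users on first login
        pveum realm add authentik --type openid \
            --issuer-url https://auth.example.com/application/o/proxmox/ \
            --client-id proxmox-lab \
            --client-key 'CLIENT_SECRET_HERE' \
            --username-claim username \
            --autocreate 1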