I have begun building a lab for my team of HPC consultants, and I’m trying to make some plans. I would like this to be as flexible as I can make it. I live 3½ hours away from the site, so the fewer trips down there to recable and/or move stuff around, the better! Most of this hardware has various older InfiniBand connectivity, along with multiport LOM & OCP cards at either 1Gb or 10Gb. Most also have the option for dedicated or shared BMC ports. We have 2 dedicated IPs (so far) that I’m currently using for the head node’s BMC & SSH access. This will be all Linux, though we will be accessing web interfaces when testing various products. My initial thoughts:

  • Identify what we want to keep and what we want to excess. There’s some _very_ old hardware in there! There’s also some old Omni-Path hardware. We don’t see much OPA, but some team members think that may change. Still, this stuff is old.
  • Carve out a management/provisioning network. Ideally, this will allow us to switch between dedicated and shared BMC ports at will. We use this for customer knowledge transfer when we demo our cluster management software. The shared port is usually onboard port 1, which is usually 1Gb, so this is easy enough. We can probably cable all of that up to one switch.
  • Identify a subset of nodes to cable up for campus network access. These systems are behind the company VPN, and we will be controlling login access ourselves. While I’m not worried about someone on the team doing something nefarious on the company network, I don’t want everything to have this capability. Still, having the option on some will give us flexibility, and we have a handful of systems with more Ethernet ports than we would otherwise need (campus LAN access is 1Gb).
  • Head node will run Proxmox to give us the flexibility to spin up temporary test heads for team member projects. The idea here is we can partition the network using VLANs to isolate what a group is doing with some systems from what anybody else is doing. The current head node has sufficient space to host shared home directories. We will also have a small IBM ESS that will be added to these racks next time I’m there.
  • I had thought about running some containers, either in a VM on the head node or as LXCs. Right now the only thing I’m thinking about on that front is NetBox.
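
For the VLAN partitioning on the Proxmox head node, a single VLAN-aware bridge is probably enough: each temporary test head’s virtual NIC just gets its own tag. A sketch of `/etc/network/interfaces`, with interface names and addresses as placeholders:

```
# /etc/network/interfaces sketch for the Proxmox head node.
# One VLAN-aware bridge; each test-head VM then gets its own VLAN tag
# on its virtual NIC. vmbr0/eno1 and the address are placeholders.
auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.10.0.2/24
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
```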
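
For the BMC bullet above, the basic network settings can at least be scripted in-band with ipmitool. A sketch, assuming LAN channel 1 and placeholder addresses — the actual dedicated-vs-shared NIC selection is vendor-specific and often only exposed via OEM raw commands or a BIOS/BMC setting:

```shell
# Sketch: configure BMC networking in-band with ipmitool (run on the node).
# Channel number and addresses are placeholders; the shared/dedicated port
# toggle itself varies by vendor and may only be exposed via OEM commands.
ipmitool lan print 1                      # show current BMC LAN config
ipmitool lan set 1 ipsrc static           # use static addressing
ipmitool lan set 1 ipaddr 10.10.0.21
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.10.0.1
```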

This is what I have off the top of my head. If there’s any useful software, procedures, or if I’m on the wrong path entirely, I’d appreciate your help. We have a modest budget, but we did convince our management to at least buy us a used 1Gb switch that is similar to hardware we would see “in the wild.” We’re hoping we can use the lab to show value there and get them to approve some other, still modest, requests in the future!

  • moonpiedumplings@programming.dev · 1 day ago

    In addition to netbox, a wiki or other knowledgebase would be nice. You can document setup procedures as you go, and then other people can use that to figure stuff out.

    • ClownStatue@piefed.social (OP) · 1 day ago

      We actually use Redmine on another server that doesn’t require the VPN (still requires login though). Figured that would probably be a decent place for that stuff. Won’t be posting any passwords there! Initial access to the cluster will be key-based.

      • moonpiedumplings@programming.dev · 1 day ago

        I (plus friends who do something similar) have been using centralized auth systems for this stuff. Proxmox supports OIDC, so if you are using Authentik or something similar you can just use one password.

        And then Authentik supports 2FA, so you can use TOTP with that, or use passwords only.
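
        A sketch of wiring Proxmox to an OIDC provider with `pveum` — the realm name, issuer URL, and client credentials are placeholders for whatever you configure in Authentik:

        ```shell
        # Sketch: add an OpenID Connect realm to Proxmox VE.
        # Issuer URL, client ID, and secret come from the Authentik provider.
        pveum realm add authentik --type openid \
            --issuer-url https://auth.example.com/application/o/proxmox/ \
            --client-id proxmox \
            --client-key 'CHANGEME' \
            --username-claim username \
            --autocreate 1
        ```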

  • frongt@lemmy.zip · 23 hours ago

    What are you trying to lab? I would virtualize as much as possible, but that only gets you so far. You can’t really virtualize appliances or hardware like InfiniBand connections.

    You might just set up a whole day once a month to go in and run a class on setting up one kind of environment, then give them a few weeks to play around, break it, tear it down, and rebuild it.

  • Lettuce eat lettuce@lemmy.ml · 23 hours ago (edited)

    I’m currently in the process of deploying a Linux lab environment at my current workplace actually.

    Check out Incus if you haven’t already.

    It’s a community-supported, fully open-source fork of LXD. It supports full-system Linux containers as opposed to Docker-style single-application containers, and it also supports full QEMU virtual machines if you need them.

    It’s likely going to replace my entire traditional type-1 hypervisor setup in my home lab because of how much lighter weight it is. My most lightweight VMs are typically still 2GB of RAM, things tend to get funky when I go below that. Whereas my clean install of a Debian 13 container on Incus was using around 90MB. In these crazy times, anything that uses RAM 10x-20x more efficiently has my attention.

    It can also do all the typical hypervisor back-end stuff: HA clustering, automatic container snapshots, userspace isolation, virtual networking, static and dynamic resource limits, etc.

    The daemon runs on all major distros, but you can also build and use their IncusOS, which is an immutable distro-fied Incus deployment that is optimized out of the box. (Although I’ve had great results just running it as a daemon on a basic Debian installation.)

    It’s super easy to learn and get going, and it’s working perfectly in the early tests for my team as a Lab environment platform.
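
    The day-to-day workflow is just a few commands. A sketch, with the image alias and instance names as examples:

    ```shell
    # Sketch of a basic Incus workflow; names are placeholders.
    incus launch images:debian/13 lab01 \
        --config limits.memory=512MiB --config limits.cpu=2
    incus snapshot create lab01 clean        # snapshot before experimenting
    incus exec lab01 -- apt-get update       # run commands inside the container
    incus snapshot restore lab01 clean       # roll back after breaking things
    incus launch images:debian/13 vm01 --vm  # full QEMU VM instead
    ```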

    • moonpiedumplings@programming.dev · 21 hours ago

      Do you use the web ui?

      I use the web ui heavily, but it’s only packaged in the upstream incus package from the author, and not included in the Debian packages.

      Also, what are you using for authentication?

      • Lettuce eat lettuce@lemmy.ml · 18 hours ago

        No web ui, just the direct CLI interface.

        For my team, I have set up lab accounts on the host machine and configured the SSH daemon to drop them directly into their designated lab container when they log in with that account and key combo.

        Nothing fancy.
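
        That forced-drop setup can be sketched in `sshd_config` — user and container names here are placeholders, and the lab account needs access to the Incus socket (e.g. membership in the incus group) for the exec to work:

        ```
        # /etc/ssh/sshd_config fragment (sketch): drop a lab account straight
        # into its designated Incus container. One Match block per account.
        Match User labuser1
            ForceCommand /usr/bin/incus exec lab01 -- /bin/bash -l
        ```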