I work on an HPC and often have to share files with other users. The most approachable solution is an external cloud storage and rclone back and forth. However, some projects are quite heavy (several TB) and for those that is unfeasible. We do not have a shared group. The following is the only solution I found short of just setting all permissions to 777, and I still don’t like it.
Create a directory and set an ACL to give access to the selected users. This works fine if the users create new files in there, but it does not work if they copy files from somewhere else, as the default umask is 022. Thus the only workable solution is to change the default umask to 002, which however affects every file the user creates, not just those in the share. The alternative is to change permissions every time you copy something, but you all know very well that is not going to happen.
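To make the umask interaction concrete, here is a minimal demonstration (the paths are made up) of how the umask decides whether newly created files are group-writable at all, which is what the ACL's effective rights get cut down to:

```shell
mkdir -p /tmp/umask_demo

# with the usual default umask 022, group write is masked off
( umask 022; touch /tmp/umask_demo/with_022.txt )   # mode 644: rw-r--r--

# with umask 002, group write survives
( umask 002; touch /tmp/umask_demo/with_002.txt )   # mode 664: rw-rw-r--

stat -c '%a %n' /tmp/umask_demo/*.txt
```

With 022 in effect, even a default ACL granting the group rw gets limited by the file's group-class bits, which is why files landing in the share come out read-only for everyone else.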
Does it really have to be such a pain in the ass?
You can set ACLs on directories that get applied recursively. This makes it possible for all files to end up with the correct permissions. I am on the go right now, but you should look into setfacl. It’s been a while, but I am pretty sure that worked. That way you should even be able to say which groups or users can do what, with granularity.
Uh, why not create the shared group? That’s more or less exactly the purpose of their existence.
Lots of paperwork involved around our HPC. Politics in the middle, and we would plausibly have to sign a confidentiality agreement with everyone who shares anything in there, which would have to go through a review process that generally takes about 6 months.
There’s a reason for this governance and you’re putting your whole team at risk trying to do this yourself
It’s just default Linux permissions. People who get access to the HPC through the same institution are placed in the same group. As you may know, scientific collaboration is quite important. Indeed, when collaborating we sign all the necessary paperwork, but that does not carry over to the HPC administrators, who are part of a separate institution. To request a separate group you have to contact the HPC institution, which will have to contact the institutions involved. Those institutions will have to check that NDAs are already in place. If the NDAs are in place, they will have to check that the data to be shared is actually covered by the project. I will have to fill out a bunch of paperwork. This will be sent to an external auditor to check that everything is correct, and then everything goes back up that chain.
I already waste way too much of my time on paperwork. Worst case scenario is a collaborator leaks some data which will be published publicly in a few months anyway. And those collaborators will have access to such data anyway, just through other less comfortable means.
The fact that you’re sharing this internal policy stuff so openly is definitely a red flag.
I don’t know what your background is, but these are mostly hindrances to doing research. Administration has taken over and demands to decide how research should be conducted without having any idea of how it is actually done.
You may see it as a red flag; I am very clear that I do not want to follow their bullshit. If I’m losing my job, so be it. However, you may be misunderstanding: I am not going to lose my job over this, nobody is getting hurt, and I am sharing a bureaucratic process that is fairly common over here in public institutions. This is not some large corporation that has to keep the time schedule of its workers secret: if you wish to come over, I have full liberty to show you anything I have on my computer and most projects I am working on. Yes, there are a couple of things I cannot show you, but everything else is my own work and up to me to show to whoever I wish. Institutions may retain part of the IP if we decide to commercialise stuff, but I am the author and I am free to share anything I don’t have an NDA on.
I am afraid you come from a very different background and you are misunderstanding my situation.
Your job as sysadmin is to adhere to your organization’s policy, no matter how stupid and hindering that policy might seem to you.
You’re knowingly giving your users a workaround to their NDA, which puts all of your jobs and your data confidentiality at risk.
You’ve got no business with root privileges.
I have no root privileges, I’m providing no NDA workaround.
I’m no sysadmin, I just run my homelab. Let me get this straight… You want to bypass system-level access restrictions with some form of control, but not go through your company’s standard method of doing so, because of bureaucracy?
If that’s the case: why not put something in front, like OpenCloud for example?
I mean, maybe OC is not what you need, but conceptually… would a middleman solution work for you? If so, you could go with a thousand different alternatives depending on your needs.
A cloud solution is indeed an option, however not a very palatable one. The main problem with a cloud solution would be pricing. From what I can see, you can get 1TB for about 10€/month. We’d need substantially more than that. The cost is feasible and not excessive, but frankly it’s a bit of a joke to have to use someone else’s server when we have our own.
You want to bypass system-level access restrictions with some form of control, but not go through your company’s standard method of doing so, because of bureaucracy?
Yes. Not a company but public research, which means asking for a group change may lead to several people in the capital discussing whether that is appropriate or not. I’d like this to be a joke, but it is not. We’d surely get access eventually if we did that, but it has an unfortunate side effect: working that way, every new person who joins has to wait through all that paperwork.
Don’t bypass your organizational policies
I am not bypassing any policy: the HPC is there to collaborate on, and data can be shared. Not having a shared group is not a policy; it’s just that not all users are in the same group and users are added to just one group by default. We are indeed allowed to share files; hell, most of the people I want to share stuff with are part of my own research group. ACLs are allowed on the HPC. I’m asking how to properly use ACLs.
If you have anything actually useful, go ahead; otherwise don’t worry, I know better than you what I should or should not do.
You are in way over your head
Stop now before you get yourself in hot water
Fuck off.
I think he meant self-hosting Opencloud
Yes. That’s what I recommended. Self-host whatever middleman software. Opencloud, WebDAV, S3, FTP, anything he puts in the middle can accomplish what he wants.
I see! Well, I currently do not have another server with enough storage that we could use for this purpose. Maybe in the future; that would solve a bunch of problems, this being only one of them.
We do have a storage server, but that is local only and backup only: not going to open it to the internet.
It is indeed a solution. What is absurd to me is having to consider a solution that requires two servers.
You don’t need additional storage. It’s one program you need to set up.
It is not something I can set up on that server; I would need a separate server to set up something of that kind.
If it’s a compliance problem, I get it. From a practical standpoint, FTP or WebDAV don’t require installing anything.
I recommended self-hosting whatever middleman software: Opencloud, WebDAV, S3, FTP. Anything you put in the middle can accomplish what you want.
I’m in a similar position to you. Our lab has a partition on the HPC, but I need a way to quasi-administrate other lab members without truly having root access. What I found works is to have a shared bashrc script (which also contains useful common aliases and env variables) and get all your users to source it (in their own bashrc files). Set the umask within the shared bashrc file. Set certain folders to read-only (for common references, e.g. genomes) if you don’t want people messing with shared resources. However, I’ve found that it’s only worth trying to admin shared resources and large datasets; otherwise let everyone junk up their home folder with their own analyses. If the home folder is size-limited, create a users folder in the scratch partition and let people store their junk there however they want. Just routinely check that nobody is abusing your storage quota.
EDIT: absolutely under no circumstances give people write access to raw shared data on hpc. I guarantee some idiot will edit it and mess it up for everyone. If people need to rename files they can learn how to symlink them.
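A sketch of what such a shared rc file might contain (all paths and names below are made up); each user adds a line like `source /scratch/lab/shared_bashrc` to their own `~/.bashrc`:

```shell
# shared_bashrc -- sourced by every lab member's own ~/.bashrc

umask 002                               # new files are group-writable

export LAB_SHARE=/scratch/lab/share     # hypothetical shared data directory
export LAB_REFS=/scratch/lab/refs       # read-only common references

alias goshare='cd "$LAB_SHARE"'         # common convenience aliases
```

Because the umask is set in one sourced file, fixing it later for everyone means editing a single script instead of chasing each user's dotfiles.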
This is a pretty good idea!
In addition, I recommend having all data e.g. as a (private) DataLad archive synchronized to Dataverse, OSF, Figshare or wherever; edits are versioned then.
I am generally using DVC to version data, are those better options?
I don’t know, seems to be quite similar :)
Thanks, this is a great idea! I can see you have been doing this for a long time and are talking from experience. Regarding shared data: I use this more as a way to give raw data to other people and collect results from them. It is mostly a temporary directory used to transfer data; anything significant will get copied over to my share and backed up.
I can see how you could be worried about the storage quota; luckily I don’t have that many people to worry about. But it is funny you mention it, as I could really see someone stashing a few conda environments in there just because they have exhausted their own quota…
If you’re not that worried about storage then you can just make copies if necessary; then you don’t really have to worry about permissions (apart from read, which is typically the default for the same group). But yeah, if there’s any chance more than one person might work off the same copy of data on the HPC, make it read-only for peace of mind. Regarding conda envs: yeah, I have a few common read-only conda environments so that scripts can be used by multiple users without the hassle of ensuring everyone has the same env. Quite useful.
The shared environment thing seems like a very cool idea! I’ll try to set it up.
I have a similar need and I am curious whether my current solution is any good:
The data of interest is on a server which can only be accessed via SSH from inside the institution. I’ve set up a read-only NFS share to a server which has a webserver (HTTPS enabled). There, I set up a temporary WebDAV share to the read-only NFS mount point, protected with htpasswd, since the external collaborators do not have accounts at our institution.
As soon as the transfer is complete I remove all the shares (nfs, webdav).
This is a good idea and something I may set up once we get our own compute server. However, at that point wouldn’t a synced directory be a better fit for the purpose? Say you define a directory on the external server to be used for sharing data, and every user syncs it to their own share on the main server to get all the shared data through rsync or unison.
Just throwing it out there, I’m not sure if that fits your use case.
Here’s someone that solved this by monitoring the directory using inotifywait, but based on the restrictions you already mentioned I’m assuming you can’t install packages or set up root daemons, correct?
https://bbs.archlinux.org/viewtopic.php?id=280937
Edit: CallMeAI beat me with this exact same answer by 15 minutes.
Maybe some sticky bit https://www.redhat.com/en/blog/suid-sgid-sticky-bit
I thought sticky bits were used to allow other users to edit files but not delete them. Do they also allow inheriting the parent directory permissions?
I didn’t intend, and don’t think, the sticky bit stuff will or could be a complete solution for you. You’ve got some oddly specific and kinda cruddy restrictions that you’ve got to work around, and when they get that nonsensical one ends up solidly in “cruddy hack” territory.
From the article:
group + s(pecial)
Commonly noted as SGID, this special permission has a couple of functions:
- If set on a file, it allows the file to be executed as the group that owns the file (similar to SUID)
- If set on a directory, any files created in the directory will have their group ownership set to that of the directory owner
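A quick way to see the directory behaviour for yourself (using a throwaway directory): after `chmod g+s`, files created inside inherit the directory's group instead of the creating user's primary group.

```shell
mkdir -p /tmp/sgid_demo
chmod g+s /tmp/sgid_demo             # set the setgid bit on the directory

touch /tmp/sgid_demo/newfile
# newfile's group matches the directory's group, not necessarily
# the creating user's primary group:
stat -c '%G %n' /tmp/sgid_demo /tmp/sgid_demo/newfile
```

On a shared directory whose group includes all collaborators, this removes the need for each user to remember `chgrp` after creating files; note it governs group *ownership*, not the permission bits, so the umask discussion still applies.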
You could run something like https://pypi.org/project/uploadserver/ in screen, or run a cron job every minute that just recursively sets the correct permissions.
Wow, that group +s seems exactly what I’m looking for! That actually looks like the clean solution I was after. I’ll test it out and report back; I’ll have to wait until Monday for the colleagues to be back on the server, but it seems very promising.
Thank you very much!
Can you check back in here and let us know if it worked?
Wahoo! Best of luck!
A dedicated file sharing application.
What do you mean? Is there an application that allows easily sharing files on one Linux system? That would be nice!
If you mean going through an external server or peer-to-peer transfer, that is not too feasible. I do not have other storage locations with tens of terabytes available, and transferring that much data through some P2P layer, while feasible, would probably be even less user-friendly.
Is there an application that allows easily sharing files on one Linux system? That would be nice!
this sentence made me remember this: https://github.com/9001/copyparty
Not what I need, but it looks very cool!
NFS
Well, if someone is running Windows, look for Samba
s/ftp
scp
You could use Python’s built-in HTTP server
rsync
Skip NFS and FTP as they cause more problems than they solve
This is a large computing cluster, there are no such mountpoints available and I’m definitely not allowed to go there and plug a few disks into the racks.
My answer regarding sftp, scp and rsync remains unchanged. Also: rclone, Globus, FUSE.
None of these programs allow overriding Linux permissions. You cannot rclone/rsync into another user’s directory. You cannot sftp/scp into another user’s directory. My problem is not about transferring data across different systems, but rather accessing data on one system as different users. All users should be able to read and modify the files.
You can’t override user permissions. If you could they would be useless
I don’t want to override user permissions; I want it to be easy for users to agree to a directory being shared, and in such a directory have the group permissions relaxed through the explicit authorization of the owner. I’d like for this to exist. This would not make permissions useless, just allow an easy way to share files across users on a filesystem.
Oh, I see. Then I completely misunderstood. Sorry
I’m pretty sure you can do this by adding default user entries to the directory’s ACL, which will then be set on files added to that dir.
Default user entries are in there and do work; however, when copying existing files, those get masked by the existing group permissions. As such, the only solution I found is to have everyone set their umask to 002, as otherwise we would not get write access to files which are copied rather than created in place.
Ah, I see. Well, it’s ugly, but you could use inotify to trigger a tiny script that updates the perms when files are added or copied to the share dir.
That is a possibility, but what would the setup look like? Only the owner can update the permissions. This would mean that all users need an inotify daemon on that folder for whenever they copy something in there. Not to mention, this is an HPC and we mostly live on login nodes; our sessions are limited to 8 hours, which makes setting up such a daemon a bit tricky. I could probably set up a cronjob somewhere else to connect and start it, but it feels a bit cumbersome.
Running the inotify script as a service as root would require only one instance. You could trigger it on close_write and then run setfacl to add ACL entries to the new file for all the share users.
If you can’t add a daemon or service to the system then you can skip inotify and just slam a cron job at it every minute to find new files and update their perms if needed. Ugly but effective.
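The cron variant could be as small as a one-liner (share path hypothetical); since `chmod` only succeeds on files the invoking user owns, each user would run their own copy of the sweep:

```shell
share=/tmp/perm_sweep_demo           # hypothetical share directory
mkdir -p "$share"
touch "$share/example.txt" && chmod 644 "$share/example.txt"

# find files we own that are missing group write, and grant it
find "$share" -user "$(id -un)" -type f ! -perm -g+w -exec chmod g+rw {} +

stat -c '%a %n' "$share/example.txt"   # group-writable after the sweep
```

The `! -perm -g+w` test keeps the sweep cheap by touching only files that actually need fixing.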
Another option to consider: You could write a little script that changes umask, copies files, and changes it back. Tell people they must use that “share_cp” script to put files into the share dir.
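A sketch of what that wrapper could look like (the name `share_cp` and its exact behaviour are just an illustration of the suggestion above): it copies under a relaxed umask and then explicitly widens the group bits, since `cp` otherwise carries over the source file's restrictive mode.

```shell
share_cp() {
    # copy under a group-friendly umask (subshell keeps it local) ...
    ( umask 002; cp -r "$@" )
    # ... then widen group bits on the destination (the last argument),
    # because cp preserves the source file's own permission bits;
    # this only succeeds on files the caller owns
    for dest; do :; done            # leaves $dest = last argument
    chmod -R g+rwX "$dest"
}

# demo: a 644 source file becomes group-writable in the share
mkdir -p /tmp/sharecp_demo
echo data > /tmp/src_file.txt
chmod 644 /tmp/src_file.txt
share_cp /tmp/src_file.txt /tmp/sharecp_demo/
```

Dropping the function into the shared bashrc mentioned elsewhere in the thread would put it in everyone's path with no install step.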
We cannot set up a common group; there is no way we get root privileges. A cron job would not work either: it is a cluster with many nodes, many of them login nodes. Cron jobs do not work on such systems.
A share_cp script would in fact be a good solution, I may try that and see if people pick it up.
I would hire someone who knows what they are doing. It sounds like you are out of your element here which is risky.
To answer your question, you have a few options:
- Samba: just an SMB server. If you have a local Active Directory setup, use this.
- SCP: just copies files over an SSH connection.
- Rsync: performs a sync of one directory to the other. It can run over SSH.
- Unison: like rsync but bidirectional. For it to work it needs to track state.
- Syncthing: never actually used it, but it might be close to what you want.
Thank you for the reply and for trying to help out. You may have misunderstood the question; don’t worry, it is OK.
It sounds like you are out of your element here which is risky.
I am OK, it is not risky, do not worry. Judging from your answer, I’d say I have a bit more experience than you do.








