Hi all,
I’ve got a recurring issue that has been affecting a couple of dedicated machines that operate remotely. Users are reporting that their systems are unable to turn on. Walking them through the troubleshooting, there are no issues with the monitor. Power is being applied, and the system lights turn on. We can confirm the network activity lights on the LAN port, and the router has negotiated a link speed. But if we log into the router, we see no packets coming from the PC.
Now, the weird part. When the computer is shipped back, it magically starts working again. I’ve tried running a memtest to see if maybe it’s bad RAM, but it always comes back passed. I believe the issue is due to a multiport PCI serial card. Sometimes if the cable that connects to that PCI card is removed, the system will boot. Other times, it completely kills the PC. Until of course it gets shipped back, then it will work again.
I’ve really hit a dead end with this issue. I’m not sure how to diagnose a no-POST situation without physically being there. At the very least, I wish I could replicate the issue in person. Unfortunately, the issue is occurring with multiple computers and PCI cards, so I doubt it’s a one-off bad piece of hardware.
If anyone has any experience troubleshooting a no-POST situation remotely or has had issues with PCI cards, I’d really appreciate it!
“a couple of dedicated machines”
Is this a new installation? Has this process control system ever been in a state that did not produce the reported discrepancy? What changed and when?
Did you virtualize the site’s legacy system?
multiport PCI serial card.
What is plugged into this card?
Is it possible you’re getting some kind of ground loop through a serial cable?
If it’s happening across multiple machines and multiple PCI cards, I’d take a long, hard look at what they’re plugging in to that PCI card.
Now that is an interesting idea. I can’t go into too much detail about the exact components, but this particular port drives a DC motor. Power is injected to drive the motor.
Tbh, I’m not very familiar with ground loops. Have you had experience with ground loops before?
I appreciate the quick response as well!
This particular port drives a DC motor. Power is injected to drive the motor.
On one hand, that does sound suspicious. It’s an external factor and it’s unusual.
But on the other hand, you said that unplugging it doesn’t consistently get the thing working, which doesn’t really mesh with what I’d expect if it’s the cause.
You can get optical isolators for serial ports. Those basically run the signal into an LED that shines on a photodetector, which keeps the two sides as separate, isolated electrical systems; those will eliminate ground loops. I haven’t looked into them for serial ports, but I was looking at hooking up a breadboard to USB at one point, and there it seems to be kinda a best practice for USB device development work, to keep any mistakes from damaging a connected computer system.
https://www.amazon.com/rs232-isolator/s?k=rs232+isolator
In your case, it sounds like you are also using the serial port as a power supply, so you might want something that can provide external power:
https://www.amazon.com/External-powered-Repeater-Mini-size-PhotoElectric-Full-line/dp/B00GI9GRMC
It might also be possible to use an isolation transformer, which would be simpler and possibly cheaper. That would, I believe, provide ground isolation without providing protection against more exotic things, like a short in your external device frying a serial controller. I’ve used isolation transformers for coax TV to avoid ground loops, so I imagine that it must be possible to have them handle serial port speeds. But when I search for “RS-232 isolation”, everything I see seems to be opto-electric.
If you can change the serial port being used, maybe also use a different serial port interface, like a USB serial interface.
Ooooo now this is something!! I’ve always had a hunch that the way power is being injected into the motor was causing the issue. The serial device is connected by a DB9 port. The RS485 signals A, B and ground are then accompanied by power to drive the motor. The serial port from the computer goes into this power-injecting box, and then the box goes to the motor and controller.
The grounds should be connected correctly. But now you’ve got me thinking about these boxes. Modifications have been made to them over time, so it’s totally plausible that something in there is causing a surge to get back into the computer.
Thanks for the response!
RS232? RS485? RS422? Not that I know much difference between them, but just to satisfy my own curiosity…
My experience with ground loops is primarily from audio equipment. The last one I remember was probably 20 years ago. I had two different devices plugged into two different outlets on opposite ends of the room. One was either a TV or a computer, providing an analog audio signal to a stereo system. Every time I connected the coaxial audio cable, I got a horribly loud 60 Hz hum. The ground potentials of the two systems were slightly different, and the difference was being carried across the shield of the coaxial cable. The amplifier was boosting that difference, and the result was the overpowering hum.
In my case, the workaround was to snip the ground prong on the amplifier and let it float. I generally wouldn’t recommend that.
I kinda doubt you have anything plugged into that port when you’re working on it in your shop. Definitely have them try disconnecting everything plugged into that port and see if the problem goes away.
Are both the motor controller power supply and the computer plugged into the same outlet? If not, get an extension cord and try that.
Power to all components is provided through a dedicated power supply. If the computer was on, I would be able to get a direct readout of voltage and amperage. As for the serial ports, some devices use RS232 and some use RS485.
The ground loop certainly seems to be the most plausible cause. And when it’s in the shop I’ve got exact replicas of the system. I just can’t seem to replicate it no matter how hard I try.
For context, I’ve stuck a system in a fridge to see if cold temperatures possibly correlate and was still unsuccessful. At this point I wonder if there is interference coming from an outside source and that’s why I can’t pin it down.
You’ve given me a lot to think about! I greatly appreciate the responses.
Hmm. First, I’m a little fuzzy on the symptoms. To clear that up:
You say “that operate remotely”, which sounds like it’s a headless server, but you also say “there are no issues with the monitor”. Do you mean that they attached a monitor to a video output and that they could see video display?
You have:
We’re unable to access the BIOS when the computer stops working.
Like, is this via a video output, serial port, or some kind of dedicated hardware management system?
As to troubleshooting, when you say “no packets”, and since it sounds like you have access to and familiarity with the router, you mean not even stuff like ARP? Like, it’s not “I’m not pinging anything”, but “no Ethernet frames have reached the router on that port?”
I’m…a little confused about you saying that the NIC is negotiating speed, because I’d have thought that on typical systems, a NIC wouldn’t be doing that until the OS is up and tells the NIC to become active. But it sounds from your systems like whatever is happening is happening pretty early in the boot process, before the OS or even bootloader is doing anything. Do you have the BIOS set up to make use of the NIC in some way, like using DHCP or BootP or something to do network booting?
EDIT: Or wake-on-LAN?
EDIT2: Because if you do, and don’t actually need wake-on-LAN or network booting functionality, I’d think that I’d try turning it off to see whether the issue might vanish.
Should have been more clear about the remote part. The systems operate remotely from me; I can access them via the internet. The users need to use the screen to operate it. This is just a Windows 11 computer after all.
As for the router information, that’s part of the reason I know there is power to the entire thing. I can access the router page and see that the link speed was negotiated correctly, at 1 Gb per second. If they unplug the Ethernet cable, the port reads disconnected.
But 0 packets are getting sent from the computer. I believe those are ARP, but the router page doesn’t define it. Just has a table with packets in and packets out. Packets going in will usually have a couple from the router. Always 0 with packets coming out. There’s actually a ping function built into the router, and that doesn’t respond at all.
BIOS is not set up during boot to use any sort of networking. No PXE boot or anything like that. On this particular motherboard, you have to enable the network stack, so I don’t think it’s that. System is set up to always turn on when power is applied to bring all the other components online together.
Anyway, thanks for taking time out of your day to respond!
Should have been more clear about the remote part. The systems operate remotely from me; I can access them via the internet. The users need to use the screen to operate it. This is just a Windows 11 computer after all.
Ah, okay. So then they aren’t getting any video display from the BIOS when the problem comes up. Okay, yeah, then that’s pretty convincing that it’s early in the boot process. So, yeah, OS probably isn’t a factor.
But 0 packets are getting sent from the computer. I believe those are ARP, but the router page doesn’t define it. Just has a table with packets in and packets out. Packets going in will usually have a couple from the router. Always 0 with packets coming out. There’s actually a ping function built into the router, and that doesn’t respond at all.
Okay. So, strictly-speaking, “frames” are what one calls things at the Ethernet level, and “packets” at the IP level, so if one assumes that the router page is actually being technically-correct, it’s possible that the computer is still doing things at the Ethernet level. But, yeah, gotcha.
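To make the frame/packet distinction concrete, here is a minimal Python sketch. It assumes you could somehow capture raw Ethernet II frames on that port (for example from a mirrored switch port), which the router page described here may not allow. The function name `classify_frame` and the sample bytes are made up for illustration. It only reads the 14-byte Ethernet header and reports what the frame carries, so an ARP frame shows up as Ethernet-level traffic even though it is not an IP packet.

```python
import struct

# A few well-known EtherType values (the 2-byte field after the two MAC addresses).
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}

def classify_frame(frame: bytes) -> str:
    """Return the name of the protocol carried by a raw Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is at least 14 bytes")
    # Header layout: 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType,
    # all in network (big-endian) byte order.
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return ETHERTYPES.get(ethertype, f"unknown (0x{ethertype:04x})")

# Hypothetical sample: a broadcast frame with the ARP EtherType and a zeroed payload.
sample = bytes.fromhex("ffffffffffff" "020000000001" "0806") + bytes(28)
print(classify_frame(sample))
```

VLAN-tagged frames (EtherType 0x8100) would come back as "unknown" here, since the sketch does not look past the first header.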
considers
Well, let’s see. This is kinda more at the brainstorming level. You said that you stuck the thing in a fridge, so I’m assuming that the user is in a colder-than-normal environment, not a warmer one. I guess you’ve got:
Temperature. You said that you tried that. If it’s colder than normal specifically around the time that the problem shows up, you might also consider humidity – if it’s high humidity, then condensation inside electrical devices can be a problem. Like, especially if this thing is outside and you’re getting (electrically-conductive) dew forming on surfaces in the morning or something like that, that can wreak havoc on electrical devices. I don’t know what the best way to diagnose that would be. A hygrometer will tell you the relative humidity. If it’s in the open, maybe leave a small space heater aimed at the system, which should produce lower-humidity air where the air is warmer than the surrounding air, and see if the issue goes away. If it’s in an enclosed space, maybe run a dehumidifier.
You’ve got the possibility of problems coming in from your external electrical lines — you have at least serial, power, and networking going into that machine. You might try, for troubleshooting purposes, if it arises again, having the user pull all of the lines connected to external stuff other than power, including the Ethernet cable and that serial thing, and seeing if it becomes impossible to reproduce then.
You said that some motor was involved. If you’re talking some kind of industrial setup with other machinery around, I guess something could theoretically be emitting some kind of strong electrical field that creates problems. I’ve never heard of a situation where a PC won’t work because of that, but I’d imagine that it’s possible.
I recall once hearing about a situation at a company I was working at where our support people had problems with vibration in a customer’s environment affecting the device. They couldn’t reproduce the problem back at the company, and it took them some time to work out.
Some of that’s pretty exotic, but if you’re just looking for potential leads to consider, that’s all that immediately comes to mind.
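On the humidity lead specifically, a hygrometer reading plus the air temperature is enough to estimate the dew point, which tells you how cold a surface has to be before condensation forms on it. Below is a rough Python sketch using the Magnus approximation. The coefficients 17.62 and 243.12 are one commonly used set for temperatures in °C, and the function names and example numbers are illustrative assumptions, not measurements from these sites.

```python
import math

def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Estimate the dew point in °C with the Magnus approximation."""
    if not 0 < rel_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    b, c = 17.62, 243.12  # Magnus coefficients (assumed set, valid for typical ambient temps)
    gamma = math.log(rel_humidity_pct / 100.0) + (b * temp_c) / (c + temp_c)
    return (c * gamma) / (b - gamma)

def condensation_risk(surface_temp_c: float, air_temp_c: float,
                      rel_humidity_pct: float) -> bool:
    """True if a surface at surface_temp_c is at or below the dew point of the air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct)

# Hypothetical scenario: a chassis that sat overnight at 10 °C,
# now surrounded by 25 °C air at 80% relative humidity.
print(round(dew_point_c(25.0, 80.0), 1))
print(condensation_risk(10.0, 25.0, 80.0))
```

If the logged conditions around each failure put the case or board temperature at or below the estimated dew point, that would make condensation worth chasing; if not, it is probably one lead to drop.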
Oh wow, this is a fantastic write-up! I really appreciate the effort you put into this. You’ve certainly given me lots to ponder over. I wonder if there is some credence to the humidity issue. I’d have to do some digging on when the issue occurred and what the conditions were like, but some of the locations do get humid.
It seems like the problematic variable is likely to be external to the PCs themselves. My inclination would be to have the end users disconnect everything but the monitor to see if the systems POST, and then reintroduce peripherals one by one.
Yeah, unfortunately that’s how I started to pin it down to that PCI multiport serial card. Only sometimes does unplugging that cable get the system to turn back on. Oddly, removing the cable doesn’t always solve the problem. Sometimes the system will boot right back up and everything works just fine. Other times it will turn on, but Windows will throw a driver error on the serial card.
The common factor tho, is that when the computer is shipped back, it will start working again. Very frustrating haha. I appreciate the response as well!
It’s an issue with the power plug or outlet on site, not the PC or network. At least that’s my best guess. The magical healing when shipped back is a strong indicator. An issue like that drove me crazy once with a PC that wouldn’t get an IP address from the router until I plugged the router into a different outlet.
The problem could also come from the Ethernet side, but that would be easy to check by having on-site staff boot the PC with the Ethernet cable unplugged.
Oh and we recently had a similar issue with Lenovo laptops that would just sit there with a black screen after turning on and not reaching the Windows bootloader. That was a compatibility issue with the USB-C docking station.
sit there with a black screen after turning on and not reaching the Windows bootloader
Were you able to access the BIOS? We’re unable to access the BIOS when the computer stops working.
Also, even more frustrating, we have a dedicated power supply to run the whole operation. If the system was able to turn on, I could read the voltages and amps going into the system. We’ve swapped out the power supplies as well, but that doesn’t seem to resolve the issue.
Is it localized to a room? Or a subnet? A router? A PC model? A time frame?
With such issues, I like to apply the scientific method, take everything into account no matter how strange, and try to find a way to isolate it. I’ve had monitor issues that were caused by an office chair before.
Of course that’s hard to do remotely.
Oh, that sounds like a nightmare lol. How did you even realize it was the office chair?
The ticket said “Every time I sit down at my desk, the screen turns off.”
So I went on site and told the user to demonstrate the issue.
She sat down on her chair and the screen shut off, then turned on again a couple seconds later.
It was a combination of static electricity and the telescoping chair stem, which induced a magnetic field that disturbed the monitor.
No BIOS access, no. But after a hard poweroff and disconnecting from the dock they booted fine. We connected them to the dock after logging in to Windows and then a firmware update of all components fixed it.
Where are you in the world, and where are your users?
I don’t mean to be rude, but I don’t think I can answer that without doxxing myself, and I’m not sure it’s relevant. You can look at my post history to know that I’m in the US and my computers are as well. They are far enough away that I cannot travel to them.
Power requirements are different around the world. It very well could be relevant. You mentioned a DC motor, which has me wondering if it’s in an industrial setting, and if that setting might have some unusual variety of 3-phase power rather than split-phase 120/240 typical of North America.
Well it matters for what you’re describing. I have a good idea of what your problem is and where you are. Just asking for confirmation. It’s not a POST issue.
Okay?




