I’ve spent time at various jobs “pairing” with another developer in a different location.
Sometimes I think I must have tried every piece of software ever developed for this purpose. I have not been completely satisfied with any of them.
The number one problem is always lag. Even on a voice-only call, it’s hard to sustain a conversation when the lag spikes into the hundreds of milliseconds. Most audio- or video-conferencing services I’ve used barely manage to maintain an acceptable level of latency, and none of them do it without hiccups. Some services are much worse, with multiple seconds of delay. (If you’ve ever been in a group call with two or more participants in the same room, you can hear the lag as an “echo.”)
This obviously isn’t an issue of bandwidth: the problem has persisted throughout my entire career, while my network bandwidth has increased from about 1.5 Mbps to over 100 Mbps.
Latency is a hard problem, much harder than bandwidth. Jitter is even harder. I don’t know if this will ever be solved to my satisfaction, because there are too many disconnected parties (network operators, hardware manufacturers, kernel developers) who would have to collaborate to make a meaningful difference.
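Out of curiosity, it’s easy to get a rough sense of your own latency and jitter with a few lines of Python. This sketch times repeated TCP handshakes and reports the spread; the endpoint is a placeholder, so point it at whatever server you actually pair through.

```python
import socket
import statistics
import time

HOST, PORT = "example.com", 443  # placeholder; use your pairing server
SAMPLES = 20

rtts = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    # A TCP handshake costs roughly one round trip, so timing
    # connect() gives a crude RTT estimate.
    with socket.create_connection((HOST, PORT), timeout=5):
        pass
    rtts.append((time.perf_counter() - start) * 1000)  # milliseconds
    time.sleep(0.5)

print(f"min {min(rtts):.1f} ms  mean {statistics.mean(rtts):.1f} ms  "
      f"max {max(rtts):.1f} ms  jitter (stdev) {statistics.stdev(rtts):.1f} ms")
```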
However, there are other limitations which I believe could be solved within a single product.
When pairing, I want to be able to see three things at the same time:
- “My” screen
- “Their” screen
- Video
If I’m the one “driving,” i.e. controlling the keyboard and mouse, then I want “my” screen in front of me and my partner’s screen to the side. If my partner is “driving,” then I want “their” screen in front and “my” screen off to the side. In either case, both of us should be able to see both screens, and also see one another’s faces. If others are observing then we should be able to see their faces too.
All three screens should be roughly the same size and aspect ratio at both ends. Otherwise you get scaling problems: text that is unreadably small at one end is awkwardly large at the other.
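To make the scaling problem concrete, here’s a back-of-the-envelope sketch (the resolutions and font size are made-up examples):

```python
def scaled_px(size_px: float, src_width: int, dst_width: int) -> float:
    """Rendered size of an element after the shared screen is fit
    into the viewer's window."""
    return size_px * dst_width / src_width

# A 14 px editor font shared from a 3840-wide screen, viewed in a
# 1920-wide window, renders at 7 px: unreadably small.
print(scaled_px(14, 3840, 1920))  # -> 7.0
```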
I believe this would make pairing more efficient. One person can “drive” while the other is looking up documentation, checking test results, or watching a metrics dashboard. We can still see one another, which helps with communication. We still have a text-chat interface for sending short snippets without switching away from the primary interface.
One participant may “remote control” the other’s screen for brief periods, but latency and different local configurations make this impractical for serious work. The default mode of interaction with the remote screen should be to “point” or gesture to something on the remote screen in a way that everyone can see (some tools already do this).
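I imagine “pointing” as a lightweight event broadcast to every participant. Here’s a sketch of what such an event might look like on the wire, assuming a JSON message bus; the field names are hypothetical, not taken from any existing tool. Normalizing coordinates to the 0..1 range sidesteps the resolution mismatches described above.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class PointerEvent:
    sender: str   # who is pointing
    screen: str   # whose screen they are pointing at
    x: float      # normalized 0..1, so it survives resolution differences
    y: float
    ts: float     # sender-side timestamp, useful for lag diagnostics

def encode(event: PointerEvent) -> str:
    """Serialize the event for broadcast to all participants."""
    return json.dumps(asdict(event))

print(encode(PointerEvent("alice", "bob", x=0.42, y=0.17, ts=time.time())))
```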
I am not aware of any combination of hardware and software that can provide this triple-screen mode. Most video-conferencing software does not even support screen-sharing and video at the same time, and I have never seen one that allows two participants to screen-share simultaneously.
There’s another aspect to this setup which might be harder to realize: ideally, my partner and I are working in the same “environment,” i.e. the files in a repository. Changes I make on “my” screen should be reflected immediately on my partner’s screen, even if we are using different editors/IDEs. There are a few products that claim to do this, such as Floobits, but I have not explored them much.
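For illustration, here’s a toy version of the naive approach: poll one directory for modified files and copy them whole into a mirror. Real tools like Floobits sync individual edits and handle conflicts, which this one-way sketch (with placeholder paths) completely ignores; it only shows the basic shape of the problem.

```python
import shutil
import time
from pathlib import Path

SRC, DST = Path("repo"), Path("mirror")  # placeholder paths; both must exist
seen = {}  # path -> last mtime we synced

while True:
    for path in SRC.rglob("*"):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if seen.get(path) != mtime:
            seen[path] = mtime
            target = DST / path.relative_to(SRC)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # whole-file copy on every change
    time.sleep(1)
```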