While I agree, I'm not quite sure this would happen. If it did, it would most definitely revolutionize the MMO world. Not only would it require everyone to have superior internet connection speeds to keep up with the server, but it could also present some exploitive situations, depending on how it was carried out. It would have to involve mutual cooperation from both players, obviously.
Obviously. Any co-emotes would have to be mutually agreed upon, thus preventing the exploitive situations.
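To make "mutually agreed upon" a bit more concrete, here's a rough sketch of how a server might gate a co-emote behind consent. Everything in it (`CoEmoteBroker`, `request`, `respond`) is a made-up name for illustration, not anything from an actual game, and a real system would also need timeouts and the like.

```python
from dataclasses import dataclass


@dataclass
class CoEmoteRequest:
    emote: str
    initiator: str
    target: str
    accepted: bool = False


class CoEmoteBroker:
    """Hypothetical server-side helper; not taken from any real game API."""

    def __init__(self):
        # Pending requests keyed by (initiator, target).
        self.pending = {}

    def request(self, initiator, target, emote):
        # The initiator only asks; nothing is animated at this point.
        self.pending[(initiator, target)] = CoEmoteRequest(emote, initiator, target)

    def respond(self, initiator, target, accept):
        # The request is consumed either way, so a decline cannot be replayed.
        req = self.pending.pop((initiator, target), None)
        if req is None or not accept:
            return None  # nothing plays without explicit consent
        req.accepted = True
        return req
```

The point is just that the animation only starts from `respond`, so one player can never force an emote on another.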
I think your concern about internet connection speeds is unfounded, though; I'm not sure why you feel that way.
The actual technical challenge would be collision detection on such a wide variety of avatars. This issue would prevent the same hug animation, say, from being used for both a Klingon-to-Ferengi hug and a human-to-human hug, due to the size difference and the compensation it would inevitably require. Seeing as hugging isn't really a Trek thing anyway, limiting these to handshakes, where all that needs to be compensated for is arm position, becomes a much more plausible goal. As implied, looking to the Matrix Online's Interlock system for cues on this issue might be wise.
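As a rough illustration of why handshakes are the easier case, here's a sketch that only adjusts arm pitch so two avatars of different sizes meet hands halfway. The `Avatar` fields and `handshake_arm_pitches` are assumptions of mine for the example, not how any particular engine does it; a real rig would use proper inverse kinematics.

```python
import math
from dataclasses import dataclass


@dataclass
class Avatar:
    shoulder_height: float  # metres; hypothetical rig value
    arm_length: float       # metres; maximum reach from the shoulder


def handshake_arm_pitches(a, b, distance):
    """Return the arm pitch in degrees for each avatar so that the hands
    meet midway between them, at the average of the two shoulder heights."""
    meet_height = (a.shoulder_height + b.shoulder_height) / 2
    half = distance / 2

    def pitch(avatar):
        dy = meet_height - avatar.shoulder_height
        # Straight-line distance from shoulder to the meeting point.
        reach = math.hypot(half, dy)
        if reach > avatar.arm_length:
            raise ValueError("avatars are too far apart to shake hands")
        return math.degrees(math.atan2(dy, half))

    return pitch(a), pitch(b)
```

The taller avatar gets a negative pitch (arm angled down) and the shorter one a positive pitch, which is the whole "compensation" for a handshake. A hug would need the same kind of fix for torsos, both arms and heads at once.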
As for custom emotes, definitely... this is something that I've been doing since chat rooms came out in the late 90s. But it highlights another concern I attempted to touch on... Emotes, at least at times, should have some sort of gameplay purpose... either NPCs react to them, or they trigger certain things in certain missions, so that the time taken to learn and use them isn't deemed a complete and utter waste. Adding multi-person emotes, and thus 'gameplay', to emotes is one solution, but there are other ways.