I'd say the two most difficult things are training your XSeg masks properly and learning what will and won't work with your source material. For example, if all of your source pics were taken at one specific angle and the face in the video you're trying to place it into is at a different angle, no amount of training will ever get you a good result.
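To make the angle point concrete, here is a rough sketch of how you could check coverage before spending hours training. It assumes you already have a yaw angle (left/right head turn, in degrees) for each source pic and each destination frame from some head-pose estimator. The function names, the 15-degree bin width, and the sample numbers are all made up for illustration and are not part of any face-swap tool.

```python
import math


def yaw_bin(yaw_deg, bin_width=15.0):
    """Map a yaw angle in degrees to an integer bin index (hypothetical helper)."""
    return math.floor(yaw_deg / bin_width)


def uncovered_yaw_ranges(src_yaws, dst_yaws, bin_width=15.0):
    """Return yaw ranges that appear in the destination but never in the source.

    Each range is a (low, high) tuple in degrees. An empty list means every
    angle bin used by the destination has at least one source example.
    """
    src_bins = {yaw_bin(y, bin_width) for y in src_yaws}
    dst_bins = {yaw_bin(y, bin_width) for y in dst_yaws}
    missing = sorted(dst_bins - src_bins)
    return [(b * bin_width, (b + 1) * bin_width) for b in missing]


# Made-up example: source pics are all near-frontal,
# but the destination video turns well to one side.
src_yaws = [-5.0, 0.0, 3.0, 8.0, 12.0]
dst_yaws = [-2.0, 10.0, 40.0, 50.0]

gaps = uncovered_yaw_ranges(src_yaws, dst_yaws)
print(gaps)
```

If the list of gaps is not empty, those are the angles where the swap is likely to fall apart, so it's probably worth finding more source material before training.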
So it's easy to learn, harder to master (like anything, I guess).