iPhones are NOT better tracking hardware
If you ask in almost any VTuber community what to get for tracking, someone will recommend an iPhone. Sure, iPhones work great for face tracking! But almost always, the person recommending one will go on to explain something like this:
> You see, iPhones don't just have a webcam, they have a special depth camera that sees in 3D and that's why not even the world's best camera can compete. The only way to get the best face tracking is an iPhone.
I want you to do an experiment. Grab your iPhone, open up VTube Studio (or your favorite tracking app), and pull up the camera/ARKit mask preview. Go ahead, move around, make sure it's tracking well. Now grab a thin item like a pen, and sweep it around the top of your phone. Notice where it covers up the picture in the camera preview? That's the front selfie camera. A regular old webcam. Notice something? When you cover up the selfie cam, and only the selfie cam, the tracking stops working.
Now grab some opaque tape (or sticky notes). Cut out two pieces, and place them to the left and right of the selfie cam you just found, so only the camera can see through, right at the edge of its field of view. If you're paranoid, do the top and bottom too.
You have just blocked out the Face ID/TrueDepth camera system (actually a bunch of things: the IR camera, the flood illuminator, and the dot projector, which are the three independent parts that make up the 3D scanning system). Go ahead, try to unlock your phone with Face ID. You can't.
Now try VTube Studio again.
It still tracks. Practically indistinguishably from before.
It's not the magical camera. Apple just have really good webcam tracking software built into every iPhone. That's it. That's all it is! Any other phone or device COULD be just as good... if someone like Google stepped up their face tracking ML model game to match Apple's. The only hardware you need is a high quality camera (and on a mobile device, probably a neural accelerator, but on a PC the GPU would be more than enough).
(Obviously Apple have good cameras too. That does play a role, and it's why the tracking also works well in low light. No, you don't need the 3D camera for low light tracking either, try it!)
I actually looked into the Apple code. Behind the scenes it's called FaceKit, and it uses a machine learning model called CaraNet. There are two versions: one for a pure RGB feed (no depth), and one for RGBD (with depth). I don't know if the RGBD one is used at all with VTube Studio, but if it is, I believe the only thing it really does in practice is give slightly improved distance information for the face. This is important for AR applications, but it doesn't matter for VTubing. The exact distance to your face is irrelevant: even if you use the Z parameter for model size or whatever, it doesn't matter whether it's physically accurate in meters, since the model isn't in AR and doesn't have to interact with real objects.
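To make the distance point concrete, here is a tiny Python sketch. It assumes a made-up mapping where the avatar's size depends only on the face distance relative to a calibration pose; `relative_scale` and the numbers in the comments are hypothetical, not how VTube Studio or ARKit are confirmed to work. Under that assumption, a tracker whose distance estimate is off by a constant factor produces exactly the same avatar size as one that is accurate in meters.

```python
def relative_scale(z_now, z_calibrated):
    """Hypothetical avatar scale: bigger when the face is closer than at calibration.

    Only the ratio of the two distances matters, so the unit (and any
    constant scale error in the distance estimate) cancels out.
    """
    return z_calibrated / z_now


# Illustrative values only: calibrated at 0.60 m, now leaning in to 0.45 m.
accurate = relative_scale(0.45, 0.60)

# Same motion, but the tracker overestimates every distance by 30%.
k = 1.3
biased = relative_scale(0.45 * k, 0.60 * k)

print(abs(accurate - biased) < 1e-9)
```

This only covers a constant scale error. If an RGB-only estimate were wrong in some other way (for example, drifting over time), the ratio would not cancel so neatly.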
If you can manage to show a difference in the ARKit blend shape data with and without the depth camera covered, please record it and share a video. I haven't been able to.
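If you would rather compare numbers than eyeball a video, here is a rough Python sketch of one way to do it. It assumes you have some way to record the blend shape values to CSV, one row per frame and one column per blend shape (for example `jawOpen`, `eyeBlinkLeft`); that recording format and the function names are my own assumptions, not something VTube Studio is known to export. Because the covered and uncovered runs can't be captured at the same time, it compares per-blend-shape range and mean rather than going frame by frame.

```python
import csv
import io
from statistics import mean


def load_blendshapes(csv_text):
    """Parse CSV text: header row of blend shape names, then one row per frame."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def summarize(frames):
    """Return {name: (min, max, mean)} over all frames of one recording."""
    summary = {}
    for name in frames[0]:
        values = [frame[name] for frame in frames]
        summary[name] = (min(values), max(values), mean(values))
    return summary


def compare(uncovered, covered):
    """Per-blend-shape change in range and mean (covered minus uncovered)."""
    a, b = summarize(uncovered), summarize(covered)
    result = {}
    for name in a:
        if name not in b:
            continue  # skip blend shapes missing from the second recording
        a_min, a_max, a_mean = a[name]
        b_min, b_max, b_mean = b[name]
        result[name] = {
            "range_diff": (b_max - b_min) - (a_max - a_min),
            "mean_diff": b_mean - a_mean,
        }
    return result
```

To use it, read each recording's file contents into a string, pass both through `load_blendshapes`, and hand the results to `compare`. Try to perform the same sequence of expressions in both runs. Differences near zero across the board would support the claim above; a consistently smaller range with the depth sensors covered would be evidence against it.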
The other question is... Apple, why not release ARKit/FaceKit for macOS? 😇
Edit: This all gets pretty confusing when you start talking about different apps and setups like VBridger/etc. Those apps can change the result but they all use the same source data that plain old VTS does, coming from ARKit/FaceKit. Some setups work better than others, but that still has nothing to do with iPhone hardware, and it could still be replicated with a webcam if good enough software existed. Feel free to do the above experiment with your favorite iPhone tracking setup!
Edit 2: All the downvotes and disagreement but nobody is doing the test and showing me how I'm wrong... come on people, this is something you can easily test yourself! I'm not telling you to take my word for it, I gave you step by step instructions. This is how we learn and improve our understanding of the world, by doing experiments, and that works for mysterious technology sold by Apple too!
The reason I want to dispel this myth is that I don't want people who try out or review webcam tracking, and especially people who are considering developing it, to start from the preconceived notion that it has to be worse than iPhone tracking. It really doesn't. It might be worse today, but it doesn't have to be.