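On Facebook’s side, the big advance comes in two parts: the new training environment, Habitat 2.0, and the dataset created to enable it. You may remember Habitat, which was released a few years ago. At the time, in order to develop AI models that interact with the real world, which it calls “embodied AI,” Facebook built a large number of photorealistically rendered virtual environments for AIs to move around in.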



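But in the end, those virtual environments were only polygon-deep, with minimal interaction and no physically realistic simulation. If a robot bumped into a table, the table would not fall over and spill things everywhere, and the robot could not open the refrigerator in the kitchen or lift something out of the sink. Habitat 2.0 and the new ReplicaCAD dataset address this by improving interactivity and by adding more 3D objects rather than simply tracing 3D surfaces.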

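Simulated robots move around the new apartment-scale environments as before, but when they reach an object, they can actually act on it. Suppose a robot’s task is to pick up a fork from the dining table and place it in the sink. Until a few years ago, the fork could not be simulated properly, so picking it up and putting it down was simply assumed. In Habitat 2.0, everything from the table the fork rests on to the sink is physically simulated. That increases the computational load but makes the simulation far more meaningful.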

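The field is advancing rapidly: each new system adds new capabilities and, at the same time, points to the next big improvement or opportunity. In this case, Habitat 2.0’s closest competitor is probably AI2’s ManipulaTHOR, which combines room-scale environments with physical object simulation.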

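Where Habitat 2.0 beats ManipulaTHOR is speed. According to the paper, the simulator runs roughly 50 to 100 times faster than ManipulaTHOR, which means a robot can get 50 to 100 times more training done per second. (This is not a strict comparison, and the two systems differ in other ways.)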





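CEO RJ Pittman said: “We have created 3D data for every kind of physical structure in existence, or anything like a structure. Homes, high-rises, hospitals, office spaces, cruise ships, jets, McDonald’s... All of the information contained in a digital twin is very important to research.” “We were certain this 3D data would affect every field, from computer vision to robotics to identifying household objects. We did not need to explain it to Facebook in detail. For Habitat and embodied AI, it is obviously important data.”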




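“(HM3D) is an extremely diverse dataset,” Pittman said. “We wanted to gather a rich and reliable collection of varied real-world environments. To get the most out of training an AI or a robot, you need this kind of diverse data.”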







To train a robot to navigate a house, you either need to give it a lot of real time in a lot of real houses, or a lot of virtual time in a lot of virtual houses. The latter is definitely the better option, and Facebook and Matterport are working together to make thousands of virtual, interactive digital twins of real spaces available for researchers and their voracious young AIs.

On Facebook’s side the big advance is in two parts: the new Habitat 2.0 training environment and the dataset they created to enable it. You may remember Habitat from a couple years back; in the pursuit of what it calls “embodied AI,” which is to say AI models that interact with the real world, Facebook assembled a number of passably photorealistic virtual environments for them to navigate.

Many robots and AIs have learned things like movement and object recognition in idealized, unrealistic spaces that resemble games more than reality. A real-world living room is a very different thing from a reconstructed one. By learning to move about in something that looks like reality, an AI’s knowledge will transfer more readily to real-world applications like home robotics.

But ultimately these environments were only polygon-deep, with minimal interaction and no real physical simulation — if a robot bumps into a table, it doesn’t fall over and spill items everywhere. The robot could go to the kitchen, but it couldn’t open the fridge or pull something out of the sink. Habitat 2.0 and the new ReplicaCAD dataset change that with increased interactivity and 3D objects instead of simply interpreted 3D surfaces.

Simulated robots in these new apartment-scale environments can roll around like before, but when they arrive at an object, they can actually do something with it. For instance, if a robot’s task is to pick up a fork from the dining room table and place it in the sink, a couple of years ago the picking up and putting down would simply have been assumed, since you couldn’t actually simulate it effectively. In the new Habitat system the fork is physically simulated, as is the table it’s on, the sink it’s going to, and so on. That makes it more computationally intense, but also far more useful.

They’re not the first to get to this stage by a long shot, but the whole field is moving along at a rapid clip and each time a new system comes out it leapfrogs the others in some ways and points at the next big bottleneck or opportunity. In this case Habitat 2.0’s nearest competition is probably AI2’s ManipulaTHOR, which combines room-scale environments with physical object simulation.

Where Habitat has it beat is in speed: according to the paper describing it, the simulator can run roughly 50-100 times faster, which means a robot can get that much more training done per second of computation. (The comparisons aren’t exact by any means and the systems are distinct in other ways.)
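To make the speed claim concrete, here is a minimal sketch of how a simulator speedup translates into training steps within a fixed compute budget. The baseline throughput of 100 steps per second is a hypothetical placeholder, not a figure from the paper, and the function name is invented for illustration; only the 50x and 100x multipliers come from the article.

```python
def steps_in_budget(steps_per_second: float, budget_seconds: float) -> float:
    """Simulation steps completed within a fixed compute budget."""
    return steps_per_second * budget_seconds


# Hypothetical baseline throughput (assumption, not from the paper).
BASELINE_STEPS_PER_SECOND = 100.0
ONE_HOUR_SECONDS = 3600.0

baseline_steps = steps_in_budget(BASELINE_STEPS_PER_SECOND, ONE_HOUR_SECONDS)

# The rough speedup range quoted in the article.
for speedup in (50, 100):
    faster_steps = steps_in_budget(
        BASELINE_STEPS_PER_SECOND * speedup, ONE_HOUR_SECONDS
    )
    print(
        f"{speedup}x faster simulator: {faster_steps:,.0f} steps per hour "
        f"vs {baseline_steps:,.0f} at baseline"
    )
```

The point is only that throughput scales linearly with simulator speed, so the same hour of computation yields 50 to 100 times as much experience. As the article notes, the two systems differ in other ways, so real-world gains would not map this cleanly.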

The dataset used for it is called ReplicaCAD, and it’s essentially the original room-level scans recreated with custom 3D models. This is a painstaking manual process, Facebook admitted, and they’re looking into ways of scaling it, but it provides a very useful end product.

The original scanned room, above, and ReplicaCAD 3D recreation, below.

More detail and more types of physical simulation are on the roadmap — basic objects, movements, and robotic presences are supported, but fidelity had to give way for speed at this stage.

Matterport is also making some big moves in partnership with Facebook. After making a huge platform expansion over the last couple years, the company has assembled an enormous collection of 3D-scanned buildings. Though it has worked with researchers before, the company decided it was time to make a larger part of its trove available to the community.

“We’ve Matterported every type of physical structure in existence, or close to it. Homes, high-rises, hospitals, office spaces, cruise ships, jets, Taco Bells, McDonald’s… and all the info that is contained in a digital twin is very important to research,” CEO RJ Pittman told me. “We thought for sure this would have implications for everything from doing computer vision to robotics to identifying household objects. Facebook didn’t need any convincing… for Habitat and embodied AI it is right down the center of the fairway.”

To that end it created a dataset, HM3D, of a thousand meticulously 3D-captured interiors, from the home scans that real estate browsers may recognize to businesses and public spaces. It’s the largest such collection that has been made widely available.

Image Credits: Matterport

The environments, which are scanned and interpreted by an AI trained on precise digital twins, are dimensionally accurate to the point where, for example, exact numbers for window surface area or total closet volume can be calculated. It’s a helpfully realistic playground for AI models, and while the resulting dataset isn’t interactive (yet) it is very reflective of the real world in all its variance. (It’s distinct from the Facebook interactive dataset but could form the basis for an expansion.)
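As a rough illustration of what dimensional accuracy makes possible, the sketch below computes a surface area and an enclosed volume from a triangle mesh. It assumes the geometry is available as a plain list of vertices in meters plus triangles given as vertex indices; that layout, the helper names, and the closet and window dimensions are all hypothetical and do not reflect HM3D’s actual file format or any Matterport API.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def surface_area(vertices: List[Vec3], faces: List[Face]) -> float:
    """Sum of triangle areas; each area is half the cross-product length."""
    total = 0.0
    for i, j, k in faces:
        n = cross(sub(vertices[j], vertices[i]), sub(vertices[k], vertices[i]))
        total += 0.5 * math.sqrt(dot(n, n))
    return total


def mesh_volume(vertices: List[Vec3], faces: List[Face]) -> float:
    """Volume of a closed, consistently oriented mesh via signed tetrahedra."""
    total = 0.0
    for i, j, k in faces:
        total += dot(vertices[i], cross(vertices[j], vertices[k]))
    return abs(total) / 6.0


def make_box(w: float, d: float, h: float) -> Tuple[List[Vec3], List[Face]]:
    """Axis-aligned box with outward-facing triangles (hypothetical closet)."""
    vertices = [
        (0, 0, 0), (w, 0, 0), (w, d, 0), (0, d, 0),
        (0, 0, h), (w, 0, h), (w, d, h), (0, d, h),
    ]
    faces = [
        (0, 3, 2), (0, 2, 1),  # bottom
        (4, 5, 6), (4, 6, 7),  # top
        (0, 1, 5), (0, 5, 4),  # front
        (3, 7, 6), (3, 6, 2),  # back
        (0, 4, 7), (0, 7, 3),  # left
        (1, 2, 6), (1, 6, 5),  # right
    ]
    return vertices, faces


# Hypothetical closet: 1.0 m wide, 0.6 m deep, 2.0 m tall.
closet_vertices, closet_faces = make_box(1.0, 0.6, 2.0)
closet_volume = mesh_volume(closet_vertices, closet_faces)

# Hypothetical window: a 1.2 m by 1.5 m rectangle split into two triangles.
window_vertices = [(0, 0, 1.0), (1.2, 0, 1.0), (1.2, 0, 2.5), (0, 0, 2.5)]
window_faces = [(0, 1, 2), (0, 2, 3)]
window_area = surface_area(window_vertices, window_faces)

print(f"closet volume: {closet_volume:.2f} m^3, window area: {window_area:.2f} m^2")
```

The volume calculation only holds for a watertight mesh whose triangles are wound consistently, which is part of why accurate, cleaned-up reconstructions matter for measurements like these.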

“It is specifically a diversified dataset,” said Pittman. “We wanted to be sure we had a rich grouping of different real world environments — you need that diversity of data if you want to get the most mileage out of it training an AI or robot.”

All the data was volunteered by the owners of the spaces, so don’t worry that it’s been sucked up unethically by some small print. Ultimately, Pittman explained, the company wants to create a larger, more parameterized dataset that can be accessed by API — realistic virtual spaces as a service, basically.

“Maybe you’re building a hospitality robot, for bed and breakfasts of a certain style in the U.S. Wouldn’t it be great to be able to get a thousand of those?” he mused. “We want to see how far we can push advancements with this first dataset, get those learnings, then continue to work with the research community and our own developers and go from there. This is an important launching point for us.”

Both datasets will be open and available for researchers everywhere to use.

(Written by Devin Coldewey; translated by Dragonfly)

