
Robot, make me a chair: how MIT combines generative AI and robots for modular furniture

MIT is developing systems that generate 3D models from simple text and speech instructions and convert them into physical furniture. Generative AI and robotic assembly together accelerate design, reduce waste, and pave the way for local, sustainable production of adaptable objects.

Photo by: Domagoj Skledar - illustration / own archive

Leaning back in an armchair, uttering a simple sentence like "Make me a chair," and then watching a robotic arm assemble a physical object in front of you within minutes: until recently, this sounded like a scene from science fiction. As of December 2025, it is reality in the laboratories of the Massachusetts Institute of Technology (MIT), where researchers are combining generative artificial intelligence, computer vision systems, and robotic assembly into a single, fully automated design-and-manufacturing process.


Instead of classic computer-aided design (CAD), which requires expert skills, hours of modeling, and detailed knowledge of software, the new system based on artificial intelligence allows a complex, multi-component object to be described in ordinary language. Generative artificial intelligence models create a three-dimensional representation of the desired object from text, and then a vision-language model (VLM) decomposes that geometry into standardized physical parts that the robot can immediately begin assembling.


This research demonstrates how dramatically the gap between digital design and physical production can be narrowed. In recent months the same team has gone a step further: building on the same principles, it has developed a "speech-to-reality" system that no longer requires typing at all. It is enough to speak the request, and modular furniture and other objects are created in just a few minutes.


Why classic CAD became a design bottleneck


Computer-aided design tools remain the standard across industry, from automotive and aerospace to construction and consumer electronics. Yet the very qualities that make these tools powerful and precise also raise a barrier for anyone without specialist knowledge. The learning curve is steep, the interfaces are complex, and detailed control over every screw or surface is often excessive in the early stages of a project, when what matters most is trying out multiple ideas quickly and seeing them in physical space.


Generative AI has shown in the last few years that it can create images, 3D models, and entire virtual scenes from short text. But most of these digital objects remain trapped in the virtual world. The geometry created by the models is often irregular, lacks a clear component structure, and does not take into account the constraints of physical production. In other words, what looks good on screen does not necessarily mean it can be easily, quickly, and cheaply assembled in reality.


This is precisely where MIT's approach breaks new ground: the goal is not merely to generate an attractive digital model, but to bring it into a form suitable for automatic assembly from prefabricated elements. In doing so, generative AI stops being a tool for inspiration alone and becomes part of an actual production line.


From text to 3D model: how the system "understands" geometry and function


The work starts from a simple interaction: the user types a request into a text interface – for example, "make me a chair" or "I need a shelf with three levels." A generative 3D model creates a mesh representation of the object based on that description. This mesh describes the surface and volume of the future object, but still says nothing about which physical parts it will consist of and how they will be connected.
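To make that data flow concrete, here is a minimal Python sketch of this first step. The Mesh container and the generate_mesh function are illustrative assumptions rather than MIT's published interface; the article only establishes that a text prompt goes in and raw surface geometry comes out.

```python
from dataclasses import dataclass

@dataclass
class Mesh:
    """Raw surface geometry: triangles only, no component structure."""
    vertices: list[tuple[float, float, float]]  # xyz coordinates
    faces: list[tuple[int, int, int]]           # vertex indices per triangle

def generate_mesh(prompt: str) -> Mesh:
    """Placeholder for a text-to-3D generative model. In the real system
    this is a learned model; here it only marks the boundary between
    language and geometry."""
    raise NotImplementedError("backed by an external generative 3D model")

# mesh = generate_mesh("make me a chair")
# The mesh describes shape and volume, but says nothing yet about
# which physical parts the object will consist of.
```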


In the next step, the role is taken over by a vision-language model, a type of generative AI system trained on large collections of images, textual descriptions, and scene-understanding tasks. Its job is to "look" at the three-dimensional model and infer the functional units of the object: where the seat, backrest, and legs are; which surfaces the human body will rest against; and which elements primarily carry the structural load.


The researchers work with two basic groups of prefabricated components: structural elements that form the skeleton of the object, and panel elements that form flat surfaces such as seats or shelves. Based on geometry and function, the vision-language model must decide which type of component goes where. It recognizes, for example, that the seat and backrest of a chair need panels, while the legs and cross-braces remain built from structural segments.


What makes this approach particularly interesting is the fact that the model does not rely on manually programmed rules for a chair, shelf, or table. Instead, it uses knowledge acquired during training on many images and descriptions of objects to generalize to new shapes generated by AI. Because of this, the same system, without additional training, can work with different types of furniture and other functional items.
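One way such a query to a vision-language model might look is sketched below. The two labels match the component groups described above, but the prompt wording and the ask_vlm helper are assumptions, not MIT's published interface.

```python
from enum import Enum

class ComponentType(Enum):
    PANEL = "panel"            # flat functional surfaces: seat, backrest, shelf
    STRUCTURAL = "structural"  # load-bearing skeleton: legs, cross-braces

def ask_vlm(images: list[bytes], question: str) -> str:
    """Stub for any vision-language model API; MIT's actual model and
    interface are not specified in the article."""
    raise NotImplementedError

def classify_surface(surface_id: int, rendered_views: list[bytes]) -> ComponentType:
    """Ask the VLM what a numbered surface is for, given renderings of the
    generated mesh with that surface highlighted. Note there are no
    chair-specific rules: the model generalizes from its training data."""
    answer = ask_vlm(
        rendered_views,
        f"Surface {surface_id} is highlighted. Is it a flat surface a person "
        f"uses directly (answer: panel) or part of the load-bearing frame "
        f"(answer: structural)?",
    )
    return ComponentType(answer.strip().lower())
```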


Component assignment and preparation for robotic assembly


After the vision-language model builds an understanding of function, the system moves to the practical level: for every surface on the 3D mesh, it assigns tags defining whether a panel element should be installed there or not. Surfaces are numbered, and component assignments are fed back into the model to further align with the geometry and physical assembly constraints.


The result is a structured model in which every part of the object is linked to one of the predefined types of prefabs. This is the crucial step that allows digital design to be translated into a concrete set of instructions for the robotic arm: how many elements are needed, where they are placed, in what order they are connected, and how collisions are avoided during assembly.
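One plausible shape for that structured model, sketched as plain data classes. The field names are invented for the example, but they hold exactly the information the article lists: which element, where it goes, and in what order it is placed.

```python
from dataclasses import dataclass, field

@dataclass
class Placement:
    component: str                        # prefab type, e.g. "panel" or "strut"
    position: tuple[float, float, float]  # pose on the work surface
    step: int                             # assembly order, chosen to avoid collisions

@dataclass
class AssemblyPlan:
    placements: list[Placement] = field(default_factory=list)

    def bill_of_materials(self) -> dict[str, int]:
        """How many elements of each type the robot needs to fetch."""
        counts: dict[str, int] = {}
        for p in self.placements:
            counts[p.component] = counts.get(p.component, 0) + 1
        return counts

    def ordered(self) -> list[Placement]:
        """Placements in the sequence the robotic arm will execute."""
        return sorted(self.placements, key=lambda p: p.step)
```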


The robotic system then takes over the prepared plan and begins assembling the object on the work surface. Since all parts are standardized and reusable, the process is fast and very clean: no sawdust, no waiting time for glue to dry, no waste ending up in the trash. When the user no longer needs that piece of furniture, it can be disassembled and something completely new assembled from the same parts.


Human-robot co-authorship: the user stays in the loop


Although the system automates a large part of the process, the researchers emphasize the importance of the human remaining a creative partner. After the initial design proposal, the user can give additional instructions in natural language: for example, ask for panels only on the backrest and not on the seat, for the chair to be lower or higher, for the shelf to have more levels, or for the emphasis to be on visual airiness instead of full surfaces.


Every such modification reactivates the generative model and the vision-language module, which reconcile the new description with the existing 3D model and component structure. In this way, an iterative creative cycle is created: the system proposes solutions, the user guides and corrects them, and the robot turns them into physical prototypes. Instead of dealing with precise coordinates and parameters, the human thinks about function, aesthetics, and usage scenarios.
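The cycle itself is easy to express in code. In this sketch, propose_design, show_preview, and refine_design are hypothetical stand-ins for the generative model, the preview renderer, and the VLM reconciliation step; only the loop structure is taken from the article.

```python
def propose_design(request: str): ...          # stub: generative model + VLM
def show_preview(design) -> None: ...          # stub: render a preview for the user
def refine_design(design, feedback: str): ...  # stub: reconcile the new description

def design_session(initial_request: str):
    """Human-in-the-loop refinement: the system proposes, the user corrects
    in natural language, the models reconcile, the robot builds."""
    design = propose_design(initial_request)
    while True:
        show_preview(design)
        feedback = input("Changes? (press Enter to build) ").strip()
        if not feedback:
            return design                      # hand off to the robotic arm
        # e.g. "panels only on the backrest, not on the seat"
        design = refine_design(design, feedback)
```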


Such a "human-in-the-loop" approach also has an important psychological dimension. Participants in user studies often highlighted a sense of co-authorship over objects that were formally assembled by a robotic arm: they perceived the final result as "their" chair or shelf precisely because they shaped it through conversation with the system, and not through clicking on a complex CAD interface.


User testing results: preference for AI design


To quantitatively assess the value of their approach, the researchers conducted a study in which participants rated different versions of the same objects. One group of designs was created with the help of their AI-powered system with a vision-language model, another was generated by an algorithm that mechanically places panels on all upward-facing horizontal surfaces, while the third was the result of a random arrangement of panels.


More than ninety percent of participants preferred the objects created by the combined generative AI and VLM system over the alternative approaches. They particularly highlighted the logical arrangement of surfaces for sitting or storage, the sense of structural stability, and the visual harmony of the whole. The random arrangement of panels was perceived as chaotic, and the purely geometric rule of covering all horizontal planes with panels proved too crude to satisfy real user needs.


The assembly process also proved time-efficient. Thanks to standardized structural modules and panels, the robot could assemble a whole range of different configurations in a short time – from simple chairs and stools, through shelves, to more complex pieces of furniture that in classic production would require the creation of special tools or molds.


From text to speech: "speech-to-reality" as the logical next step


Building on the experience gained with textual descriptions, the team extended the concept to speech. The new "speech-to-reality" system removes the last technological barrier for inexperienced users: there is no longer any need to compose even a short written instruction; it is enough to say aloud that you want a simple chair, a bookshelf, or a small side table.


The speech signal first goes through standard processing and is converted into text, after which the same generative AI infrastructure takes over: the model generates a 3D shape, the system decomposes it into modular components, and the planner determines the optimal order and method of assembly. The result is closely related to earlier work on text, but the user experience is even more natural – communication with the robot is increasingly similar to a conversation with a human carpenter or designer.
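The speech front-end is the only genuinely new stage; everything after transcription reuses the text pipeline. The minimal sketch below uses the open-source Whisper model for transcription; the choice of ASR is an assumption, since the article only states that speech is converted into text.

```python
import whisper  # open-source speech recognition; MIT's actual ASR is not specified

def speech_to_request(audio_path: str) -> str:
    """Transcribe a spoken request. The resulting text then enters the same
    generate-decompose-plan pipeline as typed input."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

# request = speech_to_request("make_me_a_chair.wav")
# design = propose_design(request)  # identical downstream pipeline
```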


Instead of two types of prefabs, "speech-to-reality" in its first implementation relies on a set of identical cubic modules that the robot stacks into a lattice structure. This voxel-based approach simplifies the discretization of complex geometry: whether it is a chair, a shelf, a small table, or a decorative dog, the object can be broken down into a combination of cubes that the robot easily grabs, positions, and connects.
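Voxel discretization itself is straightforward to illustrate. In the sketch below, the inside() membership test stands in for the generated 3D model, and the stool dimensions are invented for the example; every occupied grid cell becomes one cubic module for the robot to place.

```python
def voxelize(inside, bounds, cube=0.05):
    """Sample a shape on a cubic grid. inside(x, y, z) -> bool tests the
    cell centre; each occupied cell maps to one physical module."""
    (x0, x1), (y0, y1), (z0, z1) = bounds
    cells = []
    x = x0
    while x < x1:
        y = y0
        while y < y1:
            z = z0
            while z < z1:
                if inside(x + cube / 2, y + cube / 2, z + cube / 2):
                    cells.append((x, y, z))
                z += cube
            y += cube
        x += cube
    return cells

# Toy stand-in for a generated model: a crude stool, a seat slab resting
# on four corner legs (dimensions in metres, invented for the example).
def stool(x, y, z):
    seat = 0.35 <= z < 0.45
    legs = z < 0.35 and (x < 0.1 or x > 0.3) and (y < 0.1 or y > 0.3)
    return seat or legs

modules = voxelize(stool, ((0.0, 0.4), (0.0, 0.4), (0.0, 0.45)))
print(f"{len(modules)} cubic modules to place")  # one pick-and-place per module
```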


Laboratory experiments showed that the system can assemble simpler pieces of furniture in just a few minutes, strong enough for everyday use under prototype conditions. In parallel, the researchers are improving how the modules connect so the structures can bear greater loads; they plan to replace the magnetic connectors, which are practical for rapid assembly, with more robust mechanical joints.


Sustainability, local production, and potential for industry


One of the key motives behind this research is the question of sustainability. Today's furniture is mostly produced in centralized factories and then transported over long distances. Every design change means a new production run, new tools, and additional logistics costs. Systems combining generative AI, modular components, and robotic assembly offer a radically different scenario: design and production can take place locally, almost on demand.


Instead of ordering a finished product, the user could in the future order "recipes" for objects – parametric descriptions and a set of rules that then trigger a local robotic system. One set of standardized modules could be reused for completely different configurations of furniture, exhibition displays, temporary building structures, or laboratory experiments. When needs change, objects are disassembled, and the material returns to the cycle.


For industry, especially for areas like aerospace or advanced architecture, such systems mean the possibility of rapid physical prototyping of complex geometries that are difficult to assemble manually. Researchers emphasize that the same computational environment can be connected to multiple robotic cells, paving the way for scaling from a desktop robotic arm to entire factories where the boundary between design studio and production hall is increasingly less visible.


Technical limits and open research questions


Although the results are impressive, the system still has clear limitations. Generative models sometimes produce geometries that are strikingly sculptural but hard to translate into a modular structure without compromise. The vision-language model does not understand physics at an engineer's level; its "intuition" about what is stable and what is not comes from statistical patterns in its training data, not from rigorous mechanical calculations.


Therefore, researchers are exploring how to include additional simulations and checks in the process: from detecting potentially unstable joints and excessive spans without support, to optimizing the number of components used to reduce mass and assembly time. Long-term, the goal is for the AI system not only to formally satisfy the user description but also to quantitatively optimize strength, durability, and material consumption.
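As a toy illustration of the kind of check being added (a simplification for this article, not the researchers' method): a design fails the most basic static test if its centre of mass falls outside the footprint of the modules touching the ground.

```python
def statically_plausible(module_centres: list[tuple[float, float, float]]) -> bool:
    """Crude static check for identical unit-mass modules: the combined
    centre of mass must lie over the bounding box of the ground-level
    supports. A real check would use the exact support polygon, joint
    strengths, and expected loads."""
    n = len(module_centres)
    com_x = sum(x for x, _, _ in module_centres) / n
    com_y = sum(y for _, y, _ in module_centres) / n
    ground = min(z for _, _, z in module_centres)
    support = [(x, y) for x, y, z in module_centres if z <= ground + 1e-9]
    xs = [x for x, _ in support]
    ys = [y for _, y in support]
    return min(xs) <= com_x <= max(xs) and min(ys) <= com_y <= max(ys)

# statically_plausible(modules)  # e.g. the voxelized stool from earlier
```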


Another open question concerns the diversity of components. Work on text-driven robotic assembly is focused on two types of parts, while "speech-to-reality" uses uniform voxel modules. In practice, many objects will require other elements: hinges, sliding guides, wheels, springs, or flexible joints. Including such components means even more complex assembly planning, but it opens the path towards fully functional items like cabinets with doors, height adjustment mechanisms, or even simpler robots designed by another AI.


Democratization of design: what "say it and it appears" means


In the background of these experiments lies a broader social vision. If anyone can describe with words what they need and see how it arises in the physical world in a few minutes, then the boundary between user and designer blurs dramatically. Just as earlier waves of digitization enabled everyone to be a publisher, musician, or photographer, generative AI combined with robotics could extend that principle to the world of objects.


For education, this means new ways of learning: students could experiment with constructions and shapes without fear of making mistakes when cutting material or using tools. For architects and industrial designers, it is about the possibility of testing ideas for interiors, prototypes, or exhibition installations in full scale practically in real time. For end users, a scenario where you have a compact robotic system in the living room that assembles and disassembles furniture according to current needs no longer looks so far away.


Researchers, however, emphasize that this is only the first step. The systems described in the papers are still laboratory prototypes, with a limited set of modules, a controlled environment, and carefully defined tasks. But the direction of development is clear: by combining advanced AI models that understand geometry and function with physical robots that can reliably handle standardized components, a new type of "conversational" or "textual" manufacturing plant is emerging.


From the early CAD systems of the seventies to today's generative networks and vision-language models, tools for creating objects have evolved over decades. The latest MIT experiments suggest the next leap: a future in which "Robot, make me a chair" is as ordinary a sentence as "send me an email," and manufacturing processes are as adaptable and fast as today's software development.


