Song-Chun Zhu (Chinese: 朱松纯) is a Chinese computer scientist and applied mathematician known for his work in computer vision, cognitive artificial intelligence and robotics. Zhu currently works at Peking University and was previously a professor in the Departments of Statistics and Computer Science at the University of California, Los Angeles (UCLA). Zhu also previously served as Director of the UCLA Center for Vision, Cognition, Learning and Autonomy (VCLA). In 2005, Zhu founded the Lotus Hill Institute, an independent non-profit organization to promote international collaboration within the fields of computer vision and pattern recognition. Zhu has published extensively and lectured globally on artificial intelligence, and in 2011, he became a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) for "contributions to statistical modeling, learning and inference in computer vision."

Zhu has two daughters, Stephanie and Yi. Zhu Yi (Chinese: 朱易) is a competitive figure skater.

Early life and education

Born and raised in Ezhou, China, Zhu found inspiration as a child in the development of chess-playing computers, which sparked his interest in artificial intelligence. In 1991, Zhu earned his B.S. in Computer Science from the University of Science and Technology of China in Hefei. During his undergraduate years, Zhu found the computational theory of vision by the late MIT neuroscientist David Marr deeply influential and aspired to pursue a general unified theory of vision and AI. In 1992, Zhu continued his study of computer vision at the Harvard Graduate School of Arts and Sciences. At Harvard, Zhu studied under the supervision of American mathematician David Mumford and gained an introduction to "probably approximately correct" (PAC) learning under the instruction of Leslie Valiant. Zhu concluded his studies at Harvard in 1996 with a Ph.D. in Computer Science and followed Mumford to the Division of Applied Mathematics at Brown University as a postdoctoral fellow.

Career

Following his postdoctoral fellowship, Zhu lectured briefly in Stanford University's Computer Science Department. In 1998, he joined Ohio State University as an assistant professor in the Departments of Computer Science and Cognitive Science. In 2002, Zhu joined the University of California, Los Angeles in the Departments of Computer Science and Statistics as associate professor, rising to the rank of full professor in 2006. At UCLA, Zhu established the Center for Vision, Cognition, Learning and Autonomy. His chief research interest has been the pursuit of a unified statistical and computational framework for vision and intelligence, which includes the Spatial, Temporal, and Causal And-Or graph (STC-AOG) as a unified representation and numerous Monte Carlo methods for inference and learning.

In 2005, Zhu established an independent non-profit organization in his hometown of Ezhou, the Lotus Hill Institute (LHI). LHI has been involved in collecting large-scale image datasets and annotating the objects, scenes, and activities in them, and has received contributions from many renowned scholars, including Harry Shum. The institute also features a full-time annotation team for parsing image structures and has amassed over 500,000 images to date. Since establishing LHI, Zhu has organized numerous workshops and conferences and served as the general chair for both the 2012 Conference on Computer Vision and Pattern Recognition (CVPR) in Providence, Rhode Island, where he presented Ulf Grenander with a Pioneer Medal, and the 2019 CVPR held in Long Beach, California.

In July 2017, Zhu founded DMAI in Los Angeles as an AI startup engaged in developing a unified cognitive AI platform.

In September 2020, Zhu returned to China to join Peking University and lead its Institute for Artificial Intelligence. There he joined Harry Shum, Microsoft's former head of artificial intelligence and research, another Chinese AI expert who had worked in the US and a long-time acquaintance of Zhu. Shum had been appointed by Peking University in August of that year to chair the academic committee of the Institute for Artificial Intelligence. Zhu is working on setting up a new and separate AI research institute, the Beijing Institute for General Artificial Intelligence (BIGAI). According to its introduction, BIGAI follows the "small data for big task" paradigm and focuses on advanced AI technology, multi-disciplinary integration, and international academic exchange in order to nurture a new generation of young AI talent. The institute is expected to gather professional researchers, scholars and experts to put Zhu's theoretical framework of artificial intelligence into practice, jointly promote original Chinese AI technologies, and build a new generation of general AI platforms.

Research and work

Zhu has published over three hundred articles in peer-reviewed journals and proceedings in the following four phases:

Pioneering statistical models to formulate concepts in Marr’s framework

In the early 1990s, Zhu, with collaborators in the pattern theory group, developed advanced statistical models for computer vision. Focusing on a unifying statistical framework for the early vision representations presented in David Marr's posthumously published work ''Vision'', they first formulated textures in a new Markov random field model, called FRAME, using a minimax entropy principle to introduce discoveries in neuroscience and psychophysics to Gibbs distributions in statistical physics. They then proved the equivalence between the FRAME model and the micro-canonical ensemble, which they named the Julesz ensemble. This work received the Marr Prize honorary nomination at the International Conference on Computer Vision (ICCV) in 1999.

During the 1990s, Zhu developed two new classes of nonlinear partial differential equations (PDEs). One class, for image segmentation, is called region competition. This work connecting PDEs to statistical image models received the Helmholtz Test of Time Award at ICCV 2013. The other class, called GRADE (Gibbs Reaction and Diffusion Equations), was published in 1997 and employs a Langevin dynamics approach for inference and learning by stochastic gradient descent (SGD).

In the early 2000s, Zhu formulated textons using generative models with sparse coding theory and integrated both the texture and texton models to represent the primal sketch. With Ying Nian Wu, Zhu advanced the study of perceptual transitions between regimes of models in information scaling and proposed a perceptual scale space theory to extend the image scale space.

Expanding Fu's grammar paradigm by stochastic and-or graph

From 1999 until 2002, with his Ph.D. student Zhuowen Tu, Zhu developed a data-driven Markov chain Monte Carlo (DDMCMC) paradigm to traverse the entire state-space by extending the jump-diffusion work of Grenander and Miller. With another Ph.D. student, Adrian Barbu, he generalized the cluster sampling algorithm (Swendsen-Wang) in physics from Ising/Potts models to arbitrary probabilities. This advancement made the split-merge operators reversible for the first time in the literature and achieved 100-fold speedups over the Gibbs sampler and jump-diffusion. This accomplishment led to the work on image parsing that won the Marr Prize at ICCV 2003.

In 2004, Zhu moved to high-level vision by studying stochastic grammar. The grammar method dates back to the syntactic pattern recognition approach advocated by King-Sun Fu in the 1970s. Zhu developed grammatical models for a few key vision problems, such as face modeling, face aging, clothes, object detection, rectangular structure parsing, and the like. He wrote a monograph with Mumford in 2006 titled ''A Stochastic Grammar of Images''. In 2007, Zhu and co-authors received a Marr Prize nomination. The following year, Zhu received the J.K. Aggarwal Prize from the International Association of Pattern Recognition for "contributions to a unified foundation for visual pattern conceptualization, modeling, learning, and inference."

Zhu has extended the and-or graph models to the spatial, temporal, and causal and-or graph (STC-AOG) to express compositional structures as a unified representation for objects, scenes, actions, events, and causal effects in physical and social scene understanding problems.

Exploring the "dark matter of AI" cognition and visual commonsense

Since 2010, Zhu has collaborated with scholars from cognitive science, AI, robotics, and language to explore what he calls the "Dark Matter of AI", the 95% of intelligent processing that is not directly detectable in sensory input. Together they have augmented the image parsing and scene understanding problem with cognitive modeling and reasoning about the following aspects: functionality (functions of objects and scenes, the use of tools), intuitive physics (supporting relations, materials, stability, and risk), intention and attention (what people know, think, and intend to do in social scenes), causality (the causal effects of actions that change object fluents), and utility (the common values driving human activities in video). The results have been disseminated through a series of workshops. Zhu has explored numerous other topics during this period, including: formulating AI concepts such as tools, containers, and liquids; integrating three-dimensional scene parsing and reconstruction from single images by reasoning about functionality and physical stability; situated dialogues by joint video and text parsing; developing communicative learning; and mapping the energy landscape of non-convex learning problems.

Pursuing a "small-data for big task" paradigm for general AI

In a widely circulated public article written in Chinese in 2017, Zhu referred to popular data-driven deep learning research as a "big data for small task" paradigm that trains a neural network for each specific task with massive annotated data, resulting in uninterpretable models and narrow AI. Zhu, instead, advocated for a "small data for big task" paradigm to achieve general AI.

At the 2023 meeting of the Chinese People's Political Consultative Conference's National Committee, Zhu said that, in the wake of ChatGPT's release, China should make artificial general intelligence a strategic goal, analogous to the pursuit of nuclear, missile, and satellite technology by the Two Bombs, One Satellite project of the 1960s.

Awards and honors

*1999 – Marr Prize honorary nomination, Seventh Int’l Conference on Computer Vision, Corfu, Greece
*2001 – Sloan Research Fellow in Computer Science, Alfred Sloan Foundation
*2001 – Career Award, National Science Foundation
*2001 – Young Investigator Award, Office of Naval Research
*2003 – Marr Prize, Ninth Int’l Conf. on Computer Vision, Nice, France
*2007 – Marr Prize honorary nomination at the 11th ICCV at Rio, Brazil
*2008 – J.K. Aggarwal Prize, Int’l Association of Pattern Recognition
*2011 – Fellow, IEEE Computer Society
*2013 – Helmholtz Test-of-Time Award at the 14th Int’l Conf. on Computer Vision at Sydney, Australia
*2017 – Computational Modeling Prize, Cognitive Science Society
*2019 – Best Paper Award, ACM TURC Conference

Publications

Books

*S.C. Zhu and D.B. Mumford, ''A Stochastic Grammar of Images'', monograph, now Publishers Inc., 2007.
*A. Barbu and S.C. Zhu, ''Monte Carlo Methods'', Springer, published in 2019.
*S.C. Zhu, ''AI: The Era of Big Integration – Unifying Disciplines within Artificial Intelligence'', DMAI, Inc., published in 2019.
*S.C. Zhu and Y.N. Wu, ''Concepts and Representations in Vision and Cognition'', draft taught for 10+ years, Springer, in preparation for 2020.

Papers

*Zhu, S. C., Wu, Y., & Mumford, D. (1998). FRAME: filters, random fields, and minimax entropy towards a unified theory for texture modeling. International Journal of Computer Vision, 27(2), pp. 1–20.
*Y. N. Wu, S. C. Zhu and X. W. Liu (2000). Equivalence of Julesz Ensemble and FRAME models. International Journal of Computer Vision, 38(3), 247–265.
*Tu, Z. and Zhu, S.-C. Image Segmentation by Data Driven Markov Chain Monte Carlo, IEEE Trans. on PAMI, 24(5), 657–673, 2002.
*Barbu, A. and Zhu, S.-C., Generalizing Swendsen-Wang to Sampling Arbitrary Posterior Probabilities, IEEE Trans. on PAMI, 27(8), 1239–1253, 2005.
*Tu, Z., Chen, X., Yuille, A., & Zhu, S.-C. (2003). Image parsing: unifying segmentation, detection, and recognition. Proceedings Ninth IEEE International Conference on Computer Vision.
*Zhu, S. C., & Yuille, A. (1996). Region competition: unifying snakes, region growing, and Bayes/MDL for multiband image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(9), 884–900.
*Zhu, S. C., & Mumford, D. (1997). Prior learning and Gibbs reaction-diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(11), 1236–1250.
*Zhu, S.-C., Guo, C., Wang, Y., & Xu, Z. (2005). What are Textons? International Journal of Computer Vision, 62(1/2), 121–143.
*Zhu, S.-C., & Mumford, D. (2006). A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision, 2(4), 259–362.
*Guo, C., Zhu, S.-C. and Wu, Y. (2007), Primal sketch: Integrating Texture and Structure. Computer Vision and Image Understanding, vol. 106, issue 1, 5–19.
*Y.N. Wu, C.E. Guo, and S.C. Zhu (2008), From Information Scaling of Natural Images to Regimes of Statistical Models, Quarterly of Applied Mathematics, vol. 66, no. 1, 81–122.
*B. Zheng, Y. Zhao, J. Yu, K. Ikeuchi, and S.C. Zhu (2015), Scene Understanding by Reasoning Stability and Safety, Int'l Journal of Computer Vision, vol. 112, no. 2, pp. 221–238.
*Y. Zhu, Y.B. Zhao and S.C. Zhu (2015), Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
*Fire, A. and S.C. Zhu (2016), Learning Perceptual Causality from Video, ACM Trans. on Intelligent Systems and Technology, 7(2): 23.
*Y.X. Zhu, C. Jiang, Y. Zhao, D. Terzopoulos and S.C. Zhu (2016), Inferring Forces and Learning Human Utilities from Video, Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
*D. Xie, T. Shu, S. Todorovic and S.C. Zhu (2018), Learning and Inferring “Dark Matter” and Predicting Human Intents and Trajectories in Videos, IEEE Trans. on Pattern Analysis and Machine Intelligence, 40(7): 1639–1652.
*Zhu, Y. et al. (2020), Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Human-like Commonsense, Engineering special issue on AI.
*S.C. Zhu (2019), AI: The Era of Big Integration – Unifying Disciplines within Artificial Intelligence, DMAI, Inc.

External links

Song-Chun Zhu's page at UCLA
