The use of visualization in the scientific literature has a significant effect on impact and communicability, but it has been largely ignored in metascience studies. By extracting figures from the literature and training convolutional neural networks (CNNs) to classify them, we can measure how the use of various visualization techniques relates to scientific impact, and how that use varies across fields and over time. We find that the use of diagrams and quantitative plots is associated with higher impact, but that the effect differs between within-discipline and cross-discipline citations, suggesting opportunities for optimizing scientific communication across discipline boundaries. We also derive heatmaps for large corpora of figures and use them to describe qualitative differences in how disciplines present information. These heatmaps also represent a visual signature that can be used to based on jargon and mathematical symbols. Finally, we show how this approach can be used to bootstrap targeted information extraction projects for specific figure types, and describe one such project involving phylogenetic trees.


Bill Howe is Associate Professor in the Information School and Adjunct Associate Professor in Computer Science & Engineering and in Electrical Engineering. His research interests are in data management, curation, analytics, and visualization in the sciences. As Founding Associate Director of the UW eScience Institute, Howe played a leadership role in the Data Science Environment program at UW, funded through a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley. With support from the MacArthur Foundation and Microsoft, Howe directs the Urbanalytics group at UW and UW's participation in the Cascadia Urban Analytics Cooperative with the University of British Columbia, where he focuses on responsible data-intensive urban science. He founded the UW Data Science Masters Degree and serves as its inaugural Program Director and Faculty Chair. He has received two Jim Gray Seed Grant awards from Microsoft Research for work on managing environmental data, has had two papers selected for VLDB Journal's "Best of Conference" issues (2004 and 2010), and co-authored what are currently the most-cited papers from both VLDB 2010 and SIGMOD 2012. Howe serves on the program and organizing committees for a number of conferences in the area of databases and scientific data management. He developed one of the first MOOCs on data science, which attracted over 200,000 students across two offerings, and founded UW's Data Science for Social Good program. He has a Ph.D. in Computer Science from Portland State University and a Bachelor's degree in Industrial & Systems Engineering from Georgia Tech.