Standing on the Shoulders of Mathematical Giants is an interactive exploration of the history of mathematics. The project takes a network theory approach to the study of the relationships between mathematicians through cultural history past. The SSMG dataset was extracted from the Mac Tutor website to produce a catalogue that connects and categorises more than 2000 mathematicians over a three and half millennia timespan. Download Google Earth & the network file and go on a voyage through mathemagical time yourself…
1. Download and install Google Earth.
2. Download the Network. (above)
Community Detection: Choose how the network is subdivided into smaller statistical groupings of mathematicians. (Blondel community detection)
Community Number: If community detection is enabled, choose which community is displayed. Note: Some communities are empty.
Altitude Axis: Define which parameter is represented by the distance from the Earth’s surface. More info: Page Rank, Clustering Coefficient, Hub and Authority Score
3. Open the .kml file with Google Earth.
The Timeline: The timeline should automatically appear in the upper left corner of Google Earth, you can use the icon to pan through historical data or hit the button to play the timeline chronologically. For best playback results, click and set ‘Animation Speed’ to minimum.
Touring: To go on an automated tour of the network, expand the SSMG folder in Google Earth and click on ‘Begin Tour’.
Hints: Cultivating a relationship with the evolution of our collective wisdom empowers us to vision further into the mystery. What types of communities can you find? Ho did mathematics grow? Who are the anomalies? What’s was their story? What’s the meaning of a teapot?
Standing on the Shoulders of Mathematical Giants was my dissertation in 2009. Since its inception I have desired to share this insightful network graph and now, 7 years on, I have finally dusted off the old code. Understandably, the dataset is falling behind the current MacTutor website, which (thanks to the amazing work of John O’Connor’s and Edmund Robertson’s) now hosts over 2800 biographies. If there’s reasonable interest in the project – I’ll put some time into repopulating the dataset. Below is my original thesis, it’s pretty rusty but still others offers a little introduction in network theory and further background to the project.
Jamie Perrelet
Imperial College London
2009
Through the extraction of information from the MacTutor website, an online historical catalogue of mathematicians, it was possible to construct a complex network outlining the history of mathematics, beginning in Egypt over 3500 years ago to the present day. By using Google Earth as a stage a new and innovative approach to the task of clearly visualising a network was achieved. Mathematicians were projected into Google Earth’s three dimensional platform by birth location and birth date, creating an intuitive way of visualising the network and navigating the MacTutor website. A program was built to aid the investigation of the network under various graph theory analysis techniques, such as pagerank and Blondel community detection. Geographical and chronological information were used in the analysis to highlight patterns in the network for regions of the planet, throughout the eras of time.
Basic networks consist of two features; vertices (nodes), constituting a set of items and edges (links), which connect the vertices together [fig 1]. Using the example of facebook, the vertices could represent the members of the website and the edges could signify the friendships between them. The total number of vertices in the network is denoted by N and the number of edges by E. By ensuring that no more than one edge can connect two vertices and that edges do not start and end at the same vertex, the maximum number of edges is clearly defined as (2):
This is effectively N sources connected to N – 1 vertices, divided by to 2 to avoid counting each possible link twice. The direction of the edges may also be considered [fig 1.B], called a ‘directed network’ and the total number of edges is simply N(N – 1) since edges may run in one of two distinct directions. A weighting may also be assigned to the edges to indicate the strength of the connection. Other more composite networks can be constructed that have for example multiple types of vertex or edge. For the purposes of this report we will not discuss the properties of such networks, as they are not studied here. The fraction of existing edges against total possible edges illustrates the widescale density of the network, of which sparse networks have a low value.
The adjacency matrix A, with dimensions N x N, may be used to represent the network, where the i^{th}, j^{th} entry defines the edge between the i^{th} and j^{th} vertices. A nonzero entry indicates the existence of relevant the edge. Note that the diagonal elements of the adjacency matrix are always zero as edges may not connect a vertex to itself in a selfloop. Furthermore, the adjacency matrix of the undirected graph, A_{A}, is symmetric since there is no differentiation between an edge from i → j and an edge from j → i.
By operating on the adjacency matrix it is possible to calculate a range of graph statistics about the network. A simple example would be calculating the number of edges connected to specific vertex, called the degree, k(N). For the undirected graph A_{A}, the number of nonzero elements along any row or column will give the degree, so k(2) = 3. For the directed graph A_{B} the number of nonzero elements along a row gives the number of edges directed away from the vertex, k_{out} and similarly columns define the number of edges coming into the vertex, k_{in}. So k_{out}(2) = 2 and k_{in}(2) = 1. An average degree may be calculated for the entire network, giving an idea of how connected the vertices are in general. Other basic principles include shortest path; the minimum number of edges between two vertices, which can be averaged over the entire network and maximum shortest path for the network, called the diameter. It is important to note that generally in graph theory the position of nodes and length of edges bear no physical meaning and can be selected arbitrarily and hence terms such as diameter are defined by number of edges, rather than physical space.
Clustering coefficient is a useful property in graph theory in recognising localised structure within a network. It is defined as the fraction of edges between a vertexs neighbours against the total number of possibile connections [fig 2].
In the analogy of facebook it expresses the probabilty of a users friends also being connected through a friendship. The clustering coefficient of a vertex i may be calculated as (3). The importance of calculating the clustering coefficient can clearly be seen by comparison of a random graph and a regular lattice [fig 3]. In the case of the lattice each vertex has neighbours out of possible neighbours, so . A random graph can be constructed with N nodes and E edges placed indiscriminately between the vertices; giving a linking probability . If it is chosen that N = 2000 and E = 6000, it gives an average degree in line with the lattice example . The statistical limit of the linking probability will equal the clustering coefficient and therefore .
The clustering coefficient for the random network is far lower than that given for the regular lattice. A low clustering coefficient is expected for the random example as each vertex has the same probability of being connected to any other vertex in the entire network; hence there is a low probability that neighbouring nodes will have an edge between them. This is intuitive as local organisation wouldn’t be expected for a network generated at random. The two graphs effectively offer descriptions of two structural extremes. Whilst the regular lattice has a high clustering coefficient, it also features a high average shortest path and diameter as paths between vertices pass through nodes systematically with the structure of the lattice. The opposite is seen for the random network, as the chaotic distribution of edges makes finding a path across the network easier, giving a low average shortest path and diameter.
A small world network is classified as a combination of these limiting properties, exhibiting a high clustering coefficient and low diameter. This can be achieved by rewiring a portion of edges in a regular lattice; localised structure remains in the unchanged vertices however shortcuts are made across the network through the rearranged edges, lowering the shortest path between vertices. Many real world networks fall under this category, most notably social networks have been publicised for this fact, such as the supposed 6 degrees of separation (4).
Pagerank is a valuable network statistic as it is a measure for calculating the relative importance of a node within a network. Pagerank was developed by Larry Page and has been adopted by the Google search engine (5) by assigning web pages a numerical weighting based on ‘votes’ from the pages that link to it. For this reason there is a natural need to define the direction of edges; pagerank is incalculable for an undirected network.
Initially an equal ranking of is given to each node making the total sum of all ranks unity. The pagerank of a vertex is defined as (6), where is the set of incoming nodes to and is the number of outgoing links from a given node . Using the example of figure 4, each vertex is given an initial rank and hence . From here the algorithm picks a random outgoing edge to follow and calculates the rank at the destination node, the iteration continues to ‘walk’ around the network until the pagerank values converge. A vertex with no outgoing edges is called a sink, if the algorithm finds itself stuck at a sink it picks a node at random to jump to. A damping factor is also included, which defines the probability that the random walker will keep following a path; pagerank jumps to a random node according to this probability.
The bracketed term contains the elements of the adjacency matrix subtracted by a null model. represents the most common example of such a null model, where the randomisation of edges is done so to preserve the degree of each node. By multiplying this by the Kronecker delta function of the communities of and , it ensures that contributions to the modularity are only made if and are within the same community.
Figure 5 shows the method adopted by the Blondel community detection algorithm. First, the algorithm takes each node and considers it as part of the community with each of its neighbours, choosing the community that gives the network the maximum modularity (this may be its own community), this process is known as modularity optimisation. This is a greedy algorithm and may leave Q stuck on a local maximum rather than the true maximum value of Q, hence the importance of the next step. Here, each community is considered as its own vertex and is pushed into a new graph, in a process called community aggregation. The procedure of modularity optimisation and community aggregation may now be repeated, called a pass, to reach a higher value for Q. The process recurs until further iterations do not affect the assigned communities. As a result there may be several levels of community assignment before the algorithm terminates. Initial passes lead to communities with small populations spread between a large number of groups and later passes see more collective communities.
The initial hurdle in achieving the aims discussed was to find a suitable source of information, from which a network could be constructed. The MacTutor (14) website is an online catalogue of mathematicians containing over two thousand biographies of mathematicians through the ages. Upon request the MacTutor owners supplied a spreadsheet, listing the names of all the biographies on their website and the names of mathematicians that each biography hyperlinked too. By writing c++ scripts it was possible to convert this spreadsheet into the standard Pajek network format so that it could read by network workbench. This gave access to various properties of the network; however these values would remain relatively meaningless unless they could be compared against a null model. Further functionality was added to the c++ code to allow it to create such a null model by outputting a randomised version of the network, achieved by rewiring a portion of the edges in the network. The edge swapping relies on a probability coefficient, switch probability, to decide which edges will be rewired and the destination of a rewired edge is selected at random. Care is taken to ensure that the edge does not start and end from the same node and that it doesn’t bridge two mathematicians that are already connected.
At this point it became apparent that the supplied spreadsheet only gave access to a small portion of the information available on the MacTutor website. To broaden the scope of the network a trawler program was built using PHP. The program was constructed to sift through biographies on the MacTutor website and extract information such as full name, birth place and birth date for each mathematician. From here the birth place of each mathematician was converted into a longitude and latitude using a Yahoo API (15). The MacTutor website also features a glossary page listing commonly used mathematical terms; the appearance of these terms in the biographies of mathematicians was also recorded. Finally the data was pushed into a MySQL database for easy extraction.
Later, further information was added to the database in the form of network properties such as pagerank, clustering coefficient, Blondel community results etc. These values were calculated in network workbench, using the initial data discussed above. Next a PHP database extraction tool was built in the form of a graphical user interface [fig 6]. The tool utilised the full flexibility of the database by giving the user access to a wide range of functionality.
The user specifies the following:
Once these details are submitted the program outputs files as comma separated variables, containing the computed values. These files are easily read by Microsoft Excel, so that they can be expressed graphically.
The second half of the program outputs network formats. The user specifies the following:
There is a variety of software that can elegantly project network formats into different visual representations, however the figure quickly becomes difficult to interpret for systems with a couple of hundred vertices and edges or more. The human eye simple can extrapolate information from projections of this magnitude [fig 7].
The last decade saw a revolution in data storage, most notably on the internet, but how can such large volumes be sensibly converted into a manageable graphical depiction? The issue has become a popular topic of enquiry in recent years and was one that faced this investigation squarely. The information gathered to construct the network was collected, after all, from a website and so it appeared vital that an interpretation of it be available to the same audience as MacTutor.
Whilst the physical positioning of vertices is normally completely arbitrary within graph theory, it came to attention that the database contained geographical information for each mathematician; namely their birthplace. It was realised that the translation of this information onto a world map would be interesting. Initially Google Maps, was fed the dataset, however it struggled to process such a large number entities, instead Google Earth was the chosen platform. A placemark was positioned for each mathematician, with lines located to represent the hyperlinks between them. A third dimension was also added to the projection by placing mathematicians radically based on the chronology of their birth year. Mathematicians born further in the past are seen further from the surface of the Earth and more recent ones closer [fig 8]. Software already exists that can project a network in 3 dimensions, however this seldom helps to resolve the task of visualisation, instead it often adds another layer of complexity to navigation. Despite this, in the projection described here the Earth acts as a recognisable reference and gives the user a perspective on the network.
Google Earth also features the ability to datestamp objects, making them visible if the user selects a date to view the Earth later than that of the datestamp. By using the birth year of each mathematician as a datestamp it was possible to construct a timeline of mathematicians. By clicking play on the timeline an evolving model of mathematics through the ages is generated, which is particularly attractive due to its ease of use. In this evolution clusters of mathematicians can be clearly seen sprouting from the network, localised both in space and time.
The next step was to consider the Google Earth projection as method for browsing the MacTutor website. This was easily achieved by adding information to the nodes that appears when they are clicked on. This information includes a hyperlink to their MacTutor biography, full name, biographical picture, birth place, birth year and a list of hyperlinks to mathematicians associated to the selected mathematician [fig 9]. This allows the user to navigate their way through the world of mathematics, find a mathematician and read the biography of that person or associated persons in a single click. Internet navigation of this sort is highly intuitive, due to its graphical nature.
The Google Earth projections described here could easily have a whole wealth of development added to them. For example the radial dimension currently expressed as time is completely arbitrary and could instead indicate pagerank, clustering coefficient or any other network property. This would effectively produce a 4 dimensional projection with 2 spatial dimensions, 1 time dimension and 1 other characteristic of choice. Furthermore extra information may be added to the projection by scaling or colouring nodes, an instance of this was achieved by colouring nodes based on the Blondel community that they belong to [fig 10].
To identify the overall level of structure of a network, it is common practice in graph theory to compare a network to a null or randomised model. Two different types of null model were used for this purpose [fig 11]. The first is described above and was written into the code of initial output to .net format; a random switch probability, which defines the probability that the destination of an edge is swapped to a random node. The second null model is built into NWB and is based on a random linking probability, which defines the probability that any two vertices will be connected (16). The random linking probability is defined as , which is the ratio of edges to possible connections in the network and is selected so that the number of edges is unchanged, giving a similar average degree.
Network:  
normal  0.4*  1.0*  0.0065†  
Degree (Av)  14.157  14.157  14.154  14.161 
Clustering Coeff (Av)  0.271  0.053  0.008  0.007 
Diameter

8  6  5  5 
Shortest Path (Av)  3.267  3.152  3.145  3.175 
# of Isolates  80  5  0  0 
Notice that the clustering coefficient is reasonably high for the normal network but rapidly decreases for the null models. This suggests that localised structure is being lost and that there is information within the network that cannot be accounted for by chance. Also note that both the diameter and shortest path fall for the randomised models. This is of interest as it suggests that shortcuts are being made across the network that would not normally exist. The nature of the network explains this fact; vertices are compiled across a range of three and a half millennia and it is therefore unlikely that a mathematician today would work on ideas from Aristotle’s time.
A clear distinction is visible in figure 12 between the network and a null model. The unaltered network displays a much larger range in pagerank values, highlighting highly influential mathematicians such as Newton and Euclid, with a long tail of mathematicians with low pagerank. The random switch model shows a far steadier distribution, where the pagerank distribution is fairly stable. The long tapering off tail is a classic characteristic in graph theory, outlining how the network contains central figures with high degrees, connected to a large number of nodes.
In this section a variety of network features associated to the geographical position of nodes and their appearance in time are discussed. Figure 13 demonstrates how the population of the network is closely related to the world population; it is interesting to note that the influx of mathematicians from the renaissance begins before the explosive growth of global population. This may be attributed to the expansion of knowledge, which must predate the technology that gave way to the development of more widespread civilisation and hence an increasing population. At any time there is roughly the same number of significant mathematicians globally (approximated 1 significant mathematician to every 2 million people).
The way in which information and knowledge has been shared around the planet has changed dramatically over the timescale associated to the network. Figure 14 shows an increasing trend in the average distance of edges up to the modern era, whilst the average time span of edges has been low since the 17^{th} century. It is straightforward to account for this pattern as a result of improvements in technology and communication. A mathematician born 2000 years ago would have had little opportunity to travel and hence would have been confined to the mathematicians and the understanding of mathematics within the local geographical region. A limited number of accessible mathematicians results in short distance edges covering large time spans, due to the large time gaps between successive mathematicians.
The average time span plot features some strong outliers to the trend. By reading the MacTutor biographies of these mathematicians it quickly becomes apparent why they would be associated to mathematicians from such different eras. Computer scientist John Backus remarkably rediscovered the work of Panini of Shalatula 2500 years after its conception. Thomas Heath and Paul Tannery were both historians of mathematics, Thomas specialised in the Greeks and Paul “spent all his free time working on the history of mathematics in ancient cultures”. These examples demonstrate how the network analysis can reveal information that isn’t explicitly stated.
In graph theory the physical position or length of edges are normally regarded as intangible as the only information associated to an edge are the two nodes that it connects. Despite this, because the mathematicians have a defined position in space and time, the distance and time span of edges as separate entities may be calculated [fig 15].
The majority of edges span over comparatively small distances. Whilst this is expected for mathematicians from times when communication was limited, the majority of mathematicians within the network were born within the last few centuries. This suggests that while the world has become smaller as communication technologies and travel capabilities have improved, the bulk of work is shared over reasonably short distances. A second smaller peak is also apparent, corresponding to many of the large distance connections across the globe.
A similar plot may also be created based on edge time span; however because of the population weighting towards the modern era, the vast majority of connections are made over short time scales. This overloads the data and makes analysis of the smaller time spans out of range.
The Blondel community detection algorithm was adopted in splitting the network up into 15 terminal subgraphs [fig 16]. The communities produced feature a variety of characteristics; some communities such as community 1 are contained within a small geographical region over a long period of history. Other communities such communities 7 and 11 are concentrated within the modern era, with members from all areas of the globe. Communities such as 6 and 9 contain only contain a few members and are of little interest as they contain limited information.
Figure 16 makes it possible to pinpoint potentially interesting communities for analysis. As discussed above community 1 spans 14 centuries but only 13° of longitude, with the exception of William Spottiswoode [fig 17]. This era is called the classical period of India mathematics, where unlike Vedic mathematics astronomical information was contributed to the field, classifying mathematics as an ‘astral science’ (18). The notable outlier of this history is William Spottiswoode, born 1825; it is intriguing to ask why he has been placed in a community of India mathematicians and astronomers. William’s MacTutor biography answers this question:
“He combined these different skills by undertaking research on the history of mathematics and astronomy in India”
Spottiswoode also married the daughter of William Urquhart Arbuthnot, a member of the Council for India, tying him closely to the history of Indian mathematics and culture.
A similar pattern is visible in community 13 [fig 18]; spanning 18 centuries and only 17° of longitude, with two exceptions. Chinese mathematics developed independently from western ideas up until the early 17^{th} century (19), many concepts such as Pascal’s triangle and Pythagoras’ theorem were conceived in China centuries before there rediscovery elsewhere (20). Community 13 contains a total of 22 of the 31 Chinese mathematicians in the database, the other 9 Chinese mathematicians were either born pre Han mathematics or in the modern global era. Again, it is important to question why two European mathematicians, Matteo Ricci and William Horner, should be included in a community of Chinese mathematicians. William Horner is famed for the Horner method of solving algebraic equations however the process had been anticipated by Zhu Shijie of China in the 13^{th} century. Ricci on the other hand had a compassionate connection to China as he spent the majority of his life there:
“Ricci arrived at Macau on the east coast of China in 1582” “Died: 11 May 1610 in Peking, China”
Community 11 is a large group with 257 members. The characteristics of community 11 are very different to those described in communities 1 and 13 as it features members from all over the globe contained within the last two centuries [fig 19]. Since there is no geographical clustering it is more difficult to extrapolate the underlying relationship between the mathematicians within the community. By scaling the glossary terms connected to members of the community by the total number of times the glossary term appears over the entire network, it is possible to gain insight into the fields of study associated to the community. The highest ranking glossary terms under this regime are ‘quantum mechanics’ (29/62), ‘cosmology’ (12/34), ‘statistical mechanics’ (11/22) and ‘atomic theory’ (3/9). There is a clear connection between all of these terms and the study of Physics. The simple observation of the members of the community reveals this fact as it contains many of the modern pillars in physics; Albert Einstein, James Maxwell, Max Plank, Paul Dirac, Paul Ehrenfest, Erwin Schrödinger to name just a few.
The remaining communities not discussed here may also be analysed along the same lines. The examples given are a clear demonstration that the Blondel community detection algorithm produces significant results. The implications of such community detection are farreaching, as it allows for the extraction of information concerning nodes that is not apparent on the surface.
Capturing Data:
Whilst the MacTutor website follows a regular formatting structure, it appears to have been constructed by hand. The trawler program which was built to extract the data from biographies is an automated script and therefore naturally fell into difficulties when coming across inconsistencies in the layout of the MacTutor website. Many of these issues were noticed and were programmed into the workings of the trawler; however information for a portion of mathematicians was uncollectable for one of many reasons. As a result a total of 2163 out of a possible 2222 mathematicians were successfully catalogued in the database.
Positioning by Birth Place:
The birth place of a person is an important start in building up a picture of them; however it suggests nothing about their movements later in life. Many of the mathematicians in the database do not move permanently away from the region of their birth place, but this is not true for all.
Limits to MacTutor:
The edges within the constructed network are effectively defined by hyperlinks on the pages of the MacTutor website. As a result the network can only be as comprehensive as the MacTutor biographies. As with any historical knowledge there may be missing or misleading information. Furthermore, the task of building a complete history of mathematics with only two authors is a tall order, limiting the attention that can be given to each biography. This may result in biographies that do not contain the more subtle attributes of a mathematician’s life.
After a network of the history of mathematicians had been successfully compiled into a database, the aims of this investigation were twofold. First, a clear visualisation of the network was required, second a set of tools were needed that would allow for quick and detailed analysis of the network. As discussed, the implications of the Google Earth projection are far reaching and could be applied to a variety of networks, particularly those constructed from a set of web pages. Now that it has been demonstrated that Google Earth can be used for such purposes, it wouldn’t take much further development to build a program that could translate a site map or network into a projection. Graph theory analysis techniques could easily be used to highlight important nodes based on one criteria or another.
The software developed for investigating the network can produce a huge variety of different analytical plots and further filters could also be added to these tools to gain deeper and more precise perspectives on the network. The glossary terms recorded for mathematicians effectively constitute an entirely separate set of edges for analysis, or could even be used as a second category of node in a bipartite network. Due to time restraints it was unfortunately not feasible to explore all of these possible avenues, but now that the cat is out of the bag they are well and truly open for further study.
1. Immunization of complex networks. PastorSatorras, Romualdo and Vespignani, Alessandro. 036104, Barcelona : PHYSICAL REVIEW E,, 2002, Vol. 65.
2. arXiv:condmat/0405123v1 [condmat.statmech] Complex Networks. Evans, T.S. 2004.
3. Caldarelli, Guido. ScaleFreeNetworks. Oxford : Oxford University Press, 2007.
4. The Small World Problem. Stanley, Milgram. s.l. : Psychology Today, 1967, Vol. 2.
5. Google Inc. Google Press Center: Fun Facts. [Online] http://www.google.com/press/funfacts.html.
6. The PageRank Citation Ranking: Bringing Order to the Web. Page, Larry and al, et. s.l. : Standford Digital Library Technologies Project, 1999.
7. arXiv:0803.0476v2 [physics.socph] Fast unfolding of communities in large networks. Blondel, Vincent. D and al, et. 2008.
8. Team, NWB. Network Workbench. [Online] Indiana University, Northeastern University and University of Michigan, 2006. [Cited: January 14, 2009.] http://nwb.slis.indiana.edu/.
9. The PHP Group. PHP: Hypertext Preprocessor. [Online] http://php.net/index.php.
10. Sun Microsystems Inc. MySQL. [Online] 2009. www.mysql.com.
11. Google Inc. Google Earth. [Online] http://earth.google.com/.
12. K. Borner, et al. Pajek Documentation. [Online] [Cited: January 14, 2009.] http://vw.indiana.edu/tutorials/pajek/.
13. Google Inc. KML Reference – KML – Google Code. [Online] http://code.google.com/apis/kml/documentation/kmlreference.html.
14. J. J. O’Connor, E. F. Robertson. MacTutor History of Mathematics. [Online] http://wwwhistory.mcs.stand.ac.uk/.
15. Yahoo! SARL. [Online] http://api.maps.yahoo.com/ajax/geocode?appid=onestep&qt=1&id=m&qs=.
16. Efficient generation of large random networks. Batagelj, Vladimir and Brandes, Ulrik. 036113, s.l. : PHYSICAL REVIEW E, 2005, Vol. 71.
17. Netherland Environmental Assessment Agency. Historical Population Data. [Online] HYDE. http://www.pbl.nl/en/themasites/hyde/basicdrivingfactors/population/index.html.
18. O’Connor, J. J. and Robertson, E. F. Indian mathematics. [Online] http://wwwhistory.mcs.stand.ac.uk/HistTopics/Indian_mathematics.html.
19. Joyce, David E. History of Mathematics: China. [Online] Clark University. http://aleph0.clarku.edu/~djoyce/mathhist/china.html.
20. 221–222, Dauben. 2007.
Leave a Reply