Thurston, Selberg, and Random Polynomials, Part I.

Apart from everything else, you could always count on Bill Thurston to ask interesting questions. This is the first of a small number of posts which were motivated in part by figure two from this paper, and this accompanying MO question. I liked this problem enough to give it as a thesis problem to my student Zili Huang, and much of what I discuss below arose from this project.

Say that an algebraic integer \(\alpha\) is Perron if \(|\alpha| > |\sigma \alpha|\) for every conjugate \(\sigma \alpha \ne \alpha\) of \(\alpha\). One immediately observes that \(\alpha\) must be real (otherwise its complex conjugate would be a different conjugate of the same absolute value). Say that a monic polynomial is Perron if it is irreducible and has a Perron integer as a root. Thurston’s question is (roughly) to describe the distribution of Perron algebraic integers, especially those chosen in some (small) fixed interval in \(\mathbf{R}\). This question has several interpretations, but one experiment Thurston performs is to take 20,000 monic polynomials of degree 21 with integer coefficients in \([-5,5]\) and plot the quantities \(\sigma \alpha /\alpha \in B(1)\) for all the conjugates of the 5,932 resulting Perron polynomials whose Perron integer lies in the interval \([1,2]\). The result is this:

[Figure: Perron]

The first observation is that this graph has (apart from some noise coming from real roots) rotational symmetry. The next observation is that the roots tend to be concentrated in a ring of some radius, which (from experiment) becomes more concentrated the more one restricts the range in \(\mathbf{R}\) of the Perron integers one is considering. The first question is: can one explain this graph, and does it reflect reality (that is, the actual distribution of Perron integers)?

The answers to these questions turn out to be: yes, and no. The first problem is that it is hard (a priori) to “randomly” generate Perron algebraic integers of large degree in \([1,2]\). Knowing a bound on the roots places a bound on the coefficients, but a randomly chosen polynomial with coefficients satisfying the required bounds will almost always have a root larger than 2. Thus Thurston “cheats” with his algorithm, making the coefficients of his polynomials very small in order to increase the probability that the largest root will also be small. (Full disclosure, Thurston makes no claims that his algorithm reflects reality, and explicitly asks whether it does so or not.) The issue is then whether this will skew the distribution of the roots. It turns out that it does! To explain why this might not be surprising, let’s talk about the size of the spaces over which Thurston is sampling. Let \(\Omega^P_{21}\) be the set of monic polynomials of degree 21 with real coefficients and with a unique largest real root \(\lambda \le 2\). Thurston is sampling over a space with \(11^{20}\) lattice points and volume \(10^{20}\). On the other hand, it turns out that the volume of \(\Omega^{P}_{21}\) is equal to

\(\displaystyle{\frac{2^{399}}{3^{24} 5^{12} 7^{10} 11^{11} 13^{9} 17^{5} 19^{3}}} \sim 2.249 \times 10^{60}.\)
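As a quick arithmetic check of these sizes, here is a minimal Python sketch that evaluates the displayed product exactly and compares it with the box volume \(10^{20}\) quoted above. The exponents are copied from the formula; the names `DENOMINATOR_EXPONENTS` and `perron_volume_21` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

# Prime exponents in the denominator, copied from the displayed formula.
DENOMINATOR_EXPONENTS = {3: 24, 5: 12, 7: 10, 11: 11, 13: 9, 17: 5, 19: 3}


def perron_volume_21():
    """Exact value of 2^399 divided by the displayed product of prime powers."""
    denominator = 1
    for prime, exponent in DENOMINATOR_EXPONENTS.items():
        denominator *= prime ** exponent
    return Fraction(2 ** 399, denominator)


volume = perron_volume_21()
box_volume = 10 ** 20  # volume of the sampling box, as quoted in the text

print(f"volume of the region  ~ {float(volume):.4e}")
print(f"box volume / volume   ~ {float(box_volume / volume):.4e}")
```

The second printed number is the fraction of the region that the sampling box can cover at best.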

So Thurston was only really sampling a \(10^{-40}\)th of the entire space! Thurston’s picture can be explained as follows: polynomials with (suitably) small coefficients (contingent on the initial and final coefficients not being too small) tend to have all their roots clustering uniformly around the circle of radius one. This follows in the angular direction by a famous theorem of Erdős and Turán, and for the absolute values it follows (in a related way using Jensen’s formula) from a paper of Hughes and Nikeghbali here. So the apparent “radius” in Thurston’s picture is just representing \(1/R\), where \(R\) is the approximate size of the Perron integers being considered. It turns out that, in reality, most of the conjugates of Perron integers have size comparable to the Perron integer itself. That is, the correct version of Thurston’s picture should show the roots clustering (roughly) uniformly around the boundary.

OK, now a pause when I look at Thurston’s graph and see that the radius is not something like a half as I claimed above, but something much smaller. So I just repeated Thurston’s experiment, and out of 20,000 monic polynomials with coefficients randomly chosen in [-5,5], only 1011 were Perron polynomials with largest root less than 2, and the resulting picture came out like this:

[Figure: New Perron]

Here one really sees the (misleading) accumulation around the radius \(1/2\). I’m guessing that Thurston actually kept all polynomials whose largest root was in \([1,5]\), which would account for the larger success rate for choosing Perron polynomials as well as the smaller radius. This is also consistent with how Thurston describes the corresponding graphs in the MO question rather than in Figure 2 of his preprint.

So how does one study Perron integers? Let us re-wind slightly and discuss a more elementary problem. How does one count algebraic integers? The most natural way to count algebraic integers is to order them by height. However, Thurston’s problem clearly suggests a different measure, namely, to count by the size of the largest conjugate. This has a profound effect on some of the statistical properties under consideration. Roughly, algebraic integers ordered by height are much more likely to have a small number of “outliers” with large absolute value, whereas when one orders by the size of the largest conjugate, most of the other conjugates accumulate around the circle with radius the size of the largest root as the degree goes to infinity.

The problem of understanding algebraic integers of bounded size (where by bounded we mean a bound on the largest conjugate) amounts to understanding the lattice points in a certain region of \(\mathbf{R}^N\). Now as long as one fixes the degree and increases the bound, such counting problems (including this one) typically reduce to a volume problem. (One also uses the fact that almost all polynomials are irreducible, and that the regions are “nice” in some explicit way, i.e. not Cantor sets.) Moreover, the corresponding regions are essentially (up to a simple stretching) independent of the bound. Hence the key region to understand is the region \(\Omega_{N} \subset \mathbf{R}^N\) of monic degree \(N\) polynomials all of whose roots have absolute value at most one, and the region \(\Omega^{P}_N \subset \Omega_N\) consisting of such polynomials whose largest root is real. Of course, one is not only interested in the volumes of these regions, but also the integrals of various quantities. As an example, one can consider the integral

\(C_N(T,\alpha) = \displaystyle{\int_{\Omega_N} P(T) |a_N|^{\alpha - 1} dV}\)

where \(P \in \Omega_N\) represents the monic polynomial at any point, and \(a_N\) is the constant term. Evaluating this integral at \(\alpha = 1\) and taking the leading term (in \(T\)) recovers the volume. On the other hand, there are some other relations. A fairly simple computation shows that

\(\mathrm{Vol}(\Omega^P_N) = \displaystyle{\frac{4}{N(N+1)} C_{N-1}(1,1)},\)

which is how one can compute the left hand side exactly for any \(N\). In order to evaluate these integrals, it makes more sense to integrate not over the “coefficient space” of polynomials, but rather the “configuration space” of roots. The coefficient space is naturally stratified by the number of real and complex roots. For that reason, it makes sense to decompose \(\Omega_N\) as

\(\coprod_{R + 2 S = N} \Omega_{R,S}\)

where \(\Omega_{R,S}\) corresponds to polynomials whose roots all have absolute value at most one and have signature \((R,S)\) (since we are interested only in integrals, we elide issues concerning whether one wants these spaces to be open or closed or somewhere in between). As a special case, let’s think about the integral \(C_{N,0}(T,\alpha)\) where one restricts the integrand to \(\Omega_{N,0}\). The configuration space is simply \([-1,1]^N\). On the other hand, the map from configuration space to coefficient space is just given in terms of the symmetric polynomials, and the corresponding Jacobian determinant is (up to sign) the Vandermonde determinant. Hence, taking into account the action of \(S_N\) on the fibres, one finds that

\( C_{N,0}(T,\alpha) = \displaystyle{\frac{1}{N!} \int_{[-1,1]^N} \left| \prod x_i \right|^{\alpha - 1}
\prod (T - x_i) \prod |x_i - x_j| dx_1 \ldots dx_N}.\)

This is now very reminiscent of the classical Selberg Integral. There is some beautiful mathematics related to the Selberg integral; let me direct you here for a nice survey. The integrals arising here are, however, not quite Selberg integrals except for some very degenerate cases.

Once you start writing these integrals down, and computing some of them (by hook or crook), there are a number of problems which naturally come to mind. For example, what is the probability that a random polynomial all of whose roots have absolute value at most one is Perron? Well, by explicitly computing the ratio of the volume of \(\Omega^P_N\) to \(\Omega_N\), you find that the answer is \(1/N\) if \(N\) is odd and \(1/(N+1)\) if \(N\) is even (this checks out for \(N = 1,2\): for \(N = 2\) the region \(\Omega_2\) has area \(4\), of which the polynomials with real roots occupy \(4/3\)). On the other hand, you might ask, given a polynomial all of whose roots have absolute value at most one, what is the expected number of real roots, or what is the probability (at least in even degree) that the polynomial has no real roots at all? Having asked these questions, it is then sensible to ask the same questions for other ways of choosing random polynomials. The classical way to choose a real random polynomial is to write

\(f(x) = a_N x^N + \ldots + a_0\)

where the \(a_i\) are independent normal variables with mean zero (this is the Kac ensemble). To what extent do the statistics of random polynomials with this measure mirror the constrained problem consisting of polynomials all of whose roots have absolute value at most one? Obviously, it depends on the type of problem one considers. The most classical problem for real polynomials concerns counting the expected number of real roots. A famous theorem of Kac says that, under the ensemble above, the expected number of real roots is approximately \(2/\pi \cdot \log(N)\). (I recommend reading this paper for an introduction to the subject; I learnt these things from chatting with Peter Sarnak at the IAS.) The methods of Kac also show that the real roots concentrate for large \(N\) around \(-1\) and \(+1\). In fact, the complex roots concentrate along the unit circle as well. How does this compare to our constrained model? First of all, the real roots in the Kac model either lie in \([-1,1]\) or in \((-\infty,-1] \cup [1,\infty)\). Certainly our polynomials have no roots in the larger region. If one restricts the Kac polynomials to \([-1,1]\), then the expected number of real roots decreases to \(1/\pi \cdot \log(N)\). This is in some sense easy to see from the previous formula, because the map on coefficients \(a_k \rightarrow a_{N-k}\) is measure preserving and inverts the roots. In fact, a stronger result follows from Kac. If one takes an interval \([a,b]\) strictly contained inside \([-1,1]\), then the expected number of real roots in \([a,b]\) converges, as \(N \rightarrow \infty\), to

\( \displaystyle{\frac{1}{\pi} \int^{b}_{a} \frac{dT}{1 -T^2}}.\)
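For finite \(N\) the expected count can be computed from Kac's explicit density of real zeros, \(\rho_N(t) = \frac{1}{\pi}\sqrt{\frac{1}{(t^2-1)^2} - \frac{(N+1)^2 t^{2N}}{(t^{2N+2}-1)^2}}\), and the limiting integral has the closed form \(\frac{1}{\pi}(\mathrm{artanh}\, b - \mathrm{artanh}\, a)\). What follows is a minimal Python sketch, assuming independent standard normal coefficients and using a plain midpoint rule; the function names are hypothetical and chosen only for this illustration.

```python
import math


def kac_density(t, n):
    """Kac density of real zeros at t, for degree n and |t| < 1."""
    if abs(t) >= 1.0:
        raise ValueError("this sketch only evaluates the density for |t| < 1")
    first = 1.0 / (t * t - 1.0) ** 2
    second = (n + 1) ** 2 * t ** (2 * n) / (t ** (2 * n + 2) - 1.0) ** 2
    # Rounding can make the difference slightly negative very close to |t| = 1.
    return math.sqrt(max(first - second, 0.0)) / math.pi


def expected_real_zeros(n, a, b, steps=100_000):
    """Midpoint-rule estimate of the expected number of real zeros in [a, b]."""
    h = (b - a) / steps
    return h * sum(kac_density(a + (i + 0.5) * h, n) for i in range(steps))


def limiting_count(a, b):
    """Closed form of (1/pi) * integral of dT / (1 - T^2) over [a, b]."""
    return (math.atanh(b) - math.atanh(a)) / math.pi


if __name__ == "__main__":
    n = 200
    print("finite N :", expected_real_zeros(n, -0.5, 0.5))
    print("limit    :", limiting_count(-0.5, 0.5))
    # By the symmetry t -> 1/t, the count over the whole line is twice that on [-1, 1].
    print("all real :", 2.0 * expected_real_zeros(n, -1.0, 1.0))
    print("2/pi logN:", 2.0 / math.pi * math.log(n))
```

On an interval such as \([-1/2, 1/2]\) the finite-\(N\) count should agree closely with `limiting_count`, while `limiting_count(a, b)` grows without bound as \(b \to 1\) or \(a \to -1\).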

This gives another strong indication of how the roots are concentrating at the points +1 and -1. OK, so now let us return to our constrained model consisting of monic polynomials all of whose roots have absolute value at most one. How many real roots does one expect such a polynomial to have? There’s a natural map

\(\Omega_{N-1} \times [-1,1] \rightarrow \Omega_{N}\)

which sends \((P(x), T)\) to \(P(x)(x-T)\). The Jacobian of this map turns out to be equal to \(|P(T)|\). On the other hand, the map is not one to one; rather, the image of \(\Omega_{R,S}\) has multiplicity \(R\). Hence, if \(Z(P)\) denotes the number of real roots of the polynomial \(P\), then

\(\displaystyle{\int_{\Omega_N} Z(P) \, dV = \int_{-1}^{1} \int_{\Omega_{N-1}} |P(T)| \, dV \, dT}.\)

The left hand side (after dividing by the volume) gives the expected number of real roots. So one is again reduced to a Selberg type integral. In this case, writing \(D_N\) for the volume of \(\Omega_N\), one apparently has (based on some Zagier-like integral guessing mojo, but unfortunately not yet Zagier-like integral proving mojo) for \(N = 2m\),

\( \displaystyle{\frac{1}{D_N}
\int_{\Omega_N} |P(T)| =
\frac{1}{2^{2m}
{2m \choose m}}
\left( \sum_{k=0}^{m} \frac{2m-2k+1}{2m+1}
{2m-2k \choose m-k}
{2k \choose k}
T^{2k} \right)
\left( \sum_{k=0}^{m}
{2m-2k \choose m-k}
{2k \choose k}
T^{2k} \right)},\)

and there is a similar formula for \(N = 2m+1\). After some analysis to estimate the resulting integral of the RHS from \(T = -1\) to \(1\), it turns out that, for large \(N\), the expected number of real roots is approximately

\( \displaystyle{\frac{1}{\pi} \log N},\)

which is exactly in accordance with the Kac model! Indeed, if one restricts to real roots in an interval \([a,b]\) strictly in \([-1,1]\), then one also obtains the same integral formula as in the Kac ensemble. So, somewhat surprisingly to me, the number of real roots in \([-1,1]\) behaves in a very similar way whether one considers Kac polynomials or monic polynomials all of whose roots have absolute value at most one.
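In degree \(N = 2\) all of this can be checked by hand. The polynomial \(x^2 + bx + c\) has both roots in the closed unit disc exactly when \(|b| \le 1 + c \le 2\), a triangle of area \(4\), and the roots are real when \(b^2 \ge 4c\), which cuts out area \(4/3\). So the largest root is real with probability \(1/3\), and \(\int_{\Omega_2} Z(P)\,dV = 2 \cdot 4/3 = 8/3\). On the other side, \(\Omega_1\) consists of the polynomials \(x + a\) with \(|a| \le 1\), and \(\int_{-1}^{1}\int_{-1}^{1} |T + a| \, da \, dT = 8/3\) as well. Here is a minimal Monte Carlo sketch of the same check; the function names, sample size and seed are arbitrary choices made for illustration.

```python
import random


def sample_omega_2(rng):
    """Rejection-sample (b, c) uniformly with both roots of x^2 + b x + c in the unit disc."""
    while True:
        b = rng.uniform(-2.0, 2.0)
        c = rng.uniform(-1.0, 1.0)
        if abs(b) <= 1.0 + c:
            return b, c


def degree_two_check(samples=200_000, seed=0):
    rng = random.Random(seed)

    # Fraction of Omega_2 on which the roots (hence the largest root) are real.
    real = 0
    for _ in range(samples):
        b, c = sample_omega_2(rng)
        if b * b >= 4.0 * c:
            real += 1
    fraction_real = real / samples

    # Left side: Z(P) = 2 on the real-root part of Omega_2, which has total area 4.
    left = 2.0 * fraction_real * 4.0

    # Right side: integral of |P(T)| = |T + a| over the square [-1, 1]^2 of area 4.
    total = 0.0
    for _ in range(samples):
        a = rng.uniform(-1.0, 1.0)
        t = rng.uniform(-1.0, 1.0)
        total += abs(t + a)
    right = 4.0 * total / samples

    return fraction_real, left, right


if __name__ == "__main__":
    fraction_real, left, right = degree_two_check()
    print("P(largest root real):", fraction_real)
    print("integral of Z(P)    :", left)
    print("integral of |P(T)|  :", right)
```

Up to sampling error, the first number should be close to \(1/3\) and the other two close to \(8/3\).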

What then of the other problems? Given a polynomial in the Kac model of even degree \(2N\), what is the probability that it has no roots in the interval \([-1,1]\)? This problem was explicitly addressed by Dembo, Poonen, Shao, and Zeitouni here, where they show (under less restrictive hypotheses) that this occurs with probability \(O(N^{-b/2 + o(1)})\) for some universal constant \(b/2\) which they do not determine, although they estimate based on numerical evidence that \(b/2 = 0.38 \pm 0.015\). What happens in our constrained model? Once more it comes down to a Selberg-like integral, this time computing the ratio of volumes:

\(\displaystyle{\frac{\displaystyle{\int_{\Omega_{0,N}} dV}}{\displaystyle{\int_{\Omega_{2N}} dV}}}\)

It turns out that one can compute this explicitly as a product of factorials. Moreover, one can compute the exact asymptotic in this case as \(N \rightarrow \infty\), and the resulting probability is

\(\displaystyle{ \frac{2C}{\sqrt{2 \pi} (2N)^{3/8}}, \ \text{where} \ C = 2^{-1/24} e^{-3/2 \cdot \zeta'(-1)} = 1.24514 \ldots }\)

(It may be hard to read in the exponent, but that is \(\zeta'(-1)\), the derivative of the Riemann zeta function at \(-1\). That may seem strange, but in fact this is a fairly typical constant that comes up in asymptotics of the Barnes \(G\)-function, which is exactly the type of expression (a product of factorials) which turns up in the evaluation of the relevant integrals.) Now the result of DPSZ does not apply in our case (where the coefficients are a long way from being independent), but given the similarity in the distribution of real roots between our polynomials and the Kac model, we naturally make the following conjecture:

Conjecture: The constant \(b/2\) is equal to \(3/8\).

Optimistically, one might even try to prove this conjecture by showing that the statistics of our collection of polynomials mirror those of the Kac polynomials for sufficiently large \(N\).
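The constant \(C\) in the asymptotic above is easy to evaluate numerically. The sketch below uses the standard identity \(\zeta'(-1) = 1/12 - \log A\), where \(A\) is the Glaisher-Kinkelin constant; the value of \(A\) is hard-coded (an assumption of this sketch, since the Python standard library has no zeta function), and the function names are hypothetical.

```python
import math

# Glaisher-Kinkelin constant A, hard-coded to double precision.
GLAISHER_KINKELIN = 1.2824271291006226


def zeta_prime_at_minus_one():
    """zeta'(-1), computed from the identity zeta'(-1) = 1/12 - log(A)."""
    return 1.0 / 12.0 - math.log(GLAISHER_KINKELIN)


def constant_c():
    """C = 2^(-1/24) * exp(-(3/2) * zeta'(-1))."""
    return 2.0 ** (-1.0 / 24.0) * math.exp(-1.5 * zeta_prime_at_minus_one())


def no_real_root_asymptotic(n):
    """Leading term 2C / (sqrt(2 pi) * (2n)^(3/8)) for polynomials of degree 2n."""
    return 2.0 * constant_c() / (math.sqrt(2.0 * math.pi) * (2.0 * n) ** 0.375)


if __name__ == "__main__":
    print("zeta'(-1) =", zeta_prime_at_minus_one())
    print("C         =", constant_c())
    for n in (10, 100, 1000):
        print(f"degree {2 * n}: leading term {no_real_root_asymptotic(n):.5f}")
```

This only evaluates the leading term of the asymptotic; it says nothing about how fast the exact ratio of volumes approaches it.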

Next time: we discuss a more concrete relationship between random polynomials and our models in terms of limits of gap probabilities. But let me also leave you with the following teaser question: What is the probability that the largest root of a polynomial of degree \(N\) is real?

Posted in Mathematics, Students

A postview of Bellairs/Barbados

I am just recovering from my trip to Barbados for the McGill sponsored conference at the Bellairs institute (which I previously discussed here). I thought it was a wonderfully enjoyable conference, for many reasons. The first is that I got to give 14 hours or so of talks, and I like the sound of my own voice. What was unique, however, was the really high level of the audience, not just in terms of technical strength, but in terms of their knowledge of the particular topics which were being discussed. Usually when you have a chance to talk to a specialized audience, you only have 50 minutes to speak, and for at least the first 20 minutes or so you should not assume that your audience is au fait with all the latest technical developments in the subject. On the other hand, in the contexts in which one has multiple hours to give details (such as a mini-course or graduate class) it’s often the case that the target audience is graduate students first encountering the material. At this conference, practically half the audience had written papers proving modularity lifting theorems! I surveyed some participants beforehand on how long I should spend reviewing the basic theory of Galois deformations, and the answers typically ranged from 1 to 5 minutes. In reality, I gave a 150 minute “background” talk on the first morning, although by background here I really mean Wiles’ proof of minimal modularity lifting for irreducible modular Galois representations of \(G_{\mathbf{Q}}\).

I broke the mold of previous Bellairs conferences by scheduling an additional talk in the afternoon, so typically we had some 6-7 hours of lectures per day. This sounds a lot, but when it is divided up into only three speakers and spread out from early morning to late evening, it didn’t seem so much at all. (We still had plenty of time every day to snorkel at the reef, and even one free afternoon to go on a boat tour and swim with the turtles. Even Sug Woo’s 200+ minute talk just flew by, although it was accompanied by rum drinks.) In addition to the background talks I mentioned previously, there were also research talks by Peter Scholze, Jack Thorne, George Boxer, Ila Varma, and David Geraghty (I may blog about some of these talks later). I think this was the first conference in which I learned something from every single talk. Of course, I did get to suggest many of the participants, so in a way this conference was designed for me.

Speaking of great theses by Richard Taylor students (George and Ila), it’s kind of amazing what is required/expected of a graduate student in number theory nowadays. It certainly makes me feel positive towards the future of our subject. Speaking of Richard, I heard (although have no confirmation) that he thought the conference sounded interesting, and so it is somewhat embarrassing that I didn’t suggest his name as someone to be invited. On the other hand, it would have been even more embarrassing for him to have actually come and then had to share a room with someone (the accommodations were fairly spartan) while I was in a room by myself. Along those same lines, I’m 100% certain that Mark (“I don’t get out of bed for less than $10,000 a day”) Kisin would not have come. (Full disclosure, Mark claims that Toby and I spread false rumours concerning his demands for luxury accommodations at conferences.)

One outcome of the conference is that I feel confident that we will have unconditional modularity lifting theorems for \(\mathrm{GL}(n)/\mathbf{Q}\) in the next five years. Of course, it’s always dangerous to make predictions.

Finally, apropos of nothing, I hope to have more posts in the future whose keywords include both “Richard Taylor” and “Turtles.”

Posted in Travel, Waffle

Are business schools intellectually bankrupt?

From the New York Times today, a report from business school professors concerning a study which claims to show that professors are prejudiced, too. I remember reading the original paper on this study, which made it painfully clear that the authors were pursuing an agenda and that they arrived at their conclusions by scouring their data for correlations which supported their case, a classic hallmark of poor science. But perhaps sound methodology is too much to expect from business school professors?

The reason I paid particular attention to this study was that I was one of the participants. Here was the original email I received.

Dear Professor XXX,

I am writing you because I am a prospective doctoral student with considerable interest in your research. My plan is to apply to doctoral programs this coming fall, and I am eager to learn as much as I can about research opportunities in the meantime.

I will be on campus next Monday, and although I know it is short notice, I was wondering if you might have 10 minutes when you would be willing to meet with me to briefly talk about your work and any possible opportunities for me to get involved in your research. Any time that would be convenient for you would be fine with me, as meeting with you is my first priority during this campus visit.

Thank you in advance for your consideration.

I remember receiving this email. What immediately struck me was the repeated vague references to “my research.” Now in order to have any appreciation of my research, you would, at the very least, have to know that the Langlands program exists, or that my research is related to Wiles’ proof of Fermat’s Last Theorem and algebraic number theory. The fact that there is no mention of number theory nor any indication of the background of the student immediately links the email (in my mind) to academic spam. Surely it’s the case that my reaction would be shared by many academics? Who is so desperate for attention that they would imagine this email reflects a genuine personal interest in their work? As you would expect, I completely ignored the email and promptly forgot about it. Then, a week later, I received the following email:

Dear Professor XXX,

Recently, you received an email from a student asking for 10 minutes of your time to discuss your Ph.D. program (the body of the email appears below). We are emailing you today to debrief you on the actual purpose of that email, as it was part of a research study. We sincerely hope our study did not cause you any disruption and we apologize if you were at all inconvenienced. Our hope is that this letter will provide a sufficient explanation of the purpose and design of our study to alleviate any concerns you may have about your involvement. We want to thank you for your time and for reading further if you are interested in understanding why you received this message. We hope you will see the value of the knowledge we anticipate producing with this large academic study.

We are decision-making researchers interested in how choices differ when they are made for “now” versus for “later”. Previous research has shown that people tend to favor doing things they viscerally want to do over what they believe they should do when making decisions for now, while they are more likely to do what they believe they should when making decisions for later (for a review, see Milkman, Rogers and Bazerman, 2008). The email you received from a student asked for a meeting with you either today (if you were randomly assigned to the “now” condition) or in a week (if you were randomly assigned to the “later” condition). This email was actually from a fictional student. It was designed for a study of the responsiveness of University faculty to meeting requests from prospective students of various backgrounds made on short notice versus well in advance. Faculty members at the top 260 U.S. Universities (as ranked by U.S. News and World Report) and affiliated with Ph.D. programs were identified as potential participants in this study, and a random sample (6,300 faculty in total – one per Ph.D. program) were selected to receive emails. In addition to examining the responsiveness of faculty to meeting requests for “now” versus “later”, we are also interested in how the identity of the applicant affects, or does not affect, response rates, and as such, the name of the student sending a meeting request was varied (by race and by gender). We expected that students from underrepresented groups would receive fewer meeting acceptances than other students, though we have competing hypotheses about whether this would effect would be stronger in the “now” or the “later” condition.

I love the line concerning the fact that they have “competing hypotheses about whether this would effect would be stronger in the “now” or the “later” condition”: see, they prove their case whatever the data!

Given the ridiculousness of the initial email, I was appalled that my response might contribute to some published data implying that professors were dismissive of minorities and/or women. Let me be clear at this point that it may well be the case that academics are more dismissive towards women (I fear that this may indeed be true in our field), but I am convinced that this study would have little of academic value to say on the matter. That said, having received this second email, I did nervously look back to my initial email to see what fake name I had failed to respond to.

The answer: Steven Smith. Yes! I ignored the name that could easily be a white male.

Update: For a different take on this study by Andrew Gelman (who, apparently, is more willing to spend time answering random emails than I am) see here, here, and here.

Posted in Politics, Rant

A Preview of Barbados/Bellairs

This post is probably not so interesting unless you plan to travel to the Caribbean in a few weeks. The website for the conference is offline, so I thought I might update attendees on what might be happening, at least those who read my blog.

There are two hours of talks in the morning by me and two hours of talks in the evening. Warning: the paragraphs below are not necessarily in one-to-one correspondence with talks.

Part I: I will give an overview of the Taylor-Wiles method in something approaching its original formulation (so without Kisin’s modifications). I may give the circular proof of modularity for \(\mathrm{GL}(1)\) as an example. I will then start talking about modular forms of weight one. I will give the details of local-global compatibility as proved in my paper with David, first in the irreducible case, and then via a modification of this method in the general case (using results which will be in Joel Specter’s thesis).

Background I: Jared and Peter will give a background talk on the geometry of Shimura varieties, with an emphasis on the case of modular curves, and (possibly) also that of Siegel 3-folds (threefolds?).

Part II: I will introduce the general strategy developed by myself and David to prove modularity lifting in the \(\ell_0 = 1\) and \(\ell_0 = \ell_0\) situations, in particular, the details of our patching lemma. I will outline how the method naturally breaks up into several different constituent problems (constructing Galois representations, proving local-global compatibility, proving vanishing of cohomology outside certain ranges, representation theoretic problems arising from Taylor-Wiles primes). I will then apply these strategies to prove minimal modularity lifting theorems for weight one modular forms in the residually irreducible setting.

Background II: David(?) will talk about Kisin’s modification of the Taylor-Wiles method. Toby Gee will talk about how to prove local-global compatibility and what that means in a (somewhat) general setting.

Part III: I will discuss the geometry of local deformation rings for \(\mathrm{GL}(2)\). Topics to be covered here include classical questions of multiplicity one and two, as well as non-minimal modularity lifting theorems in weight one.

Background III: Sug Woo will discuss the relation between cohomology and automorphic forms and how the Eichler-Shimura isomorphism generalizes to higher dimensions. Jack Thorne will discuss Taylor-Wiles primes for \(\mathrm{GL}(n)\).

Part IV: I will talk about my work with David concerning minimal modularity lifting theorems for low weight Siegel modular forms. This will consist of generalizing some of the ingredients from \(\mathrm{GL}(2)\), such as local-global compatibility results, and vanishing results of Lan-Suh. I will also discuss an approach to Taylor-Wiles primes in the torsion setting for \(\mathrm{GL}(n)\).

Part V: I will talk about completed cohomology in low degree. I shall explain my results with Matt on the stability of completed cohomology, and the computation of these groups using \(K\)-theory.

Related Research: I have asked a number of people to talk about their recent work on topics related to this conference. This includes George Boxer who has agreed to talk about coherent cohomology and generalized Hasse invariants, and Ila Varma who will talk about local-global compatibility for non-self-dual representations at \(\ell \ne p\).

Posted in Mathematics, Travel

The Decline of Western Civilization

I am often inclined on Saturdays to spend a few hours at the Brothers K cafe. I bring my laptop and some scratch paper, sip away on a cortado, and listen to music on my headphones as I work; it is a pleasant way to pass the time. That is, until the cafe instituted an “open mic night” on early Saturday evenings, which I unfortunately encountered last weekend. The entire enterprise seems more for the benefit of the very small number of people who run the show rather than for the actual patrons. As for me, one moment I was industriously working away and the next I was aurally assailed and had to make a hasty path to the exit. All this, mind you, while I was listening to the Dichterliebe. The horror! It’s enough to make one choke on one’s smoked salmon sandwiches.

One thing that makes the Dichterliebe so wonderful is the piano score, which transcends mere accompaniment. To mention just one example, I love how the piano finale in XVI is foreshadowed in part XII. It’s fun to play and fun to listen to; here is Fischer-Dieskau in a 1956 recording.

Posted in Music, Waffle

Are Galois deformation rings Cohen-Macaulay?

Hyman Bass once wrote a paper on the ubiquity of Gorenstein rings. The first time they arose in the context of Hecke algebras, however, was Barry’s Eisenstein ideal paper, where he proves (at prime level) that the completions \(\mathbf{T}_{\mathfrak{m}}\) are Gorenstein for all non-Eisenstein maximal ideals \(\mathfrak{m}\) of \(\mathbf{T}\) except possibly those which are ordinary of residual characteristic two. He also shows that the completions at Eisenstein primes are Gorenstein, although this is trickier and makes fundamental use of the assumption that the level is prime. The Gorenstein property of various Hecke algebras at non-Eisenstein maximal ideals was crucially used by Wiles to deduce non-minimal modularity lifting theorems. In the late 90’s, including around the time I started graduate school, it seemed as though all Hecke algebras in weight two were going to be Gorenstein (localized at non-Eisenstein ideals). One case remained, however, namely when \(\mathrm{char}(k) = 2\), and

\(\overline{\rho}: G_{\mathbf{Q}} \rightarrow \mathrm{GL}_2(k)\)

has the property that \(\overline{\rho}\) is unramified at \(2\) and, moreover, the image of Frobenius at \(2\) is a scalar. (The other cases having been dealt with by results of Mazur, Wiles, Ribet, and Buzzard.) But then it turned out, amazingly, that \(\mathbf{T}\) was not always Gorenstein. Lloyd Kilford found a counter-example at level \(N = 431\). The natural place to look, of course, is at \(\mathrm{GL}_2(\mathbf{F}_2) = S_3\)-representations. They have to come from a quadratic field \(K\) with class number divisible by three and such that \(2\) splits completely in the corresponding unramified degree three extension of \(K\). It also makes sense to work at prime level, because this will make computing the integral Hecke ring easier. The condition that \(2\) splits in \(K\) forces \(\Delta_K\) to be congruent to \(1 \mod 8\), which certainly means the class number is odd. The condition that \(2\) split in the corresponding cubic field is more subtle; if the class number of the field was \(3\), then this would be equivalent to the primes in \(K\) above \(2\) splitting principally in \(K\), but this can’t happen for norm reasons. So one has to start with a quadratic field \(K\) with \(\Delta_K \equiv 1 \mod 8\) and class number \(h = 3h'\) for some \(h' > 1\), and such that the class given by \([\mathfrak{p}]\) for the prime above \(2\) does not generate the 3-Sylow subgroup. The smallest prime number with this property is … \(N = 431\). So it fails at the first opportunity! As Kevin once joked to me in a statement that sums up the best of all attitudes towards advising: “If I had known it was going to be that easy, I would have done it meself!”
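The class number condition is easy to test by machine. Here is a minimal Python sketch, assuming that the field in question is \(K = \mathbf{Q}(\sqrt{-431})\) with \(\Delta_K = -431\), which counts reduced primitive binary quadratic forms of a given negative discriminant. It only checks the conditions on \(\Delta_K\) and \(h\), not the more subtle condition on the class \([\mathfrak{p}]\), and the function name `class_number` is hypothetical.

```python
from math import gcd


def class_number(disc):
    """Number of reduced primitive forms a x^2 + b x y + c y^2 of discriminant disc < 0."""
    if disc >= 0 or disc % 4 not in (0, 1):
        raise ValueError("expected a negative discriminant congruent to 0 or 1 mod 4")
    count = 0
    a = 1
    # A reduced form satisfies |b| <= a <= c, which forces 3 a^2 <= |disc|.
    while 3 * a * a <= -disc:
        for b in range(-a + 1, a + 1):
            if (b * b - disc) % (4 * a) != 0:
                continue
            c = (b * b - disc) // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:
                continue  # when a = c, only b >= 0 is reduced
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
        a += 1
    return count


if __name__ == "__main__":
    disc = -431
    print("discriminant mod 8:", disc % 8)
    print("class number      :", class_number(disc))
```

For a fundamental discriminant this count equals the class number of the corresponding imaginary quadratic field, so the output can be compared directly with the requirement \(h = 3h'\) with \(h' > 1\).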

Nowadays we know, at least in the analogous context when \(p\) is odd and we are in weight \(p\), that the appropriate Hecke algebras are Cohen-Macaulay. But we understand that the reason that these global Hecke algebras have these properties is because the *local* Hecke algebras have nice properties. The idea of deducing facts about the global Hecke algebra in the process of proving modularity lifting theorems started with Diamond, who found the first improvement to the Taylor-Wiles method. Essentially, given an \(R=\mathbf{T}\) theorem, one has a presentation of \(\mathbf{T}\) as a quotient of a (power series over a) local deformation ring by a sequence of parameters. If the local deformation rings are nice (Complete Intersections, Gorenstein, Cohen-Macaulay, etc.) then so is the global Hecke ring. Now this is only true in the contexts where \(\ell_0 = 0\); otherwise one is taking the quotient by “too many” relations (that is, not a sequence of parameters), and so there’s no longer any reason to expect that \(\mathbf{T}\) has those nice properties unless \(\ell_0 = 1\) and \(\mathbf{T}\) is finite.

So now we come to the question: are all local deformation rings Cohen-Macaulay? Well, perhaps there is not really any reason to suppose that they are. Perhaps even worse, there is a paper by Fabian Sander, a student of Vytas, proving that a certain deformation ring is not Cohen-Macaulay. But I am not deterred. My issue is that one has to take the correct deformation ring. And the correct deformation ring is the one that should include the extra data corresponding to the local Hecke operators which may not come (at an integral level) from the Galois representation.

To take a well known example, consider ordinary \(p\)-adic representations of weight \(p\). From a characteristic zero ordinary representation, one can always recover the (unique) eigenvalue of Frobenius on the unramified quotient. But this is not possible at the integral level, because (for example) \(\overline{\rho}\) could be locally trivial. This exactly corresponds to the fact that in weight \(p\), the Hecke operator \(T_p\) does not have to lie in the algebra generated by the other Hecke operators (the “anemic” Hecke algebra — was that term coined by Ken Ribet?). In order to prove modularity theorems, it usually suffices to work with the anemic Hecke algebra, but when one does include data which captures \(T_p\) (or \(U_p\)) the local deformation ring is (in this case) Cohen-Macaulay, as was shown by Snowden. So, for example, I would conjecture that the ordinary deformation ring (in any dimension) which includes the local Galois information corresponding to *all* the Hecke operators is Cohen-Macaulay.

Is there any real evidence for this guess besides the fact that it would be useful? Well, not really. But it would provide a systematic local Galois explanation for why deformation rings are torsion free, which is consistent with the guess that, appropriately defined, one should have \(R = \mathbf{T}\) theorems on the nose, not just after looking at (say) \(\mathrm{MaxSpec}\). Of course, all of this is in the residually globally irreducible setting. Note that one reason to care about integral modularity statements is that most of the time, one would expect both \(R\) and \(\mathbf{T}\) to be torsion anyway.


Robert Coleman

I was very sad to learn that, after a long illness with multiple sclerosis, Robert Coleman has just died.

Robert’s influence on mathematics is certainly obvious to all of us in the field. Most of my personal interaction with him was during my last two years as a graduate student at Berkeley. We would chat in his office, and sometimes have lunch at Nefeli caffe. Kevin and I had recently made some modest progress on Kevin’s crazy slope conjectures, and much of that time with Robert was spent with me presenting crazy ideas and predictions on the white board in Evans Hall while Robert looked on with his classic look of amused skepticism. There would also be the occasional wine and cheese in his office, especially if an old visitor was in town.

I certainly didn’t know him as well as many others did, but I felt very honored that he asked me to accompany him (as a grad student assistant) to China for his ICM address. As it happened, the relevant hotels in China would not allow him to bring Bishop (his guide dog) along with him, so he didn’t end up going.

Mathematically, Robert was very original. I have no plans to attempt to summarize his research, but I just want to discuss one problem which he had thought about in recent years, namely, what the eigencurve looked like at the boundary of weight space — especially in light of the description given by Kevin and Lloyd Kilford when \(N = 1\) and \(p = 2\). Suppose one is given a Fredholm determinant

\(\mathrm{det}(1 - U T) = P(T) = 1 + \sum_{n=1}^{\infty} a_n T^n\)

where \(a_n \in \Lambda = \mathbf{Z}_p[[X]]\), and one wants to understand the spectrum of \(U\) at the “boundary” of weight space, that is, when the valuation of \(X\) goes to zero. For example, an interesting collection of points near the boundary are the classical points with highly ramified nebentypus character. If \(a_n\) is not divisible by \(p\), then the valuation of \(a_n\) at a specialization of \(X\) with absolute value close to one will coincide, up to scaling by the valuation of \(X\), with the valuation of the reduction mod-\(p\) of \(a_n\) as an element of the discrete valuation ring \(\mathbf{F}_p[[X]]\), that is, it will be determined by the lowest-degree non-zero coefficient of \(a_n\) modulo \(p\). Robert’s idea was to study the “halo” of the eigencurve, which, intuitively speaking, should be an object cut out by a compact operator \(U_{\chi}\) in characteristic \(p\) with characteristic power series \(P(T) \bmod p\). If the valuations of the elements \(a_n(X) \bmod p\) define a Newton Polygon \(N\), then the Newton Polygon at some point on the eigencurve which is sufficiently close to the boundary should be a simple multiple of \(N\). This is one of my favourite problems! I know Robert had some ideas on how to approach this problem, but unfortunately I don’t know exactly what they were or how much progress he had made. One natural question is whether this structure will ultimately be purely explainable in terms of \(p\)-adic local Langlands. One even more basic question is what happens numerically on components of the eigencurve corresponding to a representation \(\overline{\rho}\) which is absolutely irreducible after restriction to a decomposition group at \(p\); I presume one sees the same behavior, but has anyone checked this? Perhaps the easiest example to check would be to compute the slopes of forms on \(S_2(\Gamma_1(11 \cdot 2^n),\chi)\), where \(\chi\) has conductor \(2^n\).
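To spell out the valuation claim in the paragraph above, here is a short worked computation. It assumes the normalization \(v(p) = 1\), and the symbols \(c_i\), \(m\) and \(x\) are introduced only for this illustration. Write

\(a_n(X) = \sum_{i \ge 0} c_i X^i, \qquad c_i \in \mathbf{Z}_p, \qquad m = \min \{ i : p \nmid c_i \}.\)

Suppose \(m \ge 1\) and \(x\) is a specialization of \(X\) with \(0 < v(x) < 1/m\). Then

\(v(c_i x^i) \ge 1 > m \cdot v(x) \ (i < m), \qquad v(c_m x^m) = m \cdot v(x), \qquad v(c_i x^i) \ge i \cdot v(x) > m \cdot v(x) \ (i > m),\)

so the term with \(i = m\) is the unique term of smallest valuation and hence \(v(a_n(x)) = m \cdot v(x)\). (If \(m = 0\), then \(a_n(x)\) is a unit whenever \(v(x) > 0\).) The integer \(m\) is exactly the valuation of the reduction of \(a_n\) modulo \(p\), so close enough to the boundary the valuations of the \(a_n(x)\) are those of the reductions scaled by \(v(x)\), which is at least consistent with the expectation that the Newton Polygon there is a simple multiple of \(N\).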

Matt Baker has some further recollections of Robert here, and he also invites his readers to share their memories there.


The Thick Diagonal

Suppose that \(F\) is an imaginary quadratic field. Suppose that \(\pi\) is a cuspidal automorphic form for \(\mathrm{GL}(2)/F\) of cohomological type, and let us suppose that it contributes to the cohomology group \(H^1(\Gamma,\mathbf{C})\) for some congruence subgroup \(\Gamma\) of \(\mathrm{GL}_2(\mathcal{O}_F)\). Choose a prime \(p\) which splits in \(F\) so that \(\pi\) is ordinary at \(v|p\). Hida proves that the corresponding cohomology class lives in a Hida family \(\mathcal{H}\) over the appropriate weight space, which in this case is (up to connected components) just \(\Lambda = \mathbf{Z}_p[[X,Y]]\). However, unlike the classical situation, this Hida family will not be flat, because the specialization to any local system which is not invariant under complex conjugation is necessarily finite. Thus the support \(D\) of \(\mathcal{H}\) has co-dimension at least one over \(\Lambda\). Hida proves that it does indeed have co-dimension one.

What does the support \(D\) of \(\mathcal{H}\) look like? Let us suppose that we are normalizing \(\Lambda\) so that the point \(X = Y = 0\) corresponds to \(\pi\). One can imagine two possibilities:

  1. \(D\) contains the diagonal \(\Delta: X = Y\).
  2. the components of \(D\) passing through \(X = Y = 0\) only contain finitely many classical points.

It seems as though these are the only possibilities. Certainly, by a Zariski closure argument, \(D\) either contains the diagonal \(\Delta\) or intersects it in finitely many points. Hence, it is true that if the first condition does not hold, then the components passing through \((0,0)\) contain only finitely many crystalline automorphic forms. However, there could be more classical points on \(D\), namely, those of parallel weight but non-parallel finite order nebentypus character. To be concrete, the possible points of \(\mathrm{Spec}(\Lambda)\) which may give rise to automorphic forms have (with some normalization) the following shape:

\(1+X \mapsto (1+p)^k \zeta, \qquad 1+Y \mapsto (1+p)^k \xi,\)

where \(\zeta\) and \(\xi\) are \(p\)-power roots of unity, and \(k\) is a non-negative integer. So one is really considering not simply the intersection of \(D\) with the diagonal \(\Delta\), but with the thick diagonal \({\Delta\kern-0.6em{\Delta}}\), which is the union of the infinitely many translates of \(\Delta\) by \(p\)-power roots of unity. In particular, the Zariski closure of \({\Delta\kern-0.6em{\Delta}}\) is all of weight space.

I wrote a paper with Barry Mazur where, as an illustrative example, we found an explicit Hida family which did not satisfy the first condition and claimed that it therefore satisfied the second, whereas we should only have made the weaker claim that \(D\) (which was irreducible in this particular case) contains only finitely many crystalline points. (The main point of the paper was, by studying infinitesimal deformations of Artin representations, to give evidence that \(D\) should only ever contain the diagonal when \(\pi\) is either a base change form or CM.) The error was pointed out to me by David Loeffler.

I am pleased to say, however, that my student Vlad Serban has overcome this error! Namely, suppose one has a non-trivial power series \(\Phi(X,Y) \in \mathbf{Z}_p[[X,Y]]\), and suppose that

\(\Phi((1+p)^k \zeta - 1,(1+p)^k \xi - 1) = 0\)

for infinitely many triples \((k,\zeta,\xi)\) with \(k\) a non-negative integer, and \(\zeta\), \(\xi\) both \(p\)-power roots of unity. Let \(D\) be a component of the zero set \(\Phi(X,Y) = 0\) passing through \((0,0)\). Then, after possibly exchanging the roles of \(X\) and \(Y\), Vlad proves the following. Either:

  1. \(D\) contains the diagonal \(\Delta\),
  2. \(\Phi(\zeta - 1, \zeta^N - 1) = 0\) for all \(p\)-power roots of unity \(\zeta\), for a fixed \(N \in \mathbf{Z}_p\).

Certainly the latter is possible, because one could have \(\Phi(X,Y) = (1+X)^N - (1+Y)\). In fact, he proves a more general theorem than this for all the components (not necessarily passing through \((0,0)\)). After translation, this amounts to working over ramified extensions of \(\mathbf{Z}_p\).
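For completeness, here is the short check that the example \(\Phi(X,Y) = (1+X)^N - (1+Y)\) really does satisfy the second alternative. The only assumptions are the usual conventions for \(p\)-adic exponents, which the post does not spell out: for \(N \in \mathbf{Z}_p\) one sets

\((1+X)^N = \sum_{j \ge 0} \binom{N}{j} X^j \in \mathbf{Z}_p[[X]],\)

and, if \(\zeta\) has order \(p^r\), one sets \(\zeta^N = \zeta^{n}\) for any integer \(n \equiv N \bmod p^r\). Approximating \(N\) by integers and using continuity in \(N\), the power series \((1+X)^N\) evaluated at \(X = \zeta - 1\) is \(\zeta^N\), so

\(\Phi(\zeta - 1, \zeta^N - 1) = \zeta^N - \zeta^N = 0\)

for every \(p\)-power root of unity \(\zeta\).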

This theorem allows one to prove (with finite computation) that any particular \(D\) only contains finitely many classical points (when that is true). It also shows, without any computation at all, that \(D\) either contains \(\Delta\), or it only contains finitely many classical points of weight different from \(\pi\). A nice way to think about this theorem is that it is of the same flavour as the multiplicative Manin-Mumford conjecture. That is, one is intersecting a sub-variety with a particular arithmetically defined discrete set (inside \({\Delta\kern-0.6em{\Delta}}\)), and one wants to deduce that this can only happen for a well-defined geometric reason. In fact, if one replaced \(\Phi(X,Y)\) by a polynomial with coefficients over \(\mathbf{C}\) and specialized to the case when \(k\) is always zero, then this would exactly be the multiplicative Manin-Mumford conjecture in two dimensions.

As a special case, letting \(k = 0\), one ends up with the following pretty result. Suppose that \(\Phi(X,Y) \in \mathbf{Z}_p[[X,Y]]\) is a power series, and suppose that

\(\Phi(\zeta_1 - 1,\zeta_2 - 1) = 0\)

for infinitely many pairs of \(p\)-power roots of unity. Then the zero set of \(\Phi\) contains a translate of \(\mathbb{G}_m\). This exactly answers the puzzle asked by Jordan here. Explicitly, it says that the only quotients of \(\mathbf{Z}_p[[\mathbf{Z}^2_p]]\) of co-dimension one which have lots of “arithmetic” points really do come from a one-dimensional subgroup!

I think that this special case (with \(k = 0\)) is probably easier than the general case, because one has other methods available. The argument was, however, inspired by a result of Hida which came up during his last number theory seminar at Northwestern. Translated into the language of this post, Hida’s rigidity lemma corresponds to the puzzle of Jordan above in the case when \(\Phi(X,Y) = Y - F(X)\) for some function \(F(X) \in \mathbf{Z}_p[[X]]\).


The congruence subgroup property for thin groups.

I finally had a chance to visit Yale, which (by various orderings) is the fanciest US university at which I had never given a talk (nor even visited). The town itself struck me, at first, as a cross between Oxford and New Jersey. That aside, my coffee research led me to Blue State Coffee, which was more than up to the task of preparing a decent 8 ounce latte. (As a comparison, it is significantly better than Small World Coffee in Princeton. Small World has all the correct hipster attitude without enough of the corresponding aptitude.) Mathematically, I had a great chat with Hee Oh and Gregg Zuckerman over several hours. At one point, I raised the following idle question about thin groups.

Question: Let \(G = \mathrm{SL}_N(\mathbf{R})\) where \(N > 2\). Let \(\Gamma\) be an arithmetic lattice in \(G\). Suppose that \(\Phi \subset \Gamma\) is a subgroup such that the following two conditions are satisfied:

  1. The Zariski closure of \(\Phi\) in \(G\) is \(G\).
  2. The induced map of profinite completions: \(\widehat{\Phi} \rightarrow \widehat{\Gamma}\) is injective.

Then is \(\Phi\) necessarily of finite index in \(\Gamma\)?

If \(\Gamma = \mathrm{SL}_N(\mathbf{Z})\), then the first condition implies that the image of the induced map of profinite completions has finite index; I presume this is true more generally. Hence the question asked can be phrased as follows: “can congruence subgroups be determined by their profinite completions?” Alternatively, in the opposite direction, one can ask: “are there thin groups which satisfy the congruence subgroup property?” I have no particular reason to believe that the answer to the question above is positive, and I might even guess that one could write down a counter-example, but I don’t know how to write one down myself.

On the other hand, suppose that the answer to the question is positive. Then it might prove useful for determining, given a finitely presented group \(H:=\langle G \ | \ R \rangle\) and an explicit homomorphism:

\(\phi: H \rightarrow \Gamma\)

whether its image has finite index or (even more strongly) whether \(\phi\) is an isomorphism onto a finite index subgroup. Namely, if the image of \(\phi\) does not have finite index, then a positive answer to the question above would imply that \(H\) must have a finite quotient which does not come from \(\Gamma\), and (since finite quotients of \(H\) may be enumerated) this leads to an algorithm which terminates if the image of \(\phi\) has infinite index. On the other hand, if \(H\) does have such a quotient, then certainly \(\phi\) will not be an isomorphism onto a finite index subgroup.
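The remark that finite quotients of \(H\) may be enumerated can be made concrete with a brute-force search for homomorphisms into a symmetric group. The sketch below is only a toy illustration of that single step, not the full procedure (it makes no attempt to decide which quotients come from \(\Gamma\)). The encoding of relators as strings, with an upper-case letter standing for the inverse of a generator, and all function names are assumptions made for this example.

```python
from itertools import permutations, product


def compose(p, q):
    """The permutation i -> p[q[i]] of {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


def evaluate(word, images, n):
    """Image of a word in the generators; 'A' means the inverse of 'a'."""
    result = tuple(range(n))
    for letter in word:
        g = images[letter.lower()]
        if letter.isupper():
            g = inverse(g)
        result = compose(result, g)
    return result


def homomorphisms_to_sym(generators, relators, n):
    """Yield every assignment of permutations in S_n to the generators
    which kills all of the relators."""
    identity = tuple(range(n))
    sym = list(permutations(range(n)))
    for choice in product(sym, repeat=len(generators)):
        images = dict(zip(generators, choice))
        if all(evaluate(r, images, n) == identity for r in relators):
            yield images


def generated_subgroup(gens, n):
    """Subgroup of S_n generated by gens (closure under right multiplication)."""
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen
```

As a usage example, take the toy presentation \(\langle a, b \ | \ a^2, b^3 \rangle\) (a rank one example, so not one to which the question applies): `homomorphisms_to_sym("ab", ["aa", "bbb"], 3)` yields 12 homomorphisms to \(S_3\), of which 6 are surjective. Running over all \(n\) eventually finds every finite quotient, although the search space grows far too quickly for this naive version to be practical.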

This problem explicitly came up in some work of Curt McMullen (see question 5.6 of this paper), who produced explicit maps of various finitely presented groups into lattices (not quite in \(\mathrm{SL}_N(\mathbf{R})\), but one can of course ask the more general question for lattices in semi-simple groups of rank at least two) and asked whether these maps were isomorphisms onto finite index subgroups. So the hope is that (in the contexts in which one expected the answer to be negative) this could always be answered by considering the profinite completion of the finitely presented group in question. Alas, I believe that I explicitly tried to find non-congruence quotients of the associated explicitly presented groups (in contexts where one expected \(\phi\) to have infinite index) and didn’t find any (not that I carried out this computation in anything approaching a sophisticated manner, of course).


Short thoughts on my visit to Berkeley

The marine biologists at Monterey Bay Aquarium give their octopuses hand massages. So do the fishmongers at Eataly.

It’s quite an experience to come face to face with this antediluvian monster: Monster

Are you allowed to turn right on red in California? I hope so, because I did so at every opportunity.

Almost mathematics is an endless source of almost humor.

Matt complained that I was not live blogging the hot topics conference from MSRI. But given that they record all the lectures, it seemed somewhat unnecessary.

The Winter Olympics are still on; don’t forget the greatest ever moment in Winter Olympic history (video in Dutch, but then Dutch sounds pretty much like English):

Does one ever tire of watching the sunsets in Berkeley?

Wine tasting with Ken in Arlington. Why do I not yet have a wine subscription from Kermit Lynch?

Heavyweight matchup: userxxx versus answer_bot live!

The pork burrito from Gordo Taqueria was simply the best burrito I have had for a very long time.

The initial letters of “perfectoid space” and “Peter Scholze” are both PS. Coincidence? You decide.

What foods do you miss from your native land (those of you who are immigrants)? Answers included sour cherries and instant custard.

A tip to anyone planning to visit wineries in Napa valley: don’t casually turn up around 5:00 only to find that many tasting rooms close before then.

The Maître d’ at Chez Panisse Cafe asked me to give my best wishes to Hendrik Lenstra.

Popular consensus was that the only way the conference could have been better was if Peter Scholze had given every talk.

Cheeseboard Pizza versus Sliver: Cheeseboard wins with a tapenade pizza.

Going to Berkeley and having lunch at MSRI is like going to the Grand Canyon and never leaving the visitor’s center.

Babette is the place to have lunch and hipster coffee near campus; hat-tip to Tony Wirth.

Kiran is happy that his research is now a hot topic.

Is “perfectoid” a portmanteau of “perfect” and “affinoid” or does it mean “somewhat perfect”?

And men of mathematics now-a-bed
Shall think themselves accurs’d they were not here,
And hold their manhoods cheap whiles any speaks
That thought with us upon this Pres’dents’ day.
