Report from Luminy

For how long has Luminy been infested with bloodthirsty mosquitoes? The combination of mosquitoes in my room with the fact that my bed was 6′ long with a completely unnecessary headboard (which meant that I had to sleep on an angle with my ankles exposed) did not end well.

As for the math, there were plenty of interesting talks, most of which I will not discuss here. Jan Nekovar gave a nice talk explaining how one could prove that the cohomology of compact Shimura varieties of \(\mathrm{GL}(2)\)-type is semi-simple. For concreteness, imagine that \(X\) is a Hilbert modular surface associated to a real quadratic field \(F\). Suppose that \(\rho\) is the Galois representation associated to a cuspidal Hilbert modular form of parallel weight two. Then the Langlands-Kottwitz method shows that the semi-simplification of \(\rho^{\otimes 2}\) should occur inside \(H^2\). On the other hand, this argument only ever deals with the trace of Hecke operators and so cannot say anything about semi-simplicity. Nekovar’s argument is to use the Eichler-Shimura relation applied to partial Hecke operators for primes which split completely in both \(F\) and the corresponding reflex field. The point is that these operators satisfy a quadratic relation (with distinct eigenvalues for generic elements of the Galois group), and so act semi-simply on \(H^2\) (imagine everything is compact here). Then, by pure group theory, if the image of \(\rho\) is large enough, the sheer number of such elements is enough to force semi-simplicity. It is perhaps useful to note that if \(V\) is a representation such that \(V^{ss} = (\rho^{\otimes 2})^{ss}\) and \(V\) is geometric and pure, then \(V\) should automatically be semi-simple. This follows from any number of combinations of bits of the standard conjectures, but one way to see it is that if \(W\) is geometric and of weight zero, then (by Bloch-Kato) one should have \(H^1_f(F,W) = 0\). The relevant \(W\) in the example above is \(\mathrm{ad}^0(\rho)\). So in fact one can give an alternate proof of the theorem using the full power of modularity lifting theorems, providing one is willing to omit finitely many primes \(p\). This is really an explanation of why Jan’s result is nice!
For example, as soon as one replaces \(\rho^{\otimes 2}\) by \(\rho^{\otimes n}\), one has to start dealing with \(H^1_f(F,\mathrm{Sym}^{2n}(\rho)(-n))\), which gets a little tricky.

Boars
These fellows turned up for the Bouillabaisse.

Ana Caraiani talked about a very nice result concerning the sign of Galois representations associated to torsion classes for \(\mathrm{GL}(n)/F^{+}\) for totally real fields \(F^{+}\) (this was joint work with Bao Le Hung). Namely, the trace of any complex conjugation lies in \(\{-1,0,1\}\) (in fact, the result identified the exact characteristic polynomial, which is more general in small characteristic). The basic strategy is to follow Scholze’s construction and “reduce” the problem to the case of essentially self-dual forms, where one has previous results by Taylor, Bellaïche-Chenevier, and Taïbi. However, there is a problem, which is that the regular self-dual automorphic forms one finds congruences with need not be globally irreducible, and perhaps not even cuspidal. Suppose one can show that they decompose into an isobaric sum \(\pi = \boxplus \pi_i\) where the \(\pi_i\) are self-dual. One runs into problems if too many of the \(\pi_i\) are of dimension \(n_i\) with \(n_i\) odd. However, by considering the weights, only one of the \(\pi_i\) can be odd, because otherwise the Hodge-Tate weight zero would occur with multiplicity which would violate the fact that \(\pi\) is regular. There is still something to check for the even \(n_i\) also, because previous results required some sign assumption on the character \(\eta\) such that \(\pi = \pi^{\vee} \eta\). I believe that even getting to this point required a further assumption on the torsion class not coming from the boundary. In the boundary case, there was also a reduction/induction case which also required careful handling of “odd” dimensional pieces, and some computation of a restriction of Hecke operators from the relevant Parabolic/Levi which required a sign to come out correctly. One clever technical step was working with the cohomology of adelic quotients \(G(F)\backslash G(\mathbf{A})/UK\) where \(K\) is a maximal compact of \(G(\mathbf{R})\) rather than the connected component \(K^0\). 
The advantage of this is that, in the odd dimensional case, this pins down the trace of complex conjugation to be \(+1\) rather than \(\pm 1\). This is clear when \(n = 1\), and for odd \(n\) one should expect it to be true by taking determinants.

Peter Scholze gave a talk on his new functor. The basic elements in the construction of this functor are as follows. The Gross-Hopkins period map allows one to view the (infinite level) Lubin-Tate tower as a \(\mathrm{GL}_n(F)\)-torsor over the (\(D^{\times}\) Severi-Brauer variety) \(\mathbf{P}^{n-1}_{\mathbf{C}_p}\). So, given an admissible representation \(\pi\), one can form the “local system” \(\mathscr{F}_{\pi}\) on the base, and then take its cohomology. The key technical point of this construction is to show that the result is admissible for \(D^{\times}\), which amounts to proving finiteness of \(K\)-invariants for suitable compact open \(K\) of \(D^{\times}\). The first step is to pull back to the (lowest level) part of the Lubin-Tate tower, which one can do because the GH map splits. Now the map from infinite level to the base of the Lubin-Tate tower is really a \(\mathrm{GL}_n(\mathcal{O}_F)\)-torsor, so one only has to consider the restriction of \(\pi\) to \(\mathrm{GL}_n(\mathcal{O}_F)\). But then using the admissibility of \(\pi\), one can look instead at the regular representation of \(\mathrm{GL}_n(\mathcal{O}_F)\). Now, by some sort of Shapiro’s Lemma, one can pull everything back up to infinite level. At infinite level, however, one can replace the Lubin-Tate space by the corresponding Drinfeld tower. Now taking \(K\)-invariants is something that is “easy” to do, because there is an action of \(K\) on the space, and the quotient by \(K\) is some sufficiently nice object for which one has (again by Peter) some nice finiteness theorems for cohomology. I should probably have mentioned that at some point we are working with coefficients in \(\mathcal{O}^+/p\), i.e. in the almost world. The main application in the talk was to show that when one patches completed cohomology (a la CEGGPS), then one can recover the Galois representation from the result. 
This essentially amounts to showing that when one patches together suitable admissible \(\pi_i\), one can also patch the functor. This requires more than admissibility of the functor, but some sort of “uniform” admissibility (which is always required for patching). I think the key point here is that if \(\pi_i\) is something patched with a group of diamond operators \(\Delta\), then \(\pi_i\) has a filtration by \(|\Delta|\) copies of the original \(\pi\), and so \(\mathscr{F}_{\pi_i}\) has a corresponding uniformly bounded filtration by \(\mathscr{F}_{\pi}\), and so \(H^{n-1}(\mathbf{P}^{n-1}_{\mathbf{C}_p},\mathscr{F}_{\pi_i})^K\) has length at most \(|\Delta|\) times the corresponding length for the (fixed for all time) version for \(\pi\). On the other hand, Peter instead pulled out a new piece of kit by patching using ultrafilters. My own feeling about logic is that it is never really necessary to prove anything, and I think PS agreed that it wasn’t strictly required for this particular application. Now I understand that my prejudice may not be justified (for example, it is probably hard to prove various identities concerning orbital integrals in small characteristic directly), but I think it applies in this case. Plus, as a purely expositional remark, if you are going to whip out ultrafilters during a number theory talk then everyone is just going to talk about ultrafilters rather than the beautiful construction!


An Obvious Claim

It’s been a while since I saw Serre’s “how to write mathematics badly” lecture, but I’m pretty sure there would have been something about the dangers of using the word “obvious.” After all, if something really is obvious, then it shouldn’t be too difficult to explain why. It is especially embarrassing when someone asks you to clarify a remark/claim in one of your papers which you claim is “obvious” and you find yourself having no idea what the implicit argument was supposed to be. Such a thing happened recently to me, when Toby asked me to explain why the following was true:

Claim: Let \(N \equiv 3 \mod 4\) be prime, and let \(\epsilon\) be the fundamental unit of \(K = \mathbf{Q}(\sqrt{N})\). Then \(\epsilon = a + b \sqrt{N}\) where \(a\) is even and \(b\) is odd.

Proof of Claim: Between Toby, Kevin, and myself, we managed to come up with the argument below, following a suggestion of Rebecca Bellovin: It’s easy enough to see (obvious) that \(a\) and \(b\) are integers and \(N(\epsilon) = 1\). Hence, it suffices to rule out the case that \(b\) is even and \(a\) is odd. Write \(a^2 - N b^2 = 1\). It follows that \(a^2 \equiv 1 \mod N\), and since \(N\) is prime, that \(a \equiv \pm 1 \mod N\). Assuming that \(a\) is odd, write \(a = 2NA \pm 1\), and \(b = 2B\). Then the equation above becomes

\(A(NA \pm 1) = B^2.\)

Without loss of generality, assume that \(A\) is positive. Then this equation implies that \(A\) and \(NA \pm 1\) are squares, say \(A = d^2\) and \(NA \pm 1 = c^2\). But then

\(c^2 - N d^2 = (NA \pm 1) - N A = \pm 1,\)

and hence \(\eta = c + d \sqrt{N}\) is a (smaller) unit (in fact, \(\eta^2 = \pm \epsilon\)), contradicting the assumption that \(\epsilon\) was a fundamental unit. \(\quad \square\)

This argument is really a 2-descent on the unit group. As Kevin remarked: “So this is a descent argument in a completely elementary situation which I don’t think I’d ever seen before and which proves something that I don’t think I knew … What’s ridiculous is that if the equation had been a cubic and we were after rational solutions then I would have instantly leapt on descent as one of my main tools for attacking it :-/ We live and learn!”
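As a sanity check on the Claim (not part of the proof), here is a minimal Python sketch that computes the fundamental unit from the continued fraction of \(\sqrt{N}\) and tabulates \(a\) and \(b\) for small primes \(N \equiv 3 \mod 4\). The function names are purely illustrative. The sketch assumes, as in the proof, that \(\epsilon\) lies in \(\mathbf{Z}[\sqrt{N}]\) and has norm \(1\), so that \((a,b)\) is the smallest positive solution of \(a^2 - N b^2 = 1\).

```python
from math import isqrt


def fundamental_pell_solution(n):
    """Smallest positive (a, b) with a^2 - n*b^2 = 1, for non-square n.

    Walks through the convergents h/k of the continued fraction of sqrt(n)
    and stops at the first one of norm +1.
    """
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0          # state of the continued fraction expansion
    h_prev, h = 1, a0           # numerators of consecutive convergents
    k_prev, k = 0, 1            # denominators of consecutive convergents
    while h * h - n * k * k != 1:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k


def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    return all(n % q for q in range(2, isqrt(n) + 1))


def fundamental_units(bound):
    """List of (N, a, b) for the primes N = 3 mod 4 below `bound`."""
    return [(n,) + fundamental_pell_solution(n)
            for n in range(3, bound, 4) if is_prime(n)]


if __name__ == "__main__":
    for n, a, b in fundamental_units(60):
        print(f"N = {n}: epsilon = {a} + {b} sqrt({n})")
```

For instance, \(N = 7\) gives \(\epsilon = 8 + 3\sqrt{7}\), with \(a\) even and \(b\) odd as claimed.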

So what was I thinking when I wrote the paper? The actual claim in the paper is this: “If \(H'\) is the (2 part of the) strict ray class group of \(K\) of conductor \((2)\), then \(H = H'\), where \(H\) is the (2 part of the) class group.” The “argument” is as follows:

The proof of [the above] is even more straightforward: it follows immediately from a consideration of the units in \(\mathcal{O}^{\times}_K\) and the exact sequence

\(\mathcal{O}^{\times}_K \rightarrow (\mathcal{O}_K/2 \mathcal{O}_K)^{\times} \rightarrow H' \rightarrow H \rightarrow 0.\)

Well, at least the word obvious was only implicit here. I could try to place the blame on my co-author Matt here, but honestly the phrasing of the claim does sound a little like something I would write.

Next up: a report from Luminy!


Huuuuuge piles of cash

As widely reported today, the first of the “Breakthrough” prizes in mathematics have been announced. Leaving aside the question as to whether such awards are sensible (Persiflage is more sympathetic to capitalist principles than your average pinko marxist mathematician), I think we should all be happy that the inaugural awardees are beyond reproach, something which could not have been taken for granted. So well done to whoever was on the prize committee.

Of the awardees that I know personally, my impression is that, at least in part, they would feel mildly embarrassed by the large amount of cash involved, even if they (presumably) feel gratified by the deserved acknowledgement of their contributions to mathematics. I do hear, however, that a bottle of Chateau d’Yquem 1967 does wonders to wash away any last remaining vestiges of embarrassment, and I am more than willing to help out with the consumption of said beverage if required.


There are non-liftable weight one forms modulo p for any p

Let \(p\) be any prime. In this post, we show that there is an integer \(N\) prime to \(p\) such that \(H^1(X_1(N),\omega_{\mathbf{Z}})\) has a torsion class of order \(p\). Almost equivalently, there exists a Katz modular form of level \(N\) and weight one over \(\mathbf{F}_p\) which does not lift to characteristic zero. We shall give two different arguments. The first argument will have the virtue that the torsion class is non-trivial after localization at a maximal ideal \(\mathfrak{m}\) which is new of level \(N\). The second argument, in contrast, will produce torsion classes at fairly explicit levels. Neither proof, unfortunately, implies the existence of interesting Galois representations unramified at \(p\) with image containing \(\mathrm{SL}_2(\mathbf{F}_p)\). Rather, the classes will come from deformations of characteristic zero classes. (This post is an elaboration of my comment here.)

The first argument: Let \(K/\mathbf{Q}\) be an imaginary cubic extension unramified at \(p\) with Galois closure \(L/\mathbf{Q}\) with Galois group \(S_3\). There is a corresponding Galois representation:

\(\rho: G_{\mathbf{Q}} \rightarrow \mathrm{Gal}(L/\mathbf{Q}) = S_3 \rightarrow \mathrm{GL}_2(\mathbf{Q}_p).\)

This representation is modular. Suppose for convenience that \(p > 3\). Associated to \(\rho\) is an absolutely irreducible residual representation \(\overline{\rho}\). Let \(R\) denote the corresponding universal unramified deformation ring. The only characteristic zero deformations are dihedral. Let \(R^{\mathrm{dh}}\) denote the corresponding universal unramified dihedral deformation ring. It’s easy to identify this ring explicitly; it is

\( R^{\mathrm{dh}} = \mathbf{Z}_p[C_E \otimes \mathbf{Z}_p],\)

where \(C_E\) is the class group of the imaginary quadratic subfield \(E\) of \(L\). The ring \(R\) will fail to be \(\mathbf{Z}_p\)-flat exactly when \(R \ne R^{\mathrm{dh}}\). Fortunately, this can be determined purely from the reduced tangent space of \(R\). Note that

\(\mathrm{ad}^0(\rho) \simeq \rho \oplus \eta,\)

where \(\eta\) is the quadratic character of \(E/\mathbf{Q}\). The reduced tangent space of \( R^{\mathrm{dh}}\) is the Bloch–Kato Selmer group \(H^1_f(\mathbf{Q},\overline{\eta})\), where \(H^1_f\) denotes the subgroup of cohomology classes which are unramified everywhere. So it all comes down to finding \(K\) so that \(H^1_f(\mathbf{Q},\overline{\rho})\) is non-zero. However, an elementary argument using inflation-restriction shows that this is equivalent to showing that the class number \(h_K\) of \(K\) is divisible by \(p\). So we are done provided that we can find a suitable \(K\) with class number divisible by \(p\). (I should mention, of course, that we are using the theorem that \(R = \mathbf{T}_{\mathfrak{m}}\) which was proved by David Geraghty and me.) The last step follows from the lemma below; the argument is essentially taken from this paper of Bilu–Luca.

Lemma: Fix a prime \(p > 3\). There exists an imaginary cubic field \(K/\mathbf{Q}\) of discriminant prime to \(p\) and class number divisible by \(p\).

Proof: Consider the field \(K = \mathbf{Q}(\theta)\), where

\((\theta^2 + 1)(\theta - t^p + 1) - 1 = 0\),

and \(t\) is an element of \(\mathbf{Q}\) to be chosen later. Note that \(\theta^2 + 1\) is manifestly a unit in \(K\). We may compute that

\((\theta^2 + \theta + 1) \theta = (1 + \theta^2) t^p.\)

Since \((\theta,\theta^2 + \theta + 1) = (1)\) is trivial, it follows that \((\theta) = \mathfrak{a}^p\) for some ideal \(\mathfrak{a}\). We shall show that, for a suitably chosen \(t\), the element \(\mathfrak{a}\) is non-trivial in the class group. If \(\mathfrak{a}\) is trivial, then, up to a unit, \(\theta\) is a \(p\)th power. On the other hand, the rank of the unit group of \(K\) is one, and \(\theta^2 + 1\) is a unit. Hence, it suffices to choose a \(t\) such that:

  1. \(\theta^2 + 1\) generates a subgroup of \(\mathcal{O}^{\times}_K\) of index prime to \(p\). Equivalently, \(\theta^2 + 1\) is not a perfect \(p\)th power in \(K\).
  2. None of the elements \(\theta (\theta^2 + 1)^i\) for \(i = 0,\ldots,p-1\) is a perfect \(p\)th power in \(K\).
  3. The polynomial defining \(K\) is irreducible.
  4. The discriminant of \(K\) is not a square.
  5. The discriminant of \(K\) is prime to \(p\).

By working over the function field \(\mathbf{Q}(t)\) instead of \(\mathbf{Q}\), one finds that the first four conditions hold for all \(t \in \mathbf{Q}\) outside a thin set. (The discriminant \(\Delta\) is always negative, so the signature of the field is always \((1,1)\).) On the other hand, the discriminant of the defining polynomial is \(-3 \mod t\), so if one (for example) takes \(t\) to be an integer divisible by \(p\) then the discriminant will be prime to \(p\). Note that the set of integers divisible by \(p\) will contain elements not in any thin set, because the number of integral points of height at most \(H\) in a thin set is \(o(H)\).
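The two discriminant claims in the previous paragraph are easy to test numerically. The sketch below expands the defining equation as \(x^3 + (1 - s)x^2 + x - s\) with \(s = t^p\) and applies the standard discriminant formula for a monic cubic; the function names are illustrative only. It checks a handful of integer values of \(t\), which is of course evidence rather than a proof.

```python
def cubic_discriminant(b, c, d):
    """Discriminant of the monic cubic x^3 + b*x^2 + c*x + d."""
    return (b * b * c * c - 4 * c ** 3 - 4 * b ** 3 * d
            - 27 * d * d + 18 * b * c * d)


def family_discriminant(t, p):
    """Discriminant of (x^2 + 1)(x - t^p + 1) - 1 for integers t and p.

    Expanding gives x^3 + (1 - s) x^2 + x - s with s = t^p.
    """
    s = t ** p
    return cubic_discriminant(1 - s, 1, -s)


if __name__ == "__main__":
    p = 5
    for t in (-10, -5, 5, 10, 15):
        disc = family_discriminant(t, p)
        # Expect a negative discriminant congruent to -3 modulo t (hence mod p).
        print(t, disc < 0, (disc + 3) % t == 0, disc % p)
```

Taking \(t\) divisible by \(p > 3\) makes the discriminant \(-3\) modulo \(p\), which is non-zero, matching condition 5.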

Second Argument: Let \(E = \mathbf{Q}(\sqrt{-23})\), and let \(L = E[\theta]/(\theta^3 - \theta + 1)\) be the Hilbert class field of \(E\). There is a weight one modular form of level \(\Gamma_1(23)\) and quadratic character corresponding to the Galois representation:

\(\rho: \mathrm{Gal}(L/\mathbf{Q}) \rightarrow \mathrm{GL}_2(\mathbf{Q}_p).\)

Lemma: Let \(p > 3\). Let \(q = x^2 + 23 y^2\) be a prime such that \(q \equiv 1 \mod p\). Equivalently, let \(q\) be a prime which splits completely in \(L(\zeta_p)\). Then

\(\# H^1(X(\Gamma_1(23) \cap \Gamma_0(q)),\omega)^{\mathrm{tors}}\)

is divisible by \(p\). More generally, for any prime \(q\), the quantity above is divisible by any prime divisor of \(a^2_q - (1+q)^2\), where \(a_{\ell} \in \{2,0,-1\}\) for a prime \(\ell\) is the coefficient of \(q^{\ell}\) in \(q \prod (1-q^n) (1 - q^{23 n})\).

Proof: This follows from “level–raising” in characteristic \(p\) for weight one forms. Under the hypothesis that \(a^2_q - (1+q)^2 \equiv 0 \mod p\), we find that there is more cohomology (over \(\mathbf{Z}_p\)) in level \(\Gamma_1(23) \cap \Gamma_0(q)\) than is accounted for by oldforms. Assuming that there is no torsion, this is inconsistent with the fact that there are no newforms in characteristic zero, because weight one forms cannot be Steinberg at any place. (The easiest way to see this is that the eigenvalue of \(U_q\) would have to be non-integral; it also follows on the Galois side from local-global compatibility, but this is overkill.) Note that level–raising in this context does not follow from classical level–raising; for the details I refer you to my fifth lecture in Barbados on non-minimal modularity lifting theorems in weight one.
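To make the lemma concrete, the following Python sketch expands \(q \prod (1-q^n)(1-q^{23n})\) as a truncated power series and reads off the coefficients \(a_{\ell}\). The function name and the truncation bound are illustrative choices, not anything from the argument above.

```python
def eta_product_coeffs(bound):
    """Coefficients of q * prod_{n>=1} (1 - q^n)(1 - q^(23n)) up to q^bound.

    Returns a list `a` with a[k] the coefficient of q^k for 0 <= k <= bound.
    """
    # series[k] = coefficient of q^k in the product without the leading q.
    series = [0] * bound
    series[0] = 1
    for step in (1, 23):
        n = step
        while n < bound:
            # Multiply by (1 - q^n) in place, highest degree first so that
            # series[k - n] still holds its old value when it is used.
            for k in range(bound - 1, n - 1, -1):
                series[k] -= series[k - n]
            n += step
    return [0] + series


if __name__ == "__main__":
    a = eta_product_coeffs(120)
    for ell in (2, 3, 5, 7, 13, 59, 101):
        print(ell, a[ell])
```

For example, \(59 = 6^2 + 23 \cdot 1^2\) has \(a_{59} = 2\), and \(a_{59}^2 - 60^2 = -2^2 \cdot 29 \cdot 31\); the prime \(29\) is the case \(q = 59 \equiv 1 \mod 29\) of the first part of the lemma.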


Math and Genius

Jordan Ellenberg makes a compelling case, as usual, on the pernicious cultural notion of “genius.” Jordan’s article also brought to mind a thought-provoking piece on genius by Moon Duchin here (full disclosure: the link on Duchin’s website has the following caveat: note: juvenilia! Not that I hate it).

I should first clarify that I am essentially in agreement with many of Jordan’s points, as well as his implicit and explicit recommendations. That said, there is a secondary argument which portrays mathematics as a grand collective endeavour to which we can all contribute. I think that this is a little unrealistic. In my perspective, the actual number of people who are advancing mathematics in any genuine sense is very low. This is not to say that there aren’t quite a number of people doing *interesting* mathematics. But it’s not so clear the extent to which the discovery of conceptual breakthroughs is contingent on others first making incremental progress. This may sound like a depressing view of mathematics, but I don’t find it so. Merely to be an observer in the progress of number theory is enough for me — I know how to prove Fermat’s Last Theorem, how exciting is that*?

Having said this, there are two further points on which I do agree with Jordan. The first is that it is a terrible idea to *prejudge* who will be the ones to make progress, at least any more than necessary (judgements of various levels are completely built in to the academic world in the context of hiring, grants, etc.). This is consistent with Jordan’s point that one should view accomplishments rather than people as genius. The second is that the mere notion of genius has detrimental psychological effects on mathematicians. The depressing effect on enthusiasm for mathematics amongst students has already been mentioned, but I also wanted to note a contrasting effect. There’s a tendency amongst a certain group of students (for concreteness imagine a certain flavour of Harvard male undergraduate) to buy into the notion of genius with themselves as an example. This doesn’t do any favours for anyone; obviously not fellow students who are put off, but also the students themselves who are tempted to learn fancy machinery (that’s what a genius does!) rather than to really understand the basics. Two clarifying points: first, this in no way applies to all Harvard undergraduates; some of the smartest I have met have been very humble and level headed. Second, when I single out Harvard undergraduates, I am implying a correlation rather than a causation: this is not at all the fault of the Harvard faculty who are generally no nonsense. Then again, Harvard does have Math 55, which is somewhat of a petri dish for this sort of attitude.

Having written this, I now see there is a practical problem in the classroom which I don’t know how to solve. What do you do when you are teaching and one student sticks out like a sore thumb because their understanding of the material (seemingly) far exceeds the rest of the class?

(*) OK, I have not read all the details of the proof of cyclic base change for \(\mathrm{GL}(2)\).


I don’t know how to prove Serre’s conjecture.

I find it slightly annoying that I don’t know how to prove Serre’s conjecture for imaginary quadratic fields. In particular, I don’t even see any particularly good strategy for showing that a surjective Galois representation — say finite flat with cyclotomic determinant for \(v|p\) —

\(\overline{\rho}: G_{F} \rightarrow \mathrm{GL}_2(\mathbf{F}_3)\)

is modular of the right level. The first problem is that the strategy used by Wiles does not work. The results of Langlands-Tunnell imply the existence of an automorphic form \(\pi\) for \(\mathrm{GL}(2)/F\) which has an associated finite image Galois representation into \(\mathrm{GL}_2(\mathbf{Z}[\sqrt{-2}])\) with projective image \(S_4\) that is “congruent” to \(\overline{\rho}\) modulo a prime above \(3\), but there is no way to realize this congruence in cohomology. An analogous example over \(\mathbf{Q}\) would be that the (known) modularity of a surjective even Galois representation:

\(\overline{\rho}: G_{\mathbf{Q}} \rightarrow \mathrm{SL}_2(\mathbf{F}_4) = A_5\)

has no implications for the modularity of the corresponding even complex representation with projective image \(A_5\) (which is “congruent” modulo \(2\)), because there is no way to relate them via Betti or coherent cohomology.

One context in which we have a fairly satisfactory answer to Serre’s conjecture over imaginary quadratic fields is for representations \(\overline{\rho}\) which are the restriction of an odd representation of \(G_{\mathbf{Q}}\). (I guess one also has modularity in some CM cases, that is, inductions from CM extensions \(H/F\).) So, if we give ourselves modularity lifting results (surely a requirement to get anywhere), one could imagine trying to play some sort of game using the \(3\)-\(5\) switch to construct a chain between a representation which comes from \(\mathbf{Q}\) and the target representation. Or, perhaps, one can play the \(3\)-\(3\) game using abelian surfaces with real multiplication by \(\mathbf{Z}[\sqrt{7}]\). However, there’s a big hole in this strategy: the \(3\)-\(5\) game presupposes that once you know that \(\overline{\rho}\) is modular of some level, you know it at minimal level. So now one runs into the problem of level lowering. Alternatively, if you want to play the \(3\)-\(5\) game Khare–Wintenberger style, you really have to construct minimal lifts. But such lifts will not (in general) exist over imaginary quadratic fields.

This seems to be a serious problem. The only general strategies I can imagine involve being able to push the torsion classes around to different groups using some (as yet unknown) functoriality for torsion classes. (For example, find minimal lifts over some large CM extension \(F'/F\), prove modularity over \(F'\), and then invoke non-abelian base change for torsion classes to recover modularity of the original representation.) The other argument would be to examine the corresponding Eisenstein classes for \(U(2,2)/F\). This seems a little fishy, however; one would really want to see these representations inside (etale) cohomology in order to invoke some kind of Mazur principle, but as we have noted previously, the Galois representations of interest don’t actually live inside the etale cohomology groups that one might want them to. Ultimately, the basic problem is that the classical (Mazur-Ribet) style arguments make strong use of the geometry of modular curves (which is certainly missing here) and the more modern approaches (starting with Skinner–Wiles) rely on base change.


Thurston, Selberg, and Random Polynomials, Part II.

What is the probability that the largest root of a polynomial is real?

Naturally enough, this depends on how one models a random polynomial. If we take polynomials of degree \(N\) which are constrained to have all of their roots of absolute value at most one (with respect to the normalized Lebesgue measure on \(\mathbf{R}^N\)), then, as mentioned last time, the probability that the largest root is real is \(1/N\) in odd degree \(N\) and \(1/(N-1)\) in even degree. A priori, this seems surprisingly small. However, the roots of such polynomials are accumulating on the unit circle, and it’s easier for complex roots to be near the unit circle than real roots. So let’s instead consider the Kac model of polynomials \(f(x)\), where the coefficients are chosen to be independent normals with mean zero. If you ask for the probability that the root whose absolute value is closest to one is real, then I suspect that the answer will be approximately \(1/N\). However, what about the largest root? The first observation is that the expected number of real roots is \((2/\pi) \log N\), so the most naïve guess is that the probability that the largest root is real is approximately \((2/\pi N) \log N\). If you like, you can pause here and guess whether you think this is too high, too low, or about right.

A useful observation is that, instead of considering the largest root, we can consider the smallest root. This is because the map \(a_k \rightarrow a_{N-k}\) is measure preserving and inverts the roots. On the other hand, the behavior of random Kac polynomials in large degree inside the unit circle starts to approximate the behavior of random power series

\(f(x) = a_0 + a_1 x + a_2 x^2 + \ldots \)

where the \(a_i\) are all normally distributed with mean zero and standard deviation one. It’s easy to see that \(f(x)\) will have radius of convergence \(1\) with probability one. So we might instead consider what the probability is that the smallest root of a random power series is real. However, in this case, it is quite elementary to see that this probability \(P_{\infty}\) is strictly between zero and one. Quite explicitly, consider the subspace of power series such that the following inequality holds:

\(\displaystyle{|a_0| + \frac{1}{2} |a_1 - 2| + \frac{1}{2^2} |a_2| + \frac{1}{2^3} |a_3| + \ldots < 1}.\)

This region has positive measure (easy exercise). On the other hand, for all such power series, one can apply Rouché's theorem for the contour \(|2x| = 1\) to see that \(f(x)\) and \(2x\) have the same number of zeroes inside this disc, and hence \(f(x)\) has exactly one root of absolute value less than \(1/2\). By the reflection principle, this root is real. It follows that the probability that the smallest root of \(f(x)\) is real is positive. Equally, one can consider the region:

\(\displaystyle{|a_0 - 1| + \frac{1}{2} |a_1| + \frac{1}{2^2} |a_2 - 8| + \frac{1}{2^3} |a_3| + \ldots < 1},\)

and by applying Rouché along \(|2x| = 1\) and comparing with \(1 + 8 x^2\), the corresponding \(f(x)\) will have exactly two roots inside this ball, and from the inequalities above it follows that neither of them will be real, and hence \(P_{\infty} < 1\).
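The two Rouché comparisons above can be illustrated numerically. The sketch below works with polynomials (power series with finitely many non-zero coefficients, which is an assumption made only to keep the example finite), and the sample coefficients `FIRST` and `SECOND` are arbitrary choices lying in the two regions. It checks the weighted inequality, samples the contour \(|2x| = 1\) to confirm \(|f - g| < |g|\) for the comparison function \(g\), and then locates the real root in the first case by bisection.

```python
import cmath


def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + a_2 x^2 + ... by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def weighted_distance(coeffs, center):
    """The sum of |a_i - c_i| / 2^i, padding the shorter list with zeros."""
    n = max(len(coeffs), len(center))
    a = list(coeffs) + [0.0] * (n - len(coeffs))
    c = list(center) + [0.0] * (n - len(center))
    return sum(abs(x - y) / 2 ** i for i, (x, y) in enumerate(zip(a, c)))


def rouche_gap(coeffs, center, samples=360):
    """Minimum of |g| - |f - g| over sample points of the circle |x| = 1/2."""
    gap = float("inf")
    for k in range(samples):
        x = 0.5 * cmath.exp(2j * cmath.pi * k / samples)
        g = evaluate(center, x)
        gap = min(gap, abs(g) - abs(evaluate(coeffs, x) - g))
    return gap


def real_root_in_half_disc(coeffs, iterations=60):
    """Bisection on [-1/2, 1/2], assuming f(-1/2) < 0 < f(1/2)."""
    lo, hi = -0.5, 0.5
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if evaluate(coeffs, mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


FIRST = [0.2, 2.5, -0.8, 1.0]    # close to 2x: one root in the disc
SECOND = [1.3, 0.4, 8.5]         # close to 1 + 8x^2: two non-real roots

if __name__ == "__main__":
    print(weighted_distance(FIRST, [0, 2]), rouche_gap(FIRST, [0, 2]))
    print(real_root_in_half_disc(FIRST))
    print(weighted_distance(SECOND, [1, 0, 8]), rouche_gap(SECOND, [1, 0, 8]))
```

In the second case the polynomial stays positive on the real interval \([-1/2, 1/2]\), so its two roots inside the disc form a complex conjugate pair.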

The same argument shows that if \(P_N\) is the probability that the smallest (or largest) root of a Kac polynomial is real, then there are uniform (independent of \(N\)) estimates \(0 < a < P_N < b < 1\) for all \(N\). Naturally enough, one should expect that \(P_N\) converges to \(P_{\infty}\). This is true, and the rough idea is to show that, with probability approaching one (as \(N \rightarrow \infty\)), one can apply Rouché’s theorem to deduce that the smallest root of \(f(x)\) is real if and only if the smallest root of its truncation \(f_N(x)\) is real. The key idea here is that, for the truncation \(f_N(x)\), most of the roots of \(f_N(x)\) will be uniformly distributed along the unit circle, and so the contribution of the relevant factor \(\prod |x - \alpha|\) to \(f_N(x)\) will not be too small. Hence one can usually apply Rouché along the contour \(|x| = \beta\) as long as there are no roots of \(f_N(x)\) of absolute value too close to \(\beta\).

The computations above also allow one to give effective gaps between \(P_{\infty}\) and either zero or one (by estimating the measure of the corresponding regions as translates of \(|a_i| \le 1/2^{i+1}\)), although these estimates are not so sharp. Namely, the probability that the smallest root of a random power series is real is at least 0.256% and at most 99.999999999999917%. Some numerical data suggests, however, that the probability that the largest root of a random Kac polynomial (of large degree) will be real is approximately 52%. I have some undergraduates working with me this summer, and one of their projects will be to see if they can prove that the probability is really strictly larger than 50%, or at least to find as good an estimate as they can.
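For anyone who wants to experiment, here is a rough Monte Carlo sketch of this kind of numerical experiment. It is not the code behind the 52% figure quoted above: the use of NumPy, the function names, the tolerance, and the degree and number of trials are all assumptions made for illustration, and no particular output is claimed.

```python
import numpy as np


def largest_root_is_real(coeffs, tol=1e-9):
    """True if the root of largest absolute value is (numerically) real.

    `coeffs` lists the coefficients from the leading term down, which is
    the convention used by numpy.roots.
    """
    roots = np.roots(coeffs)
    biggest = roots[np.argmax(np.abs(roots))]
    return bool(abs(biggest.imag) <= tol * max(1.0, abs(biggest)))


def estimate_probability(degree, trials, seed=0):
    """Fraction of sampled Kac polynomials whose largest root is real.

    The coefficients are independent standard normals.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        coeffs = rng.standard_normal(degree + 1)
        hits += largest_root_is_real(coeffs)
    return hits / trials


if __name__ == "__main__":
    for degree in (5, 20, 80):
        print(degree, estimate_probability(degree, trials=2000))
```

Since reversing the coefficients preserves the measure and inverts the roots, the same routine estimates the probability for the smallest root. Root finding through the companion matrix loses accuracy in high degree, so results for large degree should be treated with some caution.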

One may ask what happens for other ensembles of polynomials. One natural class to consider is the so-called binomial polynomials, where the \(a_i\) are now normal with mean zero and variance \( N!/i!(N-i)!\). Here the previous argument doesn't (a priori) work. On the other hand, as Boris Hanin (a Steve Zelditch student from Northwestern who is leaving for a postdoc at MIT next year) pointed out to me, it actually does: to fix it, one should scale all the roots of the relevant polynomials by \(\sqrt{N}\), and then there really is a limit distribution as \(N \rightarrow \infty\), given by power series

\(f(x) = \displaystyle{\frac{a_0}{\sqrt{0!}} + \frac{a_1 x}{\sqrt{1!}} + \frac{a_2 x^2}{\sqrt{2!}} + \ldots} \)

where the normalized \(a_i\) are normals with standard deviation one. Note that these power series have an infinite radius of convergence with probability one. The probability that the smallest root is real will once again be strictly between \(0\) and \(1\). In order to prove convergence of \(P_N\) (by applying Rouché’s theorem), one needs to know that the relevant 2-point correlation functions behave reasonably enough; I’m hoping to get Boris to work out and write down the details here. Numerically, the limit probability in this case is somewhere around 62%.

Gap Probabilities:

I speculated last time on some conjectural relationship between the space of real monic polynomials \(\Omega_N\) all of whose roots have absolute value at most one, and the space of random Kac polynomials of degree \(N\) as \(N\) goes to infinity. But now I wanted to point out a more direct and elementary relationship between ensembles of random real polynomials and our space \(\Omega_N\). A gap probability is the probability that the eigenvalues/roots of some ensemble avoid some region of the corresponding parameter space. Let’s compute this for a very large gap. That is, let’s compute the probability that a random polynomial has all of its roots of absolute value less than \(T\) as \(T \rightarrow 0\).

Let's consider the Kac model of random polynomials

\(f(x) = a_0 x^N + a_1 x^{N-1} + \ldots + a_N\)

where the \(a_i\) are chosen independently from a normal distribution with mean zero and standard deviation one. Hence we are asking: what is the probability that all the roots of \(f(x)\) have absolute value at most \(T\)? This is simply the integral

\(\displaystyle{\left(\frac{1}{\sqrt{2 \pi}}\right)^{N+1} \int_{P \Omega_N(T)} e^{-|x|^2/2} dx}\)

where \(P \Omega_N(T)\) is the space of polynomials (not necessarily monic) all of whose roots have absolute value at most \(T\). There is a map

\(\Omega_N(T) \times (-\infty,\infty) \rightarrow P \Omega_N(T)\)

given by \((P,\lambda) \mapsto \lambda P\). In the coordinates \((\lambda, a_1, \ldots, a_N)\), where \(a_1, \ldots, a_N\) are the coefficients of the monic polynomial \(P\), the image has coefficients \((\lambda, \lambda a_1, \ldots, \lambda a_N)\), so the Jacobian is \(|\lambda|^{N}\). Hence we can write our quantity as an integral over \(\Omega_N(T)\), which turns out to be

\(\displaystyle{\left(\frac{1}{\sqrt{2 \pi}}\right)^{N+1} \int^{\infty}_{-\infty} \int_{ \Omega_N(T)} |\lambda|^{N} e^{-(\lambda^2 + a^2_1 \lambda^2 + \ldots + a^2_N \lambda^2)/2} \, dV \, d\lambda}.\)

We can now compute the integral over \(\lambda\) directly, and then scaling \(\Omega_N(T)\) to \(\Omega_N\) in the usual way (replacing \(a_i\) by \(T^i a_i\)), we find that the probability that all the roots have absolute value at most \(T\) is

\(\displaystyle{ T^{N(N+1)/2} \left(\frac{1}{\pi}\right)^{(N+1)/2} \Gamma\left(\frac{N+1}{2}\right) \int_{\Omega_N} \frac{dV}{(1 + T^2 a^2_1 + T^4 a^2_2 + \ldots + T^{2N} a^2_{N})^{(N+1)/2}}}\)

Now suppose that \(T \rightarrow 0\). Then the integral over \(\Omega_N\) converges to the volume of \(\Omega_N\), and we obtain an exact asymptotic for the probability that all the roots are (highly) concentrated at zero. In fact, one can do this computation with any probability measure \(\mu\) which decays sufficiently at infinity.
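
As a quick sanity check on the exponent \(N(N+1)/2\), consider \(N = 1\). The unique root of \(a_0 x + a_1\) is \(-a_1/a_0\), a ratio of independent standard normals, which is Cauchy distributed, so the probability that it has absolute value at most \(T\) is \(\frac{2}{\pi} \arctan T \sim 2T/\pi\), which is linear in \(T\) as expected. Here is a minimal Monte Carlo sketch of this case in Python; the function name, the sample size and the seed are illustrative choices and not part of the discussion above.

```python
import math
import random


def gap_probability_degree_one(T, samples=200_000, seed=0):
    """Estimate P(|root| <= T) for a_0 x + a_1 with a_0, a_1 standard normal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        a0 = rng.gauss(0.0, 1.0)
        a1 = rng.gauss(0.0, 1.0)
        # |root| = |a1 / a0| <= T, written without a division
        if abs(a1) <= T * abs(a0):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    for T in (0.5, 0.1, 0.01):
        exact = 2.0 / math.pi * math.atan(T)
        print(T, gap_probability_degree_one(T), exact)
```

The two printed columns should agree up to sampling error, and both shrink linearly with \(T\).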

Curiously enough, we can also ask (in the setting of random polynomials whose coefficients \(a_i\) are each chosen according to some reasonable measure \(\mu\)) what the probability is that a random polynomial has \(R\) real roots contingent on all the roots of that polynomial having absolute value less than \(T\). It turns out that, as \(T \rightarrow 0\), the answer in this case is simply the ratio of the volume of \(\Omega_{R,S}\) to that of \(\Omega_N\) (with \(R+2S=N\)). This answer does not depend at all on \(\mu\). The explanation for this is that, having subjected the polynomials to the constraint that all the roots have absolute value at most \(T\) for small \(T\), one is restricting to some tiny region where the measure is essentially constant, and so it is converging to a scaled version of Lebesgue measure.

Posted in Mathematics

Thurston, Selberg, and Random Polynomials, Part I.

Apart from everything else, you could always count on Bill Thurston to ask interesting questions. This is the first of a small number of posts which were motivated in part by figure two from this paper, and this accompanying MO question. I liked this problem enough to give it as a thesis problem to my student Zili Huang, and much of what I discuss below arose from this project.

Say that an algebraic integer \(\alpha\) is Perron if \(|\alpha| > |\sigma \alpha|\) for every conjugate \(\sigma \alpha\) of \(\alpha\). One immediately observes that \(\alpha\) must be real. Say that a monic polynomial is Perron if it is irreducible and has a Perron integer as a root. Thurston’s question is (roughly) to describe the distribution of Perron algebraic integers, especially those chosen in some (small) fixed interval in \(\mathbf{R}\). This question has several interpretations, but one experiment Thurston does is to take 20,000 monic polynomials of degree 21 with integer coefficients in \([-5,5]\), and to plot the quantities \(\sigma \alpha /\alpha \in B(1)\) for all the conjugates of the 5,932 resulting Perron polynomials such that the corresponding Perron integer was in the interval \([1,2]\). The result is this:

[Figure: Thurston’s plot of the conjugate ratios \(\sigma \alpha/\alpha\) in the unit disc]

The first observation is that this graph has (apart from some noise coming from real roots) rotational symmetry. The next observation is that the roots tend to be concentrated in a ring of some radius, which (from experiment) becomes more concentrated the more one restricts the range in \(\mathbf{R}\) of the Perron integers one is considering. The first question is: can one explain this graph, and does it reflect reality (that is, the actual distribution of Perron integers)?

The answers to these questions turn out to be: yes, and no. The first problem is that it is hard (a priori) to “randomly” generate Perron algebraic integers of large degree in \([1,2]\). Knowing a bound on the roots places a bound on the coefficients, but a randomly chosen polynomial with coefficients satisfying the required bounds will almost always have a root larger than 2. Thus Thurston “cheats” with his algorithm, making the coefficients of his polynomials very small in order to increase the probability that the largest root will also be small. (Full disclosure, Thurston makes no claims that his algorithm reflects reality, and explicitly asks whether it does so or not.) The issue is then whether this will skew the distribution of the roots. It turns out that it does! To explain why this might not be surprising, let’s talk about the size of the spaces over which Thurston is sampling. Let \(\Omega^P_{21}\) be the set of monic polynomials of degree 21 with real coefficients and with a unique largest real root \(\lambda \le 2\). Thurston is sampling over a space with \(11^{20}\) lattice points and volume \(10^{20}\). On the other hand, it turns out that the volume of \(\Omega^{P}_{21}\) is equal to

\(\displaystyle{\frac{2^{399}}{3^{24} 5^{12} 7^{10} 11^{11} 13^{9} 17^{5} 19^{3}}} \sim 2.249 \times 10^{60}.\)

So Thurston was only really sampling a \(10^{-40}\)th of the entire space! Thurston’s picture can be explained as follows: polynomials with (suitably) small coefficients (contingent on the initial and final coefficients not being too small) tend to have all their roots clustering uniformly around the circle of radius one. This follows in the angular direction by a famous theorem of Erdős and Turán, and for the absolute values it follows (in a related way using Jensen’s formula) from a paper of Hughes and Nikeghbali here. So the apparent “radius” in Thurston’s picture is just representing \(1/R\), where \(R\) is the approximate size of the Perron integers being considered. It turns out that, in reality, most of the conjugates of Perron integers have size comparable to the Perron integer itself. That is, the correct version of Thurston’s picture should show the roots clustering (roughly) uniformly around the boundary.

OK, now a pause when I look at Thurston’s graph and see that the radius is not something like a half as I claimed above, but something much smaller. So I just repeated Thurston’s experiment, and out of 20,000 monic polynomials with coefficients randomly chosen in [-5,5], only 1011 were Perron polynomials with largest root less than 2, and the resulting picture came out like this:

[Figure: the repeated experiment, restricted to Perron polynomials with largest root less than 2]

Here one really sees the (misleading) accumulation around the radius \(1/2\). I’m guessing that Thurston actually kept all polynomials whose largest root was in \([1,5]\), which would account for the larger success rate for choosing Perron polynomials as well as the smaller radius. This is also consistent with how Thurston describes the corresponding graphs in the MO question rather than in Figure 2 of his preprint.
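
For readers who want to play with this, here is a rough sketch of the sampling experiment in Python. It is not Thurston’s code or the code used for the picture above, and everything in it is an assumption made for illustration: the function names, the Durand-Kerner iteration used to approximate the roots, and the tolerance `eps`. In particular it does not test irreducibility, so it only checks that the dominant root is real, positive and strictly larger in absolute value than all the other roots.

```python
import random


def roots_durand_kerner(coeffs, iterations=500, tol=1e-12):
    """Approximate all complex roots of a monic polynomial.

    coeffs lists the coefficients from the leading one (which must be 1)
    down to the constant term.
    """
    n = len(coeffs) - 1
    # Usual Durand-Kerner starting values: powers of a non-real number.
    zs = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        biggest_step = 0.0
        for i in range(n):
            z = zs[i]
            value = 0j
            for c in coeffs:  # Horner evaluation of the polynomial at z
                value = value * z + c
            denom = 1 + 0j
            for j in range(n):
                if j != i:
                    denom *= z - zs[j]
            if denom == 0:  # two approximations coincide; skip this update
                continue
            step = value / denom
            zs[i] = z - step
            biggest_step = max(biggest_step, abs(step))
        if biggest_step < tol:
            break
    return zs


def perron_root(coeffs, eps=1e-6):
    """Return the dominant root if it is real, positive and strictly larger
    in absolute value than every other root; otherwise return None.
    Irreducibility is NOT checked."""
    zs = sorted(roots_durand_kerner(coeffs), key=abs, reverse=True)
    top = zs[0]
    if abs(top.imag) > eps or top.real <= 0:
        return None
    if len(zs) > 1 and abs(zs[1]) >= abs(top) - eps:
        return None
    return top.real


def sample_perron_roots(trials, degree=21, bound=5, lo=1.0, hi=2.0, seed=0):
    """Sample monic integer polynomials with coefficients in [-bound, bound]
    and keep those whose dominant root passes perron_root and lies in [lo, hi]."""
    rng = random.Random(seed)
    found = []
    for _ in range(trials):
        coeffs = [1] + [rng.randint(-bound, bound) for _ in range(degree)]
        root = perron_root(coeffs)
        if root is not None and lo <= root <= hi:
            found.append((root, coeffs))
    return found
```

Calling `sample_perron_roots(200)` runs a small version of the experiment. Pure Python is slow at degree 21, so a full run with 20,000 polynomials would be better served by a numerical library; changing `hi` lets one compare the intervals \([1,2]\) and \([1,5]\).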

So how does one study Perron integers? Let us re-wind slightly and discuss a more elementary problem. How does one count algebraic integers? The most natural way to count algebraic integers is to order them by height. However, Thurston’s problem clearly suggests a different measure, namely, to count by the size of the largest conjugate. This has a profound effect on some of the statistical properties under consideration. Roughly, algebraic integers ordered by height are much more likely to have a small number of “outliers” with large absolute value, whereas when one orders by the size of the largest conjugate, most of the other conjugates accumulate around the circle with radius the size of the largest root as the degree goes to infinity.

The problem of understanding algebraic integers of bounded size (where by bounded we mean a bound on the largest conjugate) amounts to understanding the lattice points in a certain region of \(\mathbf{R}^N\). Now as long as one fixes the degree and increases the bound, such counting problems (including this one) typically reduce to a volume problem. (One also uses the fact that almost all polynomials are irreducible, and that the regions are “nice” in some explicit way, i.e. not Cantor sets.) Moreover, the corresponding regions are essentially (up to a simple stretching) independent of the bound. Hence the key region to understand is the region \(\Omega_{N} \subset \mathbf{R}^N\) of monic degree \(N\) polynomials all of whose roots have absolute value at most one, and the region \(\Omega^{P}_N \subset \Omega_N\) consisting of such polynomials whose largest root is real. Of course, one is not only interested in the volumes of these regions, but also the integrals of various quantities. As an example, one can consider the integral

\(C_N(T,\alpha) = \displaystyle{\int_{\Omega_N} P(T) |a_N|^{\alpha - 1} dV}\)

where \(P \in \Omega_N\) represents the monic polynomial at any point, and \(a_N\) is the constant term. Evaluating this integral at \(\alpha = 1\) and taking the leading term (in \(T\)) recovers the volume. On the other hand, there are some other relations. A fairly simple computation shows that

\(\mathrm{Vol}(\Omega^P_N) = \displaystyle{\frac{4}{N(N+1)} C_{N-1}(1,1)},\)

which is how one can compute the left hand side exactly for any \(N\). In order to evaluate these integrals, it makes more sense to integrate not over the “coefficient space” of polynomials, but rather the “configuration space” of roots. The coefficient space is naturally stratified by the number of real and complex roots. For that reason, it makes sense to decompose \(\Omega_N\) as

\(\coprod_{R + 2 S = N} \Omega_{R,S}\)

where \(\Omega_{R,S}\) corresponds to polynomials whose roots all have absolute value at most one and have signature \((R,S)\) (since we are interested only in integrals, we elide issues concerning whether one wants these spaces to be open or closed or somewhere in between). As a special case, let’s think about the integral \(C_{N,0}(T,\alpha)\) where one restricts the integrand to \(\Omega_{N,0}\). The configuration space is simply \([-1,1]^N\). On the other hand, the map from configuration space to coefficient space is just given in terms of the symmetric polynomials, and the corresponding Jacobian determinant is (up to sign) the Vandermonde determinant. Hence, taking into account the action of \(S_N\) on the fibres, one finds that

\( C_{N,0}(T,\alpha) = \displaystyle{\frac{1}{N!} \int_{[-1,1]^N} \left| \prod_{i} x_i \right|^{\alpha - 1} \prod_{i} (T - x_i) \prod_{i < j} |x_i - x_j| \, dx_1 \ldots dx_N}.\)

This is now very reminiscent of the classical Selberg Integral. There is some beautiful mathematics related to the Selberg integral; let me direct you here for a nice survey. The integrals arising here are, however, not quite Selberg integrals except for some very degenerate cases.
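
To see the change of variables at work in the smallest interesting case, take \(N = 2\) and \(\alpha = 1\). The leading coefficient in \(T\) of \(C_{2,0}(T,1)\) is \(\frac{1}{2} \int_{[-1,1]^2} |x_1 - x_2| \, dx_1 dx_2 = 4/3\), which should be the area of \(\Omega_{2,0}\), the set of \((a,b)\) such that \(x^2 + ax + b\) has both roots real and in \([-1,1]\). The following Python sketch compares the two sides numerically. The function names and grid sizes are illustrative assumptions, and the description of \(\Omega_{2,0}\) by the inequalities \(a^2 \ge 4b\), \(|a| \le 1 + b\), \(|b| \le 1\) is the standard one rather than something taken from the text above.

```python
def volume_coefficient_space(n=800):
    """Area of the set of (a, b) such that x^2 + a x + b has two real
    roots in [-1, 1], by counting midpoints of a grid on [-2, 2] x [-1, 1]."""
    h_a, h_b = 4.0 / n, 2.0 / n
    count = 0
    for i in range(n):
        a = -2.0 + (i + 0.5) * h_a
        for j in range(n):
            b = -1.0 + (j + 0.5) * h_b
            # real roots: a^2 >= 4b; roots in [-1, 1]: |a| <= 1 + b (with |b| <= 1)
            if a * a >= 4.0 * b and abs(a) <= 1.0 + b:
                count += 1
    return count * h_a * h_b


def volume_configuration_space(n=800):
    """(1/2!) times the integral of |x_1 - x_2| over [-1, 1]^2 (midpoint rule)."""
    h = 2.0 / n
    xs = [-1.0 + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for x1 in xs:
        for x2 in xs:
            total += abs(x1 - x2)
    return total * h * h / 2.0


if __name__ == "__main__":
    print(volume_coefficient_space(), volume_configuration_space(), 4.0 / 3.0)
```

Both numbers come out close to \(4/3\); the factor \(|x_1 - x_2|\) is the Vandermonde Jacobian and the \(1/2!\) accounts for the two orderings of the roots.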

Once you start writing these integrals down, and computing some of them (by hook or crook), there are a number of problems which naturally come to mind. For example, what is the probability that a random polynomial all of whose roots have absolute value at most one is Perron? Well, by explicitly computing the ratio of the volume of \(\Omega^P_N\) to \(\Omega_N\), you find that the answer is \(1/N\) if \(N\) is odd and \(1/(N+1)\) if \(N\) is even (this checks out for \(N = 1,2\): the volumes of \(\Omega_1\) and \(\Omega_2\) are \(2\) and \(4\), while those of \(\Omega^P_1\) and \(\Omega^P_2\) are \(2\) and \(4/3\)). On the other hand, you might ask, given a polynomial all of whose roots have absolute value at most one, what is the expected number of real roots, or what is the probability (at least in even degree) that the polynomial has no real roots at all? Having asked these questions, it is then sensible to ask the same questions for other ways of choosing random polynomials. The classical way to choose a real random polynomial is to write

\(f(x) = a_N x^N + \ldots + a_0\)

where the \(a_i\) are independent normal variables with mean zero (this is the Kac ensemble). To what extent do the statistics of random polynomials with this measure mirror the constrained problem consisting of polynomials all of whose roots have absolute value at most one? Obviously, it depends on the type of problem one considers. The most classical problem for real polynomials concerns counting the expected number of real roots. A famous theorem of Kac says that, under the ensemble above, the expected number of real roots is approximately \(2/\pi \cdot \log(N)\). (I recommend reading this paper for an introduction to the subject; I learnt these things from chatting with Peter Sarnak at the IAS.) The methods of Kac also show that the real roots concentrate for large \(N\) around \(-1\) and \(+1\). In fact, the complex roots also concentrate along the unit circle. How does this compare to our constrained model? First of all, the real roots in the Kac model either lie in \([-1,1]\) or in \((-\infty,-1] \cup [1,\infty)\). Certainly our polynomials have no roots in the latter region. If one restricts the Kac polynomials to \([-1,1]\), then the expected number of real roots decreases to \(1/\pi \cdot \log(N)\). This is in some sense easy to see from the previous formula, because the map on coefficients \(a_k \rightarrow a_{N-k}\) is measure preserving and inverts the roots. In fact, a stronger result follows from Kac. If one takes an interval \([a,b]\) strictly contained inside \([-1,1]\), then the expected number of real roots in \([a,b]\) converges, as \(N \rightarrow \infty\), to

\( \displaystyle{\frac{1}{\pi} \int^{b}_{a} \frac{dT}{1 -T^2}}.\)
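
The integral can be evaluated in closed form:

\(\displaystyle{\frac{1}{\pi} \int^{b}_{a} \frac{dT}{1 - T^2} = \frac{1}{2\pi} \log \frac{(1+b)(1-a)}{(1-b)(1+a)},}\)

which stays bounded for fixed \(-1 < a < b < 1\) but grows logarithmically as \(a \rightarrow -1\) or \(b \rightarrow 1\).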

This gives another strong indication of how the roots are concentrating at the points +1 and -1. OK, so now let us return to our constrained model consisting of monic polynomials all of whose roots have absolute value at most one. How many real roots does one expect such a polynomial to have? There’s a natural map

\(\Omega_{N-1} \times [-1,1] \rightarrow \Omega_{N}\)

which sends \((P(x),T)\) to \(P(x)(x-T)\). The Jacobian determinant of this map turns out to be equal to \(|P(T)|\). On the other hand, the map is not one to one; rather, a point of \(\Omega_{R,S}\) in the image has exactly \(R\) preimages (one for each real root). Hence, if \(Z(P)\) denotes the number of real roots of the polynomial \(P\), then

\(\displaystyle{\int_{\Omega_N} Z(P) \, dV = \int_{-1}^{1} \int_{\Omega_{N-1}} |P(T)| \, dV \, dT}.\)

The left hand side (after dividing by the volume) gives the expected number of real roots. So one is again reduced to a Selberg type integral. In this case, one apparently has (based on some Zagier-like integral guessing mojo, but unfortunately not yet Zagier-like integral proving mojo) for \(N = 2m\),

\( \displaystyle{\frac{1}{D_N}
\int_{\Omega_N} |P(T)| =
\frac{1}{2^{2m}
{2m \choose m}}
\left( \sum_{k=0}^{m} \frac{2m-2k+1}{2m+1}
{2m-2k \choose m-k}
{2k \choose k}
T^{2k} \right)
\left( \sum_{k=0}^{m}
{2m-2k \choose m-k}
{2k \choose k}
T^{2k} \right)},\)

and there is a similar formula for \(N = 2m+1\); here \(D_N\) denotes the volume of \(\Omega_N\). After some analysis to estimate the resulting integral of the RHS from \(T = -1\) to \(1\), it turns out that, for large \(N\), the expected number of real roots is approximately

\( \displaystyle{\frac{1}{\pi} \log N},\)

which is exactly in accordance with the Kac model! Indeed, if one restricts to real roots in an interval \([a,b]\) strictly in \([-1,1]\), then one also obtains the same integral formula as in the Kac ensemble. So, somewhat surprisingly to me, the number of real roots in \([-1,1]\) behaves in a very similar way whether one considers Kac polynomials or monic polynomials all of whose roots have absolute value at most one.
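
The displayed formula for the average of \(|P(T)|\) can at least be tested in the first case \(N = 2\) (that is, \(m = 1\)), where the right hand side is \(\frac{1}{2}(1 + T^2/3)(1 + T^2)\). Here is a minimal Python sketch that compares it with a brute force integral over \(\Omega_2\). It assumes that \(D_N\) is the volume of \(\Omega_N\) (so \(D_2 = 4\)) and uses the standard description of \(\Omega_2\) as the set of \((a,b)\) with \(|b| \le 1\) and \(|a| \le 1 + b\); the function names and grid size are made up for the illustration.

```python
from math import comb


def conjectured_average(m, T):
    """Right hand side of the displayed formula for N = 2m."""
    prefactor = 1.0 / (2 ** (2 * m) * comb(2 * m, m))
    first = sum(
        (2 * m - 2 * k + 1) / (2 * m + 1)
        * comb(2 * m - 2 * k, m - k) * comb(2 * k, k) * T ** (2 * k)
        for k in range(m + 1)
    )
    second = sum(
        comb(2 * m - 2 * k, m - k) * comb(2 * k, k) * T ** (2 * k)
        for k in range(m + 1)
    )
    return prefactor * first * second


def brute_force_average_degree_two(T, n=400):
    """Average of |T^2 + a T + b| over Omega_2 = {|b| <= 1, |a| <= 1 + b},
    computed with a midpoint rule after substituting a = (1 + b) * u."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        b = -1.0 + (i + 0.5) * h
        width = 1.0 + b  # a ranges over [-width, width]
        for j in range(n):
            a = width * (-1.0 + (j + 0.5) * h)
            total += abs(T * T + a * T + b) * width
    area = 4.0  # assumed value of D_2, the volume of Omega_2
    return total * h * h / area


if __name__ == "__main__":
    for T in (0.0, 0.5, 1.0):
        print(T, conjectured_average(1, T), brute_force_average_degree_two(T))
```

The two columns should agree up to the discretisation error of the grid. This only tests the case \(m = 1\) and says nothing about larger \(N\).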

What then of the other problems? Given a polynomial in the Kac model of even degree \(2N\), what is the probability that it has no roots in the interval \([-1,1]\)? This problem was explicitly addressed by Dembo, Poonen, Shao, and Zeitouni here, where they show (under less restrictive hypotheses) that this occurs with probability \(O(N^{-b/2 + o(1)})\) for some universal constant \(b/2\) which they do not determine, although they estimate based on numerical evidence that \(b/2 = 0.38 \pm 0.015\). What happens in our constrained model? Once more it comes down to a Selberg-like integral, this time computing the ratio of volumes:

\(\displaystyle{\frac{\displaystyle{\int_{\Omega_{0,N}} dV}}{\displaystyle{\int_{\Omega_{2N}} dV}}}\)

It turns out that one can compute this explicitly as a product of factorials. Moreover, one can compute the exact asymptotic in this case as \(N \rightarrow \infty\), and the resulting probability is

\(\displaystyle{ \frac{2C}{\sqrt{2 \pi} (2N)^{3/8}}, \ \text{where} \ C = 2^{-1/24} e^{-3/2 \cdot \zeta'(-1)} = 1.24514 \ldots }\)

(It may be hard to read in the exponent, but that is the derivative of the Riemann zeta function \(\zeta'(-1)\) at \(-1\). That may seem strange, but in fact this is a fairly typical constant that comes up in asymptotics of the Barnes-G function, which is exactly the type of expression (a product of factorials) which turns up in the evaluation of the relevant integrals.) Now the result of DPSZ does not apply in our case (where the coefficients are a long way from being independent), but given the similarity in the distribution of real roots between our polynomials and the Kac model, we naturally make the following conjecture:

Conjecture: The constant \(b/2\) is equal to \(3/8\).

Optimistically, one might even try to prove this conjecture by showing that the statistics of our collection of polynomials mirror those of the Kac polynomials for sufficiently large \(N\).

Next time: we discuss a more concrete relationship between random polynomials and our models in terms of limits of gap probabilities. But let me also leave you with the following teaser question: What is the probability that the largest root of a polynomial of degree \(N\) is real?

Posted in Mathematics, Students

A postview of Bellairs/Barbados

I am just recovering from my trip to Barbados for the McGill sponsored conference at the Bellairs institute (which I previously discussed here). I thought it was a wonderfully enjoyable conference, for many reasons. The first is that I got to give 14 hours or so of talks, and I like the sound of my own voice. What was unique, however, was the really high level of the audience, not just in terms of technical strength, but in terms of their knowledge of the particular topics which were being discussed. Usually when you have a chance to talk to a specialized audience, you only have 50 minutes to speak, and for at least the first 20 minutes or so you should not assume that your audience is au fait with all the latest technical developments in the subject. On the other hand, in the contexts in which one has multiple hours to give details (such as a mini-course or graduate class), it’s often the case that the target audience is graduate students first encountering the material. At this conference, practically half the audience had written papers proving modularity lifting theorems! I surveyed some participants beforehand on how long I should spend reviewing the basic theory of Galois deformations, and the answers typically ranged from 1 to 5 minutes. In reality, I gave a 150 minute “background” talk on the first morning, although by background here I really mean Wiles’ proof of minimal modularity lifting for irreducible modular Galois representations of \(G_{\mathbf{Q}}\).

I broke the mold of previous Bellairs conferences by scheduling an additional talk in the afternoon, so typically we had some 6-7 hours of lectures per day. This sounds a lot, but when it is divided up into only three speakers and spread out from early morning to late evening, it didn’t seem so much at all. (We still had plenty of time every day to snorkel at the reef, and even one free afternoon to go on a boat tour and swim with the turtles. Even Sug Woo’s 200+ minute talk just flew by, although it was accompanied by rum drinks.) In addition to the background talks I mentioned previously, there were also research talks by Peter Scholze, Jack Thorne, George Boxer, Ila Varma, and David Geraghty (I may blog about some of these talks later). I think this was the first conference in which I learned something from every single talk. Of course, I did get to suggest many of the participants, so in a way this conference was designed for me.

Speaking of great theses by Richard Taylor students (George and Ila), it’s kind of amazing what is required/expected of a graduate student in number theory nowadays. It certainly makes me feel positive towards the future of our subject. Speaking of Richard, I heard (although have no confirmation) that he thought the conference sounded interesting, and so it is somewhat embarrassing that I didn’t suggest his name as someone to be invited. On the other hand, it would have been even more embarrassing for him to have actually come and then had to share a room with someone (the accommodations were fairly spartan) while I was in a room by myself. Along those same lines, I’m 100% certain that Mark (“I don’t get out of bed for less than $10,000 a day”) Kisin would not have come. (Full disclosure, Mark claims that Toby and I spread false rumours concerning his demands for luxury accommodations at conferences.)

One outcome of the conference is that I feel confident that we will have unconditional modularity lifting theorems for \(\mathrm{GL}(n)/\mathbf{Q}\) in the next five years. Of course, it’s always dangerous to make predictions.

Finally, apropos of nothing, I hope to have more posts in the future whose keywords include both “Richard Taylor” and “Turtles.”

Posted in Travel, Waffle

Are business schools intellectually bankrupt?

From the New York Times today, a report from business school professors concerning a study which claims to show that professors are prejudiced, too. I remember reading the original paper on this study, which made it painfully clear that the authors were pursuing an agenda and that they arrived at their conclusions by scouring their data for correlations which supported their case, a classic hallmark of poor science. But perhaps sound methodology is too much to expect from business school professors?

The reason I paid particular attention to this study was that I was one of the participants. Here was the original email I received.

Dear Professor XXX,

I am writing you because I am a prospective doctoral student with considerable interest in your research. My plan is to apply to doctoral programs this coming fall, and I am eager to learn as much as I can about research opportunities in the meantime.

I will be on campus next Monday, and although I know it is short notice, I was wondering if you might have 10 minutes when you would be willing to meet with me to briefly talk about your work and any possible opportunities for me to get involved in your research. Any time that would be convenient for you would be fine with me, as meeting with you is my first priority during this campus visit.

Thank you in advance for your consideration.

I remember receiving this email. What immediately struck me was the repeated vague references to “my research.” Now in order to have any appreciation of my research, you would, at the very least, have to know that the Langlands program exists, or that my research is related to Wiles’ proof of Fermat’s Last Theorem and algebraic number theory. The fact that there is no mention of number theory nor any indication of the background of the student immediately links the email (in my mind) to academic spam. Surely it’s the case that my reaction would be shared by many academics? Who is so desperate for attention that they would imagine this email reflects a genuine personal interest in their work? As you would expect, I completely ignored the email and promptly forgot about it. Then, a week later, I received the following email:

Dear Professor XXX,

Recently, you received an email from a student asking for 10 minutes of your time to discuss your Ph.D. program (the body of the email appears below). We are emailing you today to debrief you on the actual purpose of that email, as it was part of a research study. We sincerely hope our study did not cause you any disruption and we apologize if you were at all inconvenienced. Our hope is that this letter will provide a sufficient explanation of the purpose and design of our study to alleviate any concerns you may have about your involvement. We want to thank you for your time and for reading further if you are interested in understanding why you received this message. We hope you will see the value of the knowledge we anticipate producing with this large academic study.

We are decision-making researchers interested in how choices differ when they are made for “now” versus for “later”. Previous research has shown that people tend to favor doing things they viscerally want to do over what they believe they should do when making decisions for now, while they are more likely to do what they believe they should when making decisions for later (for a review, see Milkman, Rogers and Bazerman, 2008). The email you received from a student asked for a meeting with you either today (if you were randomly assigned to the “now” condition) or in a week (if you were randomly assigned to the “later” condition). This email was actually from a fictional student. It was designed for a study of the responsiveness of University faculty to meeting requests from prospective students of various backgrounds made on short notice versus well in advance. Faculty members at the top 260 U.S. Universities (as ranked by U.S. News and World Report) and affiliated with Ph.D. programs were identified as potential participants in this study, and a random sample (6,300 faculty in total – one per Ph.D. program) were selected to receive emails. In addition to examining the responsiveness of faculty to meeting requests for “now” versus “later”, we are also interested in how the identity of the applicant affects, or does not affect, response rates, and as such, the name of the student sending a meeting request was varied (by race and by gender). We expected that students from underrepresented groups would receive fewer meeting acceptances than other students, though we have competing hypotheses about whether this would effect would be stronger in the “now” or the “later” condition.

I love the line concerning the fact that they have “competing hypotheses about whether this would effect would be stronger in the “now” or the “later” condition”; see, they prove their case whatever the data!

Given the ridiculousness of the initial email, I was appalled that my response might contribute to some published data implying that professors were dismissive of minorities and/or women. Let me be clear at this point that it may well be the case that academics are more dismissive towards women (I fear that this may indeed be true in our field), but I am convinced that this study would have little of academic value to say on the matter. That said, having received this second email, I did nervously look back to my initial email to see what fake name I had failed to respond to.

The answer: Steven Smith. Yes! I ignored the name that could easily be a white male.

Update: For a different take on this study by Andrew Gelman (who, apparently, is more willing to spend time answering random emails than I am) see here, here, and here.

Posted in Politics, Rant