[Humanist] 31.153 on maths for humanists

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Sun Jul 2 08:48:13 CEST 2017

                 Humanist Discussion Group, Vol. 31, No. 153.
            Department of Digital Humanities, King's College London
                Submit to: humanist at lists.digitalhumanities.org

        Date: Sat, 1 Jul 2017 15:50:57 -0400
        From: Henry Schaffer <hes at ncsu.edu>
        Subject: Re:  31.152 pubs: on maths for humanists
        In-Reply-To: <20170701075422.A572067D0 at digitalhumanities.org>

  Thanks for a very nice discussion spanning several realms of
investigation. One reason I am certain of your wide experience is your
statement, "The above is a 'frequentist' account, based on probabilities.
The other doctrine is 'bayesian' (who are not to be left alone with
frequentists in the presence of sharp objects)."


P.S. I've found that many frequentists in the life sciences use the Bonferroni
Correction - which somewhat moves out of the orthodox frequentist territory.
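
For readers who have not met it, the Bonferroni Correction can be sketched in a few lines. This is only an illustration: the function name and the p-values are made up for the example, and 0.05 is simply the conventional threshold. When m tests are run together, each p-value is compared against alpha/m rather than alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Return, for each p-value, whether it survives the Bonferroni Correction.

    With m tests, each individual p-value must be at or below alpha / m.
    """
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from four tests run on the same data.
example_p_values = [0.001, 0.012, 0.03, 0.2]
print(bonferroni(example_p_values))  # per-test threshold is 0.05 / 4 = 0.0125
```

Note that 0.03 would pass an uncorrected 0.05 threshold but does not pass the corrected one.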

On Sat, Jul 1, 2017 at 3:54 AM, Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:

>                  Humanist Discussion Group, Vol. 31, No. 152.
>             Department of Digital Humanities, King's College London
>                        www.digitalhumanities.org/humanist
>                 Submit to: humanist at lists.digitalhumanities.org
>         Date: Fri, 30 Jun 2017 23:24:59 +0100
>         From: "Norman Gray" <norman at astro.gla.ac.uk>
>         Subject: Re: [Humanist] 31.149 pubs: on maths for humanists;
> digital logic
>         In-Reply-To: <20170630064201.1BE862F59 at digitalhumanities.org>
> Greetings.
> On 30 Jun 2017, at 7:42, Gabriel Egan wrote:
> > In Patrick Juola and Stephen Ramsay's new book _Six Septembers_,
> > announced here, there is a most interesting discussion of
> > the notion of the Null Hypothesis (pp. 246-9).
> I may be able to re-explain this (the following is a slightly protracted
> account, but intended to be complementary to Juola and Ramsay's account
> rather than at all disagreeing with it).
> > The Null Hypothesis is that this 40-pound cat is a Siamese.
> That's right -- the Null Hypothesis is usually the boring hypothesis, or
> the no-new-science-here hypothesis.  You haven't discovered a new breed
> of cat, with Siamese-like markings, just a reeeally fat Siamese.
> But 40 lb is surprisingly heavy for a Siamese -- really very surprising.
> But how surprising, numerically?
> The argument on Juola and Ramsay's p248 gives a necessarily rather
> hand-waving estimate that the probability of a Siamese cat being this
> heavy is about 1%.  But this cat (as Gabriel points out) is certainly 40
> lb.  So we have a right to be astonished -- this is a very unlikely
> thing (chance of 1%) to come across.
> So at this point we can either (a) decide that today is a weird day, and
> that being accosted by enormous felidae probably won't be the end of it,
> or (b) decide that we don't believe in coincidences, and that something
> is wrong.  Since we do believe (100%) that the cat is that heavy,
> perhaps it's our hypothesis that this is a Siamese that is wrong, so we
> decide to reject that Null Hypothesis.
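A minimal sketch of the "how surprising, numerically?" step, assuming (purely for illustration) that Siamese weights follow a normal distribution. The mean and standard deviation below are placeholders chosen only so that the answer lands near the 1% discussed above; they are not real cat-weight data and not the book's figures.

```python
import math


def upper_tail_p(observed, mean, sd):
    """P(weight >= observed) under a normal model of the Null Hypothesis."""
    z = (observed - mean) / sd
    # The upper tail of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Placeholder parameters (assumptions, not measurements).
p = upper_tail_p(observed=40.0, mean=12.0, sd=12.0)
print(f"p = {p:.4f}")  # close to 0.01 with these placeholder values
```

The function only says how unlikely the observation is if the Null Hypothesis holds; whether to reject that hypothesis remains a separate decision.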
> > << At this point, the test becomes simple logic. If the cat were
> > an ordinary Siamese, it would probably not weigh forty pounds.
> > Therefore, if it does weigh forty pounds, it's probably not an
> > ordinary Siamese. >>
> >
> > This statement seems to me to commit a well-known fallacy. The
> > probability value is a remark on how often the observed data
> > should be expected if the Null Hypothesis is true, not a remark
> > on the truthfulness of the Null Hypothesis.
> That's exactly right (except that it's not a fallacy): this figure of 1%
> is just a remark on the unlikeliness of what we've seen, given the Null
> Hypothesis.  It's our choice to take the next step and decide to take a
> closer look at that suddenly-suspicious hypothesis.  The 1% (or
> probability of 0.01, written as p=0.01) is the justification we can
> claim for that decision.
> A p-value of p=0.10 (or 10%) is pretty marginal, p=0.05 is publishable,
> p=0.01 is pretty good, as these things go, at least in the social and
> life sciences -- that is, no-one would reproach you for concluding, at
> least provisionally, that this is not a Siamese cat, first appearances
> notwithstanding.  Particle physicists (when discovering Higgs particles)
> like '5-sigma', or about 0.00006%, as a criterion.
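The link between an "n-sigma" criterion and a p-value can be checked directly, assuming a normal distribution. The function names here are just for the example. The two-tailed value for 5 sigma is the one that matches the "about 0.00006%" figure above; the one-tailed value is half of that.

```python
import math


def two_tailed_p(n_sigma):
    """Probability of lying at least n_sigma from the mean, in either direction."""
    return math.erfc(n_sigma / math.sqrt(2))


def one_tailed_p(n_sigma):
    """Probability of lying at least n_sigma above the mean."""
    return 0.5 * two_tailed_p(n_sigma)


for n in (1.96, 3, 5):
    p = two_tailed_p(n)
    print(f"{n}-sigma: two-tailed p = {p:.3g} ({100 * p:.3g}%)")
```

On this scale the familiar p=0.05 corresponds to roughly 2 sigma (1.96, two-tailed).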
> One could write a book about the interpretive logic here (and folk have)
> -- this is by no means terminological quibbling -- but I think a key
> point is that the conclusions in statistical logic are not as obligatory
> as in the deductive logic earlier in the book.  The step from 'p=0.01'
> to 'that is not a Siamese' is an inductive leap that we decide to make,
> with a warrant based on the statistical analysis.  I think that Juola
> and Ramsay's account in their Sect. 4.3.1 makes this sound more
> obligatory than it should be, but in contrast their Sect 4.3.2 is really
> saying that the decision is part of a larger very contingent discussion.
> The above is a 'frequentist' account, based on probabilities.  The other
> doctrine is 'bayesian' (who are not to be left alone with frequentists
> in the presence of sharp objects).  In the bayesian interpretation, we
> start off with some numerical degree of 'a priori' belief that the cat
> is a Siamese cat, and the discovery that it weighs 40 lb, combined with
> our knowledge of the distribution of cats' weights, allows us (using
> Bayes' Theorem) to update our belief that this is a Siamese, specifically
> ending up with a rather _smaller_ 'a posteriori' belief that it is a
> Siamese.  The maths is much the same, but the rationale for our change
> of mind is substantially different.
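The bayesian update described above can be sketched as follows. All three input numbers are hypothetical: the prior belief that the cat is a Siamese, the probability of a 40 lb weight if it is one, and the probability of a 40 lb weight if it is something else. Only the rule itself (Bayes' Theorem) comes from the discussion.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Update a prior belief in a hypothesis after seeing one observation.

    prior               -- P(hypothesis) before the observation
    likelihood_if_true  -- P(observation | hypothesis)
    likelihood_if_false -- P(observation | not hypothesis)
    """
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical inputs: 90% prior belief in 'Siamese'; a 40 lb weight has
# probability 0.01 for a Siamese and 0.2 for whatever else it might be.
belief = posterior(prior=0.9, likelihood_if_true=0.01, likelihood_if_false=0.2)
print(f"a posteriori belief that it is a Siamese: {belief:.2f}")
```

Unlike the frequentist account, this needs a figure for how likely such a weight is under the alternative, and a prior; in exchange it returns a statement about the hypothesis itself.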
> > I have a personal interest
> > in this that explains why I turned straight to their account
> > of the Null Hypothesis, since such logic has recently been
> > used to much rhetorical effect in my own specialized area,
> > which is authorship attribution by internal evidence. It
> > matters to me whether I'm understanding this topic
> > properly or not, and I'm genuinely asking members of this
> > list to correct me if I'm mistaken.
> I suspect the underlying argument (and I'm recapitulating a logic I'm
> sure you already understand) would go something like this:
>    1. you calculate some statistic or other from a given text -- say,
> the average word length (though obviously much more sophisticated
> statistics would be more helpful);
>    2. by analysing texts known to be by a particular author, Fred, you
> can determine the properties (for example mean and variance) of that
> statistic for Fred's texts;
>    3. for a new text X, you calculate the value of the statistic for the
> text X, and then adopting the null hypothesis that 'X is by Fred', you
> ask how unlikely this value is -- how surprised you are that Fred should
> write such a text -- given the known mean and variance obtained in (2).
> Given that unlikelihood, you can then have a discussion about how
> defensible it is to ascribe the text X to Fred.  The statistics feed
> into the rhetoric of this discussion; they don't supplant it.
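Steps 1 to 3 above can be sketched with the deliberately crude statistic mentioned there, the average word length. Everything in this block is an assumption made for illustration: the function names, the toy texts, the whitespace-only tokenising, and the use of a normal approximation to turn the distance from Fred's mean into a two-tailed p-value. A real study would need far more text and a better statistic.

```python
import math
import statistics


def mean_word_length(text):
    """Step 1: the statistic, here the average length of whitespace-separated words."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)


def surprise(known_texts, new_text):
    """Steps 2 and 3: compare new_text against texts known to be by one author.

    Returns (z, p): the distance of the new text's statistic from the author's
    mean in standard deviations, and a two-tailed p-value under a normal model.
    """
    values = [mean_word_length(t) for t in known_texts]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # needs at least two known texts
    z = (mean_word_length(new_text) - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Toy stand-ins for Fred's known texts and a disputed text X.
fred_texts = [
    "the cat sat on the mat",
    "a dog ran to the big red barn",
    "we all like tea and jam",
]
text_x = "extraordinary circumstances necessitate elaborate vocabulary"

z, p = surprise(fred_texts, text_x)
print(f"z = {z:.1f}, p = {p:.3g}")
```

A small p here says only that X would be a surprising text for Fred to have written; what to conclude from that is the discussion the statistics feed into.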
> In the real case, I imagine one calculates multiple statistics for
> Fred's texts, calculates the same for broadly comparable texts by all
> authors, and then combines these various distributions together in a
> statistically sophisticated way.  The maths at this point becomes fairly
> hellish, but it remains a more sophisticated version of the basically
> straightforward argument above.  I see that Juola and Ramsay touch on
> this sort of argument in their Sect 4.4.2.
> I hope this shines a torch into the gloom.
> ----
> Just in passing: Juola and Ramsay have written an _ambitious_ book!
> They say near the beginning of Chap. 6 'this is a challenging chapter'.
> Well, it looks to me as if Chaps. 1--5 are pretty challenging, too.
> Enjoy,
> Norman
> --
> Norman Gray  :  https://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
