[Humanist] 29.204 parsing bibliographical reference lists

Humanist Discussion Group willard.mccarty at mccarty.org.uk
Mon Aug 10 08:57:34 CEST 2015


                 Humanist Discussion Group, Vol. 29, No. 204.
            Department of Digital Humanities, King's College London
                       www.digitalhumanities.org/humanist
                Submit to: humanist at lists.digitalhumanities.org



        Date: Mon, 10 Aug 2015 05:19:02 +1000
        From: Desmond Schmidt <desmond.allan.schmidt at gmail.com>
        Subject: Re:  29.198 end of digital humanities? parsing bibliographical reference lists?
        In-Reply-To: <20150809063134.E5CDC6921 at digitalhumanities.org>


Hi Amir,

Looking at this data, there is no structure beyond some rudimentary
formatting. No standard library routine is going to parse it correctly.
You'll have to write your own parser, and that will be hard because the
lists were written to be read by humans. For example, what will you do
with:

Sections translated in Pfad; Nyanaponika
Edited with Ajitamitra's commentary. Sarnath 1991

There seem to be references to other works embedded in these entries. Maybe
a lookup table would work for those, but it ain't going to be easy. I'd
start with something that would parse the more complete entries, like

Harunaga Isaacson, "Citations from the Ratnavali and Bodhicittavivarana in
the Abhayapaddhati", SII 21, 1997, 55-58; 22, 1999, 55-58

Write something to split the list into a hierarchy of sections and lines,
and then match each line against a particular pattern. If a line matches,
add it to a table of "finished" entries. Then gradually add more patterns
until you've got most of the list, and add the hardest entries by hand.
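
A minimal sketch of that loop in Python, to make the idea concrete. Everything
here is an assumption for illustration, not a description of the real data:
the layout (blank-line separated sections, first line a heading, one entry per
line), the placeholder heading "Section A", the function names, and the single
journal-article pattern, which is only fitted to the Isaacson entry above.

```python
import re

# Hypothetical pattern for one "complete" shape of entry:
#   Author, "Title", JOURNAL vol, year, pages; vol, year, pages
ARTICLE = re.compile(
    r'^(?P<author>[^,"]+),\s*"(?P<title>[^"]+)",\s*'
    r'(?P<journal>[A-Za-z]+)\s+(?P<issues>\d.*)$'
)
ISSUE = re.compile(r'^(?P<volume>\d+),\s*(?P<year>\d{4}),\s*(?P<pages>\d+-\d+)$')


def parse_article(line):
    """Return a dict for a journal-article line, or None if it does not fit."""
    m = ARTICLE.match(line)
    if m is None:
        return None
    issues = []
    for chunk in m.group("issues").split(";"):
        im = ISSUE.match(chunk.strip())
        if im is None:
            return None  # be strict: a partial match is left for later
        issues.append(im.groupdict())
    return {
        "type": "article",
        "author": m.group("author").strip(),
        "title": m.group("title"),
        "journal": m.group("journal"),
        "issues": issues,
    }


# Grow this list one pattern at a time as more entry shapes are understood.
PARSERS = [parse_article]


def split_sections(text):
    """Assumed layout: blank-line separated blocks, first line is the heading."""
    sections = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            sections.append((lines[0], lines[1:]))
    return sections


def parse_list(text):
    """Return (finished entries, leftover lines that matched no pattern)."""
    finished, leftover = [], []
    for heading, lines in split_sections(text):
        for line in lines:
            for parser in PARSERS:
                entry = parser(line)
                if entry is not None:
                    entry["section"] = heading
                    finished.append(entry)
                    break
            else:
                leftover.append((heading, line))
    return finished, leftover


SAMPLE = """Section A
Harunaga Isaacson, "Citations from the Ratnavali and Bodhicittavivarana in the Abhayapaddhati", SII 21, 1997, 55-58; 22, 1999, 55-58
Sections translated in Pfad; Nyanaponika
Edited with Ajitamitra's commentary. Sarnath 1991
"""

finished, leftover = parse_list(SAMPLE)
print(len(finished), "parsed;", len(leftover), "left for more patterns or hand editing")
```

The leftover list is the useful part: it shows which shapes of entry still
need a pattern, and whatever remains at the end is what gets keyed in by hand.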

Desmond Schmidt
University of Queensland

On Sun, Aug 9, 2015 at 4:31 PM, Humanist Discussion Group <
willard.mccarty at mccarty.org.uk> wrote:

>                  Humanist Discussion Group, Vol. 29, No. 198.
>             Department of Digital Humanities, King's College London
>                        www.digitalhumanities.org/humanist
>                 Submit to: humanist at lists.digitalhumanities.org
>
>   [1]   From:    Amir Simantov <wawina at gmail.com>   (45)
>         Subject: Parsing Bibliographic Reference Lists
...
>
> --[1]------------------------------------------------------------------------
>         Date: Tue, 28 Jul 2015 07:42:04 -0500
>         From: Amir Simantov <wawina at gmail.com>
>         Subject: Parsing Bibliographic Reference Lists
>
>
> Dear scholars and information technologists,
>
> I am a software developer, and I am currently looking for a tool or library
> to parse bibliographic reference lists for a client of mine.
>
> MY TASK
>
> I need to import data from a website with static HTML pages into Drupal,
> the content management system I most often use. Part of the data consists
> of reference lists. I need to parse each reference into its metadata parts,
> that is, author, book title, journal, pages, etc., according to its type
> (article, book, etc). An example of a page containing reference lists to be
> parsed can be found here
>
> 
>




