Individual linguists with computational interests typically conduct their analysis in a series of discrete steps, using disparate software tools and components, and aggregating the results of several phases of analysis. In this context, a linguist will typically use analytical components of differing provenance, both theoretical and technical, and adapt research methodologies to the idiosyncrasies of each particular tool. On a larger scale, research groups building integrated applications such as voice recognition systems, dialogue modelling systems, machine translation and information retrieval engines require the ability to conduct parameterised analyses on large volumes (often gigabytes) of data for the generation of language models. In this environment, the emphasis shifts to the integration and aggregation of these discrete components, and to the efficiencies which can be gained through coordinated (but automated) parallel analysis phases. In this talk we will consider the design of automated, multi-phase, large-scale linguistic analysis; the corresponding need for resource (data, software, service) discovery, integration and aggregation functions based on formal descriptions; and ultimately, the frontier of distributed computation for natural language processing research.
Baden Hughes is a Senior Research Associate in the Language Technology Research Group within the Department of Computer Science and Software Engineering at the University of Melbourne. He has published or presented papers in a variety of fields related to language technology, including distributed systems, markup standards, metadata, digital language archives, linguistic annotation, morphological parsing, query languages, text processing, and web-enablement of linguistic documentation resources. He has tertiary qualifications in linguistics, natural language processing, publishing and computer science, in addition to various industry certifications.