Electrical and Information Engineering
The University of Sydney

Text Mining System

This has been a core research area at WEG since its inception.

We have developed algorithms and software frameworks for different application domains. Our current interest is to develop text mining functionalities for e-learning applications.

Our first framework was developed by Ken Williams and is widely used in commercial and academic contexts. With Prof. Jae-Moon Lee at Hansung University in Korea, we used it to process large corpora of news stories, and with Xiaoblo Li, to classify web pages from the Open Directory Project. It is written in Perl and can be downloaded from CPAN. Ken completed his Masters and returned to the USA, but he still maintains the package.

We are developing a new framework that we call TMS.
The new tool is built in Java and uses a number of open source packages. Jorge Villalon is the PhD student currently developing the tool. The framework currently runs both as a standalone desktop application and as a component of the Sakai Learning Management System, where it uses the Search package and the Lucene indexer.

If you want to learn more about text mining frameworks:
Garcia Adeva, J. J. and Calvo, R. A. Mining Text with Pimiento. IEEE Internet Computing, Vol. 10, No. 4, pp. 27-35, July/August 2006.

Our text mining research has been funded by a number of organizations and grants:

  • Calvo, R.A. "Efficient Data Manipulation" (2003). Australian Research Council - Linkage International. $59,000.
  • Calvo, R.A. "Document Classification - Travel grant to Korea" (2004) Australian Academy of Science and Korea Science Foundation. $4,000
  • Calvo, R.A. "Clustering aggregates in text classification tasks" (2004). The University of Sydney. $10,000.
  • Calvo, R.A. "Document Classification for the financial markets seed grant" (2004). AusCapital Markets CRC. $20,000.
  • Calvo, R.A. "Gainspring: Document Classification for the financial markets" (2004). AusCapital Markets CRC. $19,000.
  • Calvo, R.A. "Incremental Learning in Text Mining Applications" (2006). The University of Sydney. $19,000.
  • Reimann, P.; Calvo, R.A. and Paltridge, B. "Using machine learning and automated document analysis methods to support English composition training" (2006-2008). Australian Research Council - Discovery Project. $200,000.

 
