Research at St Andrews

Modelling string structure in vector spaces

Research output: Chapter in Book/Report/Conference proceeding: Conference contribution

Author(s)

Richard Connor, Al Dearle, Lucia Vadicamo

Abstract

Searching for similar strings is an important and frequent database task both in terms of human interactions and in absolute world-wide CPU utilisation. A wealth of metric functions for string comparison exists. However, compared with the wide range of classification and other techniques known within vector spaces, such metrics allow only a very restricted range of techniques. To counter this restriction, various strategies have been used for mapping string spaces into vector spaces, approximating the string distances within the mapped space and therefore allowing vector space techniques to be used.

In previous work we have developed a novel technique for mapping metric spaces into vector spaces, which can therefore be applied for this purpose. In this paper we evaluate this technique in the context of string spaces, and compare it to other published techniques for mapping strings to vectors. We use a publicly available English lexicon as our experimental data set, and test two different string metrics over it for each vector mapping. We find that our novel technique considerably outperforms previously used techniques in preserving the actual distance.
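
As a rough illustration of what "mapping strings to vectors" can mean, the following is a minimal sketch of a generic pivot-based embedding under Levenshtein distance. It is not the novel technique evaluated in the paper; the function names and the pivot words are hypothetical and chosen only for this example. Each string is mapped to the vector of its distances to a fixed set of pivot strings, and the Chebyshev distance between two such vectors is, by the triangle inequality, a lower bound on the true string distance.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pivot_embed(word: str, pivots: Sequence[str]) -> List[int]:
    """Map a string to the vector of its distances to each pivot."""
    return [levenshtein(word, p) for p in pivots]


def chebyshev(u: Sequence[int], v: Sequence[int]) -> int:
    """Largest coordinate-wise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


# Hypothetical pivots; in practice they would be sampled from the lexicon.
PIVOTS = ["string", "vector", "metric"]

if __name__ == "__main__":
    x, y = "kitten", "sitting"
    true_distance = levenshtein(x, y)
    approx = chebyshev(pivot_embed(x, PIVOTS), pivot_embed(y, PIVOTS))
    print(f"true={true_distance} lower_bound={approx}")
```

How closely the vector-space distance tracks the true distance depends on the mapping and on the choice of pivots, which is the kind of quality the paper's comparison is concerned with.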
Details

Original language: English
Title of host publication: Proceedings of the 27th Italian Symposium on Advanced Database Systems
Subtitle of host publication: Castiglione della Pescaia (Grosseto), Italy, June 16th to 19th, 2019
Editors: Massimo Mecella, Giuseppe Amato, Claudio Gennaro
Publisher: Sun SITE Central Europe
Number of pages: 12
Publication status: Published - 9 Jul 2019
Event: SEBD 2019 27th Italian Symposium on Advanced Database Systems - Castiglione della Pescaia, Italy
Duration: 17 Jun 2019 to 19 Jun 2019
Conference number: 27
http://sebd2019.isti.cnr.it/

Publication series

Name: CEUR Workshop Proceedings
Publisher: Sun SITE Central Europe
Volume: 2400
ISSN (Print): 1613-0073

Workshop

Workshop: SEBD 2019 27th Italian Symposium on Advanced Database Systems
Abbreviated title: SEBD 2019
Country: Italy
City: Castiglione della Pescaia
Period: 17/06/19 to 19/06/19
    Research areas

  • Metric mapping, n-Simplex projection, Pivoted embedding, String, Jensen-Shannon distance, Levenshtein distance

Related by author

  1. Linking Scottish vital event records using family groups

    Akgün, Ö., Dearle, A., Kirby, G. N. C., Garrett, E., Dalton, T. S., Christen, P., Dibben, C. J. L. & Williamson, L. E. P., 25 Mar 2019, In: Historical Methods: a Journal of Quantitative and Interdisciplinary History. Latest articles, 17 p.

    Research output: Contribution to journal: Article

  2. Understanding the linking possibilities in Scottish Records and an algorithmic approach to full linkage

    Dearle, A., Kirby, G. N. C., Lee, W. & Dibben, C., 20 Jun 2018. 1 p.

    Research output: Contribution to conference: Paper

  3. Unikernel support for the deployment of light-weight, self-contained, and latency avoiding services

    Jaradat, W., Dearle, A. & Lewis, J., 21 Mar 2018. 1 p.

    Research output: Contribution to conference: Abstract

  4. Querying metric spaces with bit operations

    Connor, R. & Dearle, A., 2018, Similarity Search and Applications: 11th International Conference, SISAP 2018, Lima, Peru, October 7-9, 2018, Proceedings. Marchand-Maillet, S., Silva, Y. N. & Chávez, E. (eds.). Cham: Springer, p. 33-46, 14 p. (Lecture Notes in Computer Science; vol. 11223).

    Research output: Chapter in Book/Report/Conference proceeding: Conference contribution

ID: 259580206