You searched for subject:(Word Vector Models). One record found.


1. Lipecki, Johan. The Effect of Data Quantity on Dialog System Input Classification Models.

Degree: Health Informatics and Logistics, 2018, KTH

This paper investigates how the amount of training data affects different word vector models used to classify dialog system user input. It tests the hypothesis that there is a data threshold at which dense vector models reach the state-of-the-art performance reported in recent research, and that character-level n-gram word vector classifiers are especially suited to Swedish because of compounding and the character-level n-gram model's ability to vectorize out-of-vocabulary words. A second hypothesis is that models trained on single statements are more suitable for classifying chat user input than models trained on full conversations. The results support neither hypothesis, but show that sparse vector models perform very well on the binary classification tasks used. Further, the results show that 799,544 words of data are insufficient for training dense vector models, but that training on full conversations is sufficient for single-statement classification, as the models trained on single statements show no improvement in classifying single statements.

This work examines how different amounts of data affect different kinds of word vector models for classifying input to dialog systems. It seeks to prove the hypothesis that there is a threshold in the amount of training data at which dense word vector models reach the state of the art, and that character-level n-gram word vector classifiers are particularly well suited to Swedish, on the grounds that compounding is especially productive in Swedish and that character-level models allow previously unseen words to be classified. It also evaluates the hypothesis that classifiers trained on single statements are better suited to classifying input in chat conversations than classifiers trained on whole chat conversations. The results support neither hypothesis and instead show that sparse vector models perform very well in the classification tests carried out. In addition, the results show that 799,544 words of data are not enough to train dense word vector models well, but that whole conversations are more than sufficient for training models to classify questions and statements in chat conversations, since models trained on user input statement by statement, rather than on whole chat conversations, do not yield better classifiers for chat statements.
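To illustrate why character-level n-grams can help with compounds and out-of-vocabulary words, here is a minimal sketch. It is not taken from the thesis: the function names, the `<`/`>` boundary markers, the n-gram range of 3 to 5, and the example compound are all assumptions chosen for illustration, in the style of subword models such as fastText.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Return the set of character n-grams of `word`.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    get their own n-grams (an assumed convention, not the thesis's).
    """
    marked = f"<{word}>"
    return {
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap between two n-gram sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# A word assumed to be in the training vocabulary.
known = char_ngrams("dialogsystem")
# A made-up compound assumed to be unseen at training time.
unseen = char_ngrams("chattdialogsystem")

# The unseen compound still shares many n-grams with the known word,
# so a character-level model has features to build a vector from.
print(sorted(known & unseen)[:5])
print(round(jaccard(known, unseen), 2))
```

A word-level model would treat the unseen compound as a single unknown token, whereas the shared n-grams above give a character-level model something to work with.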

Subjects/Keywords: Chatbot; Chatterbot; Virtual Assistant; Dialog System; Natural Language Understanding; Word Embedding; Word Vector Models; Text Classification; Chattbot; Virtuell Assistent; Dialogsystem; Naturlig språkbehandling; Ordinbäddning; Ordvektormodeller; Textklassificering; Language Technology (Computational Linguistics); Språkteknologi (språkvetenskaplig databehandling)

…the Social Functions of a Dialog System 28 5.2 Word Vector Models and the Sparsity of… …data threshold above which dense word vector models are better than sparse word vector models… …gram vector models on word level and character level (Figure 4.3 and Figure 4.4)… …4.3: F0.5-scores for 10 dense vector models (word level n-gram models and character… …single statements Figure 4.4: F0.5-scores for 10 dense vector models (word level n-gram… 
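The excerpt above refers to F0.5-scores. For reference, the standard F-beta definition is given below, where P (precision) and R (recall) are symbols introduced here rather than taken from the record:

```latex
F_\beta = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R},
\qquad
F_{0.5} = \frac{1.25\, P\, R}{0.25\, P + R}
```

With beta = 0.5, precision is weighted more heavily than recall.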


APA (6th Edition):

Lipecki, J. (2018). The Effect of Data Quantity on Dialog System Input Classification Models. (Thesis). KTH. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-237282

Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Chicago Manual of Style (16th Edition):

Lipecki, Johan. “The Effect of Data Quantity on Dialog System Input Classification Models.” 2018. Thesis, KTH. Accessed August 04, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-237282.


MLA Handbook (7th Edition):

Lipecki, Johan. “The Effect of Data Quantity on Dialog System Input Classification Models.” 2018. Web. 04 Aug 2020.

Vancouver:

Lipecki J. The Effect of Data Quantity on Dialog System Input Classification Models. [Internet] [Thesis]. KTH; 2018. [cited 2020 Aug 04]. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-237282.


Council of Science Editors:

Lipecki J. The Effect of Data Quantity on Dialog System Input Classification Models. [Thesis]. KTH; 2018. Available from: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-237282
