StudyBuddyMatch

Contents

  1. Introduction and Background
  2. Specifics of StudyBuddyMatch
  3. GUI
  4. StudyBuddyMatch Database Structure
  5. The Matching Pipeline
  6. Improving SBM’s Models with Survey Data
  7. Natural Language Processing (NLP) in StudyBuddyMatch

References

Introduction and Background

Remote learning comes with its own challenges. Studying online during the Covid pandemic could be a frustrating and isolating experience, and this held particularly true for students who started their studies in 2020 and 2021. Collaborating with others who share the same interests and goals often creates a better learning experience. According to Chiriac (2014) [14], learning, study-social function, and organization are three important prerequisites for group work to serve as an effective pedagogy and an incentive for learning. The study-social function was clearly missing during the Covid period, especially during the lockdowns when universities and libraries were forced to close and all lectures were held online.

StudyBuddyMatch was developed as a part of the SmartUni app with the goal of helping students overcome key challenges of remote learning. The main objective of this application is to enable a search for suitable study buddies who can support each other to foster their academic and personal development.

What difficulties do students experience when studying remotely? - User Story

In the conceptual phase of the project, we had several brainstorming sessions about user stories. Here are a few examples:

  • As a generic student, I want to have a possibility to connect with other people (e.g. group members or tutors) so that help can be provided or group work organized.

  • As a generic student, I want to have an option where I can ask other people who work on the same task if they want to work in parallel, so that I don’t have to work alone and can be more motivated and avoid procrastination.

  • As a generic student, I want to have someone with whom I can discuss the concepts I learned during the lecture and work together on quizzes and assignments.

Based on this brainstorming, we identified the following pain points affecting remote students:

  • Lack of personal connection
  • Feeling of loneliness and isolation
  • Low motivation
  • Difficulties with overcoming distractions and time management
  • Lack of communication and interaction with classmates

The key problem we wanted to address in remote learning scenarios is that students have less opportunity to socialize, and that it is therefore hard to develop bonds and find people to study with.

What are the existing projects so far for study buddy support? - Market Research

Apart from university mentoring programs, considerable effort has also gone into study buddy services that connect students with a virtual or human study buddy to assist them in their learning.

The Study Buddy app was a mobile app developed by Nathaniel Blumer in 2016 with a focus on helping local students with their homework. One major functionality of the app was “Ask Tutor”, which, as the name suggests, offered users the possibility to ask questions and receive explanations and solutions provided by tutors, either by taking a picture of the homework problem or formulating a text (for more information see Study Buddy Mobile). In the 2.0 release, course recommendations were added based on the courses that people nearby were studying. However, the app apparently did not gain enough traction and was shut down a few years ago.

A similar but more general study buddy app idea was proposed by the UX/UI designer Jarin Tchicaya in 2021. It is a design prototype of a mobile app that facilitates communication between students attending the same university through community chat rooms, private messaging, and document sharing (see Study Buddy App). In their survey, they found that 43.75% of students find it very beneficial to study with others, and 56.25% are even willing to study with someone they don’t know. They designed wireframes and built a high-fidelity prototype; however, it is unclear whether the concept was ever implemented, i.e. whether there is or will be an end product.

MoocLab also offers a study buddy functionality. It is mainly designed for people who take Massive Open Online Courses (MOOCs). To be able to use it, one needs to have a MoocLab.club account. It has two main functional buttons under the Study Buddy tab: 1) “Find a Study Buddy”, which allows the user to submit their personal information such as location, time zone and gender, and to specify preferred language and the subject they wish to study with their study buddy; and 2) “Browse all Study Buddies”, which shows a list of all buddies available along with their respective profiles. In addition, users can join study groups and write posts, forming an interconnected MOOC study community.

The StuBu (StudyBuddy) project, a joint research project with the participation of different institutions including TU Braunschweig and Jacobs University Bremen, utilizes AI to create a virtual companion for digital autonomous learning. The main goals of the project are to increase learning success and support learners’ professional development. StuBu is designed to identify users’ needs and provide learning support that adapts to their habits and circumstances while interacting with them.

How can the online study experience be improved? - Our Solution Proposal

Developing a fully automated virtual AI buddy is not feasible within the scope of a one-year study project, given the limited time and resources involved. After identifying the pain points and evaluating the possibilities, we therefore decided instead to build an application tailored to individual needs. For us, a study buddy is someone you study alongside while offering mutual support. With that in mind, and by incorporating AI components, the app matches students who share similar interests and goals so that they can exchange experiences, communicate with each other, and help each other grow.

User-Flow

Based on specific individual needs and preferences, StudyBuddyMatch aims to find suitable study partners for our users. To get a better understanding of the users’ individual needs and preferences, we designed some questionnaires for users to fill out before they start the matching (for more about the overall design of the questionnaires, see the Questionnaires section). The questionnaire answers play an important role in the matching procedure and affect the recommendations one user can get and their respective matching score.

Below is a visual illustration of the user flow:

Study Buddy Match User Flow

Specific Design of StudyBuddyMatch

In the following, we will present the 7 different StudyBuddyMatch pages: the Welcome Page, the 5 Questionnaires and the Matching Page.

Welcome Page

The Welcome Page introduces the users to the aim and procedures of the StudyBuddyMatch application. Further, we included a link to the user’s profile settings in the Core framework that are needed for the final match recommendation to other users.

Welcome Page

Questionnaires

After thoroughly reviewing user stories and previous literature with similar ideas, we decided to include only five questionnaires in our application. On the one hand, this was for the sake of the user experience, since we didn’t want users to spend hours filling out questionnaires before getting study buddy matches. On the other hand, as developers in a year-long project, we wanted to be realistic about how many questionnaires we would be able to implement and then use in our matching algorithm.

To decide which five questionnaires should be included in the final version of the application, we made a poll in which each of the 8 team members voted for three questionnaires maximum. The table below summarizes the overall questionnaire options in the poll (divided by category), as well as the number of votes each one got.

| Category | Questionnaire | Number of Votes |
| --- | --- | --- |
| Professional Profile | Course History and Grades | 6 |
| Professional Profile | Previous Degrees (Education Level) | 2 |
| Professional Profile | Massive Open Online Courses (MOOCs) | 1 |
| Professional Profile | Skill Set Evaluation | 3 |
| Professional Profile | Academic Interests | 3 |
| Psychological Profile | Work Style | 0 |
| Psychological Profile | Learning Style | 4 |
| Psychological Profile | Personality Traits | 1 |
| Psychological Profile | Personal Interests and Hobbies | 2 |
| Psychological Profile | Willingness to Communicate | 4 |
| Data from SmartPlanner | Weekly Schedule | 0 |
| Behavioral Profile | Elo Score | 0 |
| Behavioral Profile | Commitment | 1 |
| Behavioral Profile | Usage Tracking | 0 |
| High-Level Profile | Demographic Data | 2 |
| High-Level Profile | Partner Preferences | 5 |

The following sections explain the idea behind the final five questionnaires implemented in our application, and the last section briefly explains the questionnaires that got at least one vote in the table above.

Personal Academic Interests (PAI)

On this first questionnaire page, the users have to enter their academic interests. They may enter a single interest or multiple ones, separated by commas. The users are required to input words in English so that the input is consistently formatted across users, which is important for the NLP matching algorithm.

Personal Academic Interests Questionnaire

From the user’s perspective, this is a simple questionnaire which should not take more than one or two minutes to submit. However, from our perspective as developers, a lot of effort had to be put into developing this questionnaire in terms of its backend. This is because the users’ answers are free text, which required us to dedicate time to develop a natural language processing (NLP) pipeline. To read more about how we developed this, please refer to the ‘Natural Language Processing (NLP) in StudyBuddyMatch’ section.

Willingness to Communicate (WTC)

The WTC questionnaire is designed to measure the speaker’s intention to initiate communication given free choice in certain situations [7][8]. The questionnaire comprises 20 questions, each describing a different situation. For each situation, the user chooses how likely they are to communicate on a scale from 0 (never) to 100 (always). The full questionnaire with the 20 situations can be found online here.

Upon finishing the questionnaire, the user receives four different scores (each one denoting how likely they are to speak up):

  1. Group Discussions
  2. Meetings
  3. Interpersonal
  4. Public Speaking

The questionnaire evaluates whether the user tends to initiate conversations or is rather shy and prefers others to start a conversation.

Taking into account this personal trait of the user could help in finding a suitable match for this person. The questions of the questionnaire are realized as sliders that have a range from 0 to 100. In this case 0 means this person would not initiate a conversation in the given situation and 100 means that the user would initiate a conversation for sure.

This rather large range and the situations were taken from the original WTC questionnaire from McCroskey et al. (1990). Once the user has answered all questions, the answers are saved in the database and four scores are calculated: the mean of all questions, the mean of all questions involving friends in the scenarios, the mean of all questions involving strangers, and the mean of all questions involving acquaintances.

These scores are saved as Questionnaire Result objects which are linked to the user that provided these answers. These scores are used later in the matching process when the similarity scores are computed between the scores of the user and their potential match.
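
As a rough illustration, the score calculation could be implemented along the following lines (a minimal sketch; the function name and the way the scenario groups are passed in are our assumptions here, not the actual SmartUni code):

```python
# Hypothetical sketch of the WTC evaluation. `answers` maps question number
# (1-20) to the slider value (0-100); which of the 20 situations involve
# friends, strangers or acquaintances is defined by the questionnaire and
# passed in as index sets.
def wtc_scores(answers: dict[int, float],
               friend_items: set[int],
               stranger_items: set[int],
               acquaintance_items: set[int]) -> dict[str, float]:
    def mean_over(items: set[int]) -> float:
        return sum(answers[i] for i in items) / len(items)

    return {
        "overall": sum(answers.values()) / len(answers),
        "friend": mean_over(friend_items),
        "stranger": mean_over(stranger_items),
        "acquaintance": mean_over(acquaintance_items),
    }
```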

GUI-wise, the questionnaire is built as follows: On the top of the questionnaire’s page, the title is displayed. Below the title is a short instruction which explains that the user should imagine being in the described situations. From there, the user should try to imagine how likely they would be to start a conversation in each situation.

Below the instructions, the questions are displayed as follows: On the left-hand side, the scenario itself is displayed. On the right-hand side, a slider is displayed with 0 at its left end and 100 at its right end. The values are displayed to clarify the possible value range of the sliders for the user. Next to the slider, its current value is displayed and updated every time the slider is moved. This was realized via a JavaScript event listener that triggers a function every time the slider is moved. Displaying the exact slider value gives the user feedback that the movement was registered and shows the value the slider is currently set to.

Willingness to Communicate Questionnaire

Learning Styles (LS)

In this questionnaire, the user has to choose at least one of the following styles that describes them best [4]:

  • Visual learner: learns by seeing and observing things, including pictures, diagrams, written directions and more.
  • Auditory learner: learns by reinforcing the subject matter by sound; would rather listen to a lecture than read written notes, and use their own voice to reinforce new concepts and ideas.
  • Reading/writing learner: learns through written words; while there is some overlap with visual learning, this type of learner is drawn to expression through writing, reading articles or books, writing in diaries, looking up words in the dictionary, and searching the internet for just about everything.
  • Kinesthetic learner: learns through experiencing or doing things, likes to get involved by acting out events or using hands to touch and handle things in order to understand concepts. This type of learner might struggle to sit still and often excels at sports or likes to dance. They may need to take more frequent breaks when studying.

The users have the option to get more information on the different styles via a collapsible section should they need further guidance.

Our intuitive idea was that study buddies should have at least one learning style in common for a study relationship to be successful. Since one can have more than one learning style, we opted for a multiple-choice questionnaire in this case, where each user can select one or more learning styles.

Learning Styles Questionnaire
Learning Styles Questionnaire with Collapsible Additional Information

Courses and Grades (CG)

On the Courses and Grades questionnaire page, the users enter their completed courses, and optionally the grades they earned in those courses, and save the displayed course history.

Courses and Grades Questionnaire

In order to make the input of long course titles into the text field easier, and to guarantee that the users enter the exact titles (important for the study buddy matching), we used an autocompletion field. For its database, the courses were extracted from the university’s course management platform StudIP and provided to us in a CSV table containing the titles of the courses and their related institutes. This table included all courses of the 72 institutes for the semesters WS21/22 and SS22 (during which SmartUni was developed) as well as the previous semester SS21.

The autocompletion works as follows: when the user starts typing a single letter into the text field, a JavaScript function will be triggered which searches the stored course data for entries matching that input. As soon as the user modifies the input by adding or deleting a letter, the search will be stopped and adapted to the new input string. The program’s predictions are constantly displayed in a scrollable list below the text field. Once the intended course entry is found, the user can click on it, leading to its automatic display in the text field. The grade can be entered into the numerical field below the title field. When the Add button is clicked, the selected course and its grade (if provided) are entered into a collapsible list that can be accessed with the My courses button. The list contains all added courses until they are separately deleted again.

With these entries, courses and users are stored and linked in our database’s CourseEnrollment instance and filtered in the matching phase, depending on each user’s Partner Preferences selections (favoring or not favoring similarity and specific course and institute match).

Course Title Autocompletion Field on the Courses and Grades Questionnaire

The CG history data source was the one that received the most votes in our poll. Most of us thought that it is essential for the user to submit courses they took part in during the last three semesters. However, providing a grade remains optional, as this can potentially invade the user’s privacy.

The main reason for including this among the data we need for each user was the following scenario: if a user needs help in a specific course (which can be specified in the partner preferences questionnaire), we wanted to look for a study buddy that has already taken this course.

Additionally, submitting courses that the user finds important in their academic career, or that they have especially liked, can give a lot of information about the user’s academic orientation and interests, which can in turn help us match them to a similar user in that regard. Thus, we took the user’s course history into account in the matching process.

Partner Preferences (PP)

This questionnaire was mainly added to apply some filtering when matching users together. The first question asks the user to specify their partner preferences and has the following options:

  • A buddy who has the same academic interests as me
  • A buddy who tends to initiate the communication
  • A buddy whose learning style is the same as mine
  • A buddy whose course history is similar to mine
  • I don’t have any preferences

The user’s answer to the first question gives us information about the weight of each questionnaire when matching.

The second question asks the user to type in the institute their study buddy should come from, and the third asks them to type in a course the study buddy should have already taken. Both are optional, but further help us filter out study buddy options when matching (if they are answered).

To facilitate the selection of the institute the study buddy should come from and the course they should have already taken, we implemented two autocompletion fields that work in the same way as the one used for the Courses and Grades questionnaire.

These answers weight the contributions of the data from the previous questionnaires in the final matching calculation.

Partner Preferences Questionnaire
Partner Preferences Questionnaire with Two Autocompletion Fields, here Displayed for the Institute Filter only

Other Questionnaires

The following list describes the questionnaires that received votes but were not chosen for implementation:

  • Previous degrees: the history of the user’s past degrees, in case they have any.
  • Massive open online courses (MOOCs): the history of MOOCs the user has taken. These are courses from massive online platforms that offer a variety of skills to learn (e.g., online courses from Coursera and Udemy).
  • Skill set evaluation: a test that evaluates the user’s skill and knowledge in a certain domain. Although this did receive three votes, we decided not to include it since it would be too specific, and because we would have had to choose one domain to focus on and develop a test for. In other words, if we develop a skill set evaluation for, e.g., Computer Science (CS), the only users who can use our application are CS students. We did not want to restrict our user base to one domain, nor did we have time to develop skill test evaluations for several domains.
  • Personality traits: a test that measures a user’s personality traits (e.g., the Big Five Personality Test).
  • Personal interests and hobbies: similar to PAI, though the idea here was for the user to enter their personal hobbies and interests, rather than their academic ones.
  • Commitment: a test to measure the commitment level of each user to their studies.
  • Demographic data: information such as age, gender, etc. We did not implement this since it could potentially lead to many different biases.

GUI

GUI Enhancements

Our questionnaires’ content takes up most of the screen width. Thus, as the screen shrinks (e.g. on tablets and phones), it becomes more challenging to display both their content and the StudyBuddyMatch right sidebar in a readable way. The usual way to tackle this problem is to collapse the navigation items on the sidebar behind a hamburger button. However, we already had a hamburger button for the core app’s left sidebar. Hence, we implemented a function such that the right sidebar automatically moves to the bottom of the page when the screen width is 768px or less. Users can either hide or show the navigation bar by clicking on the arrow pointing down or up (see images below).

Click to show bar Click to hide bar

The StudyBuddyMatch navigation bar is vital for users to browse the different questionnaires and pages (e.g., matching page); thus, we wanted them to be able to access it quickly and easily at any moment. That is why we introduced a swipe up and down feature in our application that allows users to use their smartphones’ touch screens to hide or show the StudyBuddyMatch navigation bar.

Swipe functionality

Matching Page

The matching page was designed to replace the start page after the user filled out all questionnaires and thus was eligible for matching. The matching page has two major design elements. The first is the header with the title, explanatory text, dropdown menu to select the matching status, and a button to start the asynchronous matching. The second element is the space to display the actual matches, which take the shape of cards. On the top of each card, the profile picture of the matched users is displayed. This allows the user to decide on first impression whether they want to keep this matched user or not. If a matched user did not provide a profile picture for their account, a placeholder image is displayed instead. The username, institute, and e-mail of the matched user are also provided on the card, as well as two buttons. One of the buttons contains a mailto link and the other one leads to the profile page of the matched user. Directly under the name of the matched user, the similarity score is displayed, giving the matching user some information about how well they might be able to work with the matched user. The matching user has the possibility to pin a match to save it when the matching process is started again and to rate the current matches to give feedback on the quality of the matching process.

No Matches Found Page

Sometimes the user’s criteria yield no match. The main reason for this is the two hard filters (cf. the Matching Pipeline section), institute and course, which exclude all candidates who do not fulfill both criteria: having already taken a specific course and coming from a certain institute. Even if a candidate does not fulfill both criteria, they could still be a good study buddy for the user. In this case, the user is redirected to the ‘no matches found’ page, which explains why no matches could be found, usually because the criteria were too specific. The page suggests removing one or both of the hard filters in order to get recommendations, and provides the option to do so. This functionality was realized with two simple checkboxes and a submit button beneath them. Clicking the submit button re-triggers the whole matching process, now disregarding one or both of the hard filters. This can, of course, decrease the quality of the new matches, as not all of the criteria will be fulfilled, but it increases the probability of finding a study buddy. If only one hard filter was ignored and the user still gets no matches, this page is displayed again. If both hard filters are disregarded, finding a match is guaranteed unless no other users exist in the database.

Feedback for Matching

To train an AI model, labeled data is needed, which we obtain by requesting feedback from the user for a given match. To collect this feedback, we added five radio buttons to each card on the matching page, labeled from 1 to 5, with 1 being a very bad match and 5 a very good one. The request “Please rate this match” is displayed above these buttons together with a clarification of what they mean. Only one button per matching card can be selected at a time, which prevents invalid input from a user clicking two different buttons at once. We also added a statement below the five buttons that the feedback is anonymous (see image below). As soon as a button is clicked, an Ajax call is sent to the corresponding Python function, which saves the feedback in the database. This makes it easier for the user to give feedback, because no submit button has to be clicked afterwards: the feedback is saved automatically and immediately. The labeled data is then fed into the AI algorithms, which update their learned weights for better recommendations in the future.

Click to show bar

Database Structure

The database structure of the StudyBuddyMatch module is mainly designed for storing the answers to the questionnaires and course enrollments, and for caching intermediate and final similarity calculations. The SBM part of the database can therefore be subdivided into three sections, which are discussed below. The models are defined using Django’s object-relational mapper (ORM). Django’s ORM creates database tables and corresponding entries based on model classes that are defined in a Django project using ordinary object-oriented programming. Simplified, creating a model class leads to the creation of a table in the database. Relations between models are also realized with database tables, but programmers do not need to manage these extra tables when developing the database schema with the ORM.

StudyBuddyMatch Database Structure (Output of "graph_models" Django extension)

Questionnaires

The user interacts with five questionnaire-like pages where they can fill in data that will be used for calculating their match recommendations. That data needs to be stored in the database so four of those five data sources were realized with the Questionnaire model class.

Questionnaires Database Structure

The Questionnaire model is connected to a set of Question objects that represent the single questions asked in the questionnaires. Question is abstract and has three subclasses that define the type of question. There is the TextQuestion subclass that represents a question with a single text input field for the answer. There are also two choice-based question types which are represented by the model classes SingleChoiceQuestion and MultipleChoiceQuestion. As the names indicate, SingleChoiceQuestion represents a question with a set of answer options from which the user has to choose one. A MultipleChoiceQuestion allows choosing more than one answer, but at least one is required. The answer options in both classes are stored in a JSONField as key-value-pairs. The keys are integers and the values are the textual answer options as strings.

Questions can be answered. Therefore, we created the QuestionAnswer abstract model class and corresponding subclasses analogous to Question and its subclasses. TextQuestionAnswer stores a textual answer in a CharField. SingleChoiceQuestionAnswer stores the integer key of the selected answer option. MultipleChoiceQuestionAnswer, however, works differently. For each selected answer option an instance of AnswerItem is created that holds the key of the selected answer option. If an AnswerItem instance exists for an answer option, the answer has been selected by the user; otherwise it has not been selected. QuestionAnswers have to be connected to a user. This is done by the model class UserQuestionnaire that represents the action of a user taking a questionnaire. It just connects a Questionnaire instance with an instance of SmartUser. Each QuestionAnswer instance holds a foreign key to a UserQuestionnaire instance. Additionally, we have the class QuestionnaireResult that saves a numerical calculation result based on questionnaire answers. This is used in the Willingness To Communicate questionnaire for storing the result of the questionnaire evaluation.
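
A simplified sketch of how these model classes might look in Django follows (illustrative only; field names and options are assumptions, and the actual SmartUni models differ in detail):

```python
# Simplified, hypothetical sketch of the questionnaire models described above.
from django.conf import settings
from django.db import models


class Questionnaire(models.Model):
    title = models.CharField(max_length=200)


class UserQuestionnaire(models.Model):
    """The action of a user taking a questionnaire."""
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    questionnaire = models.ForeignKey(Questionnaire, on_delete=models.CASCADE)


class Question(models.Model):
    """Abstract base class; concrete question types subclass it."""
    questionnaire = models.ForeignKey(Questionnaire, on_delete=models.CASCADE)
    text = models.CharField(max_length=500)

    class Meta:
        abstract = True


class TextQuestion(Question):
    pass  # answered with free text


class SingleChoiceQuestion(Question):
    options = models.JSONField(default=dict)  # {integer key: option text}


class MultipleChoiceQuestion(Question):
    options = models.JSONField(default=dict)


class TextQuestionAnswer(models.Model):
    user_questionnaire = models.ForeignKey(UserQuestionnaire, on_delete=models.CASCADE)
    question = models.ForeignKey(TextQuestion, on_delete=models.CASCADE)
    answer = models.CharField(max_length=1000)


class SingleChoiceQuestionAnswer(models.Model):
    user_questionnaire = models.ForeignKey(UserQuestionnaire, on_delete=models.CASCADE)
    question = models.ForeignKey(SingleChoiceQuestion, on_delete=models.CASCADE)
    selected_key = models.IntegerField()


class MultipleChoiceQuestionAnswer(models.Model):
    user_questionnaire = models.ForeignKey(UserQuestionnaire, on_delete=models.CASCADE)
    question = models.ForeignKey(MultipleChoiceQuestion, on_delete=models.CASCADE)


class AnswerItem(models.Model):
    """One selected option of a multiple-choice answer."""
    answer = models.ForeignKey(MultipleChoiceQuestionAnswer, on_delete=models.CASCADE)
    selected_key = models.IntegerField()


class QuestionnaireResult(models.Model):
    """A numerical evaluation result, e.g. a WTC sub-score."""
    user_questionnaire = models.ForeignKey(UserQuestionnaire, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    value = models.FloatField()
```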

The database representation of our questionnaires was initially designed to be as generic as possible. Nevertheless, it turned out that this generic implementation was a little over-ambitious and led to misunderstandings in the team. Each question type was ultimately only used for one single questionnaire.

Additionally, the partner preferences questionnaire got its own specific database representation for its answers. The PartnerPreferencesAnswers model class combines three different answers in one class. It stores the title of the course the user wants their buddy candidates to have already taken, the name of the institute the candidates should have taken courses from, and the boolean answers for the question about which data sources are especially important to the user when calculating the matching scores.
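
A correspondingly hedged sketch of this model (field names are assumptions, not the actual code):

```python
# Hypothetical sketch of the partner preferences answer model described above.
from django.conf import settings
from django.db import models


class PartnerPreferencesAnswers(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    # Optional hard filters (empty string = no filter).
    required_course_title = models.CharField(max_length=300, blank=True)
    required_institute_name = models.CharField(max_length=200, blank=True)
    # Which data sources are especially important to the user for matching.
    prefers_pai = models.BooleanField(default=False)
    prefers_wtc = models.BooleanField(default=False)
    prefers_ls = models.BooleanField(default=False)
    prefers_cg = models.BooleanField(default=False)
```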

Courses

The fifth data source we collect is the course history of the user. Users can provide information about what courses they have taken and what grade they have received.

Courses Database Structure

The Course model represents a course a user can be enrolled in.

Courses are provided by institutes of the university. An institute is represented by the Institute class. Using that information, courses can be grouped more or less by topic, e.g. the computer science institute offers mostly computer science courses.

Whenever a user enters a course on the Courses and Grades page, an instance of CourseEnrollment is created, which stores the information that the user has taken that course. This model also stores the grade the user received in the course, if they decide to provide that information.
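
Again as a hypothetical sketch (field names and options assumed), the course-related models could look roughly like this:

```python
# Hypothetical sketch of the course-related models described above.
from django.conf import settings
from django.db import models


class Institute(models.Model):
    name = models.CharField(max_length=200)


class Course(models.Model):
    title = models.CharField(max_length=300)
    institute = models.ForeignKey(Institute, on_delete=models.CASCADE)


class CourseEnrollment(models.Model):
    """A user has taken a course; the grade is optional."""
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    course = models.ForeignKey(Course, on_delete=models.CASCADE)
    grade = models.DecimalField(max_digits=2, decimal_places=1, null=True, blank=True)
```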

Matching

Our matching procedure includes multiple, sometimes quite large calculations. For the whole procedure, we need to store various calculated pieces of information for later use. Therefore, we created additional model classes.

Matching Helper Models

StudyBuddy represents a final match recommendation. It holds a foreign key of the recommended user, the key of the user getting the recommendation, and the unidirectional final matching score associated with that recommendation.

FeedbackSBM represents the numerical feedback a user gives for their match recommendation. This information will later be used for optimizing our machine learning models that are used for the similarity calculations.

The NLPObject model stores the up-to-date tokens of our language-model including the latest improvements based on new data. These tokens are used for calculating text similarities between the individual answers to the Personal Academic Interests questionnaire.

The UserMatchingStatus is the status a user currently has in the matching procedure. It is mostly used for determining if a user is ready to be matched.

MatchingSample represents an anonymous sample for our machine learning models. It holds the similarity scores of the four characteristic dimensions: “Willingness To Communicate,” “Learning Styles,” “Institutes” (calculated based on course history), and “Personal Academic Interests.” Each of these similarity scores is stored non-anonymously with a subclass of BidirectionalSimilarity. This class also includes for which two users the score was calculated, so that these scores are only re-calculated when the corresponding data has changed. MatchingSample stores the answers to the Partner Preferences question, which weights certain similarity dimensions more. Based on that, a buddy recommendation can be calculated that then can be graded by the user. This grade is the final field of MatchingSample. So in the end, the database model holds an eight-dimensional sample vector and a user-assigned label. Both together are used for improving our machine learning models.

The Matching Pipeline

To recommend study buddies to a user, we calculate unidirectional similarity scores between the user requesting the recommendations and all the other users in the matching pool not excluded by the hard filters of the requesting user. We then show the requesting user the top 5 most similar other users. To do so, we choose amongst a set of matching models that take as input similarity values between the requesting user and the other users as well as the partner preferences of the requesting user.

User Matching Status

A very important model class for the matching process is the User Matching Status. It toggles the visibility of the user in the study buddy recommendation pool and determines whether the user may enter the matching page, as it does not make sense to match a user who has not yet finished all questionnaires and has therefore not provided all the information needed to find them a suitable study buddy. There are four different User Matching Status options: ‘active,’ ‘passive,’ ‘inactive,’ and ‘unready.’ The status determines whether the user can enter the matching process, whether they receive recommendations for potential study buddies, and whether they are recommended as a study buddy themselves.

The statuses fall into two categories: ‘active,’ ‘passive,’ and ‘inactive’ on the one hand, and ‘unready’ on the other. In the first category, the user can enter the matching phase because all questionnaires have been filled out and the user is ready for matching. The user sees the matching page icon in the sidebar, and the next button on the last questionnaire leads directly to the matching page. As soon as the user deletes one of their questionnaire answers, the status is set back to ‘unready,’ because the user should then no longer be able to get matches. A user with the status ‘unready’ does not see the matching page icon in the sidebar, and clicking the next button shows a reminder that not all questionnaires have been filled out yet. Once the user has submitted answers for all questionnaires, a congratulations message tells them that they can now get matches, and they are directed to the matching page immediately.

The difference between ‘active,’ ‘passive,’ and ‘inactive’ lies in the user’s visibility in the pool of potential matches and in whether they receive matches. With the status ‘inactive,’ the user is neither recommended to others nor receives recommendations. With the status ‘passive,’ the user can be recommended to others but does not receive recommendations for a potential study buddy. With the status ‘active,’ the user both receives recommendations and is recommended to others. As soon as the user clicks the matching button on the matching page to start the matching, their status is set to ‘active’ automatically. The user can also change the status manually by choosing ‘inactive’ or ‘passive’ in the dropdown menu on the matching page. In this dropdown menu, the different status options are explained again, and a success message is displayed after the status has been changed, so the user can see that the change was successful.

Hard Filters

Calculating similarity measures between two users is computationally costly. Therefore, it saves a lot of runtime not to do the calculations for every possible study buddy in the database during each evaluation round, which would mean every user whose UserMatchingStatus is set to passive or active at that time.

In order to speed up the calculations, we introduced two hard filters. The first hard filter deals with the course the study buddy should have already taken and the second one deals with the institute the study buddy should come from.

Which course and institute the user searching for a study buddy requests is stored in the user’s Partner Preference Answer.

Whether these two criteria are fulfilled can be seen in the CourseEnrollments of the study buddy. In order to check if the course requirement is met, one can filter for the course in the course history of the study buddy. If the course history list contains the required course, the study buddy remains in the candidate pool. As a consequence, the potential study buddy will be considered further in the matching process.

If the course history does not contain the specific course, the study buddy will be excluded from the candidate pool and will not be recommended to the user. If the user removes the hard filter later in the matching process, the study buddy can be re-added to the pool.

Coming to the institute filter, it is first important to explain when a user is considered part of an institute. A user, regardless of their User Matching Status, is regarded as coming from an institute when they have completed more than 10% of all available courses at that specific institute. The number of courses the user has already completed at that institute is thus divided by the number of all courses from that institute. If the percentage of courses taken by the user in the requested institute is higher than 10%, the user fulfills the requirement of this hard filter and stays in the pool of potential study buddies. If the percentage is smaller, the candidate will no longer be considered as a potential match for the user.
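
Using the hypothetical course models sketched in the Database Structure section, the institute filter logic could look roughly as follows (a sketch under those assumptions, not the actual implementation):

```python
# Hypothetical sketch of the institute hard filter described above.
from myapp.models import Course, CourseEnrollment  # hypothetical app path


def comes_from_institute(user, institute, threshold: float = 0.10) -> bool:
    """A candidate 'comes from' an institute if they have completed more than
    ~10% of all courses offered by that institute."""
    total = Course.objects.filter(institute=institute).count()
    if total == 0:
        return False
    taken = CourseEnrollment.objects.filter(
        user=user, course__institute=institute
    ).count()
    return taken / total > threshold
```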

We did not ask the user directly which institute they come from because a user might be enrolled in institute A but have already completed many courses in institute B. Such a user can still be a good study buddy for someone looking for a partner from institute B, because they have good knowledge of that institute’s courses.

Input to the Matching Model

We use the three institutes a user has taken the most courses from and the data from the questionnaire answers (indirectly) as input to our matching model. In the following, these inputs are described for an example user a:

  • The “institute” vector that lists the top three institutes the user has taken courses from \(inst_a = ()\) or \(inst_a = \begin{pmatrix} {inst_1}_a \end{pmatrix}\) or \(inst_a = \begin{pmatrix} {inst_1}_a \\ {inst_2}_a \end{pmatrix}\) or \(inst_a = \begin{pmatrix} {inst_1}_a \\ {inst_2}_a \\ {inst_3}_a \end{pmatrix}\) (the length of the institute vector ranges from 0 to 3, depending on how many institutes the user has taken courses from).

  • The “willingness to communicate” vector \(wtc_a = \begin{pmatrix} {wtc_{group\_discussion}}_a \\ {wtc_{meetings}}_a \\ {wtc_{interpersonal}}_a \\ {wtc_{public\_speaking}}_a \\ {wtc_{stranger}}_a \\ {wtc_{acquaintance}}_a \\ {wtc_{friend}}_a \\ {wtc_{avg}}_a \end{pmatrix}\), where \({wtc_{avg}}_a\) is the average of all WTC questionnaire answers divided by 100 and the other vector entries consist of the average of all relevant WTC questionnaire answers divided by 100 (e.g. \({wtc_{group\_discussion}}_a\) consists of the mean of a’s answers to all the questions containing “Talk in a small group of (…)” divided by 100).
  • The “learning styles” vector \(ls_a = \begin{pmatrix} ls_{v_a} \\ ls_{aud_a} \\ ls_{k_a} \\ ls_{rw_a} \end{pmatrix}\), where each entry takes on either 0 or 1 as its value.
  • The “personal academic interest” response \(pai_a\), which is just a text string that user a provided for the PAI item.
  • The “partner preferences” values \({wtc_{pp}}_a, {pai_{pp}}_a, {inst_{pp}}_a, {ls_{pp}}_a\), each equal to 0 or 1 and summarized with \(pp_a = \begin{pmatrix} {wtc_{pp}}_a \\ {ls_{pp}}_a \\ {inst_{pp}}_a \\ {pai_{pp}}_a \end{pmatrix}\)

While we use the partner preferences data directly as input to the matching model, we do not use data from the other items as direct input but rather the bidirectional similarities that we get from them.

Bidirectional Similarities

To calculate a unidirectional similarity score between a user a, for whom we want to find recommendations, and a candidate b, we first need to calculate bidirectional similarity scores between a and b for their wtc, inst, ls and pai values. Each similarity score is a real number between 0 and 1, where 0 means no similarity at all and 1 means that the two could not be more similar. In the following sections, we describe how each score is calculated.

WTC Similarity

In plain words, the WTC values of a and b are most similar when their average Manhattan distance is 0, and they become more dissimilar the further apart they are. (The Manhattan distance is the sum of the absolute differences of the vector entries, a generalization of the number of steps needed to go from one point to another in a two-dimensional grid where one can only move left, right, up or down.) Mathematically, the wtc similarity gets calculated as follows: \(bi\_sim_{wtc_{ab}} = \frac{1}{1 + \frac{\sum_{i=1}^{7} |{wtc_a}_i - {wtc_b}_i|}{7}}\).

LS Similarity

The LS values of a and b are more similar the more their learning styles overlap. The similarity measure employed here is called the simple matching coefficient. Mathematically, the ls similarity gets calculated as follows: \(bi\_sim_{ls_{ab}} = \frac{\sum_{i=1}^4 f({ls_a}_i, {ls_b}_i)}{4}\) where \(f(x, y) = 1\) if x = y and \(f(x, y) = 0\) otherwise.
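
Both formulas translate directly into code; a minimal sketch (not the actual SmartUni implementation):

```python
# Sketch of the WTC and LS bidirectional similarities defined above.
def wtc_similarity(wtc_a: list[float], wtc_b: list[float]) -> float:
    """1 / (1 + average Manhattan distance) over the first 7 WTC entries."""
    avg_distance = sum(abs(x - y) for x, y in zip(wtc_a[:7], wtc_b[:7])) / 7
    return 1 / (1 + avg_distance)


def ls_similarity(ls_a: list[int], ls_b: list[int]) -> float:
    """Simple matching coefficient over the four learning-style indicators."""
    return sum(1 for x, y in zip(ls_a, ls_b) if x == y) / 4
```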

PAI Similarity

See section Implementation for Natural Language Processing to see how the PAI similarity gets calculated.

Inst Similarity

While the institute values may seem similar to the LS values at first glance and one could think of calculating the inst similarity the same way as the LS similarity, a closer look reveals a key difference in the character of the values. While the values in different rows (e.g. visual learner and auditory learner) in the ls vector are of no relevance to each other, this is not the case for the inst vector. Consider the following example: Imagine that a has “Anglistik,” “Sozialwissenschaften,” and “Biophysik” as their top three institutes in descending order. b has “Anglistik,” “Biophysik,” and “Sozialwissenschaften,” and a third user c has “Anglistik,” “Germanistik,” and “Romanistik.” If we were to ignore the relationship across different rows like we do for calculating the LS similarity, the institute similarity between a and b would be the same as the similarity between a and c. However, it is obvious that a’s institute values are more similar to b’s than to c’s as the first pair shares all the institutes, albeit across different rows. As we could not find an existing similarity measure that covers our case, we instead defined our own similarity mapping that accounts for such inter-row overlaps. In the following image (with \(inst_1, inst_2, inst_3, inst_4, inst_5, inst_6\) representing unique values), we list the mapping from each possible inst vector pair (except for symmetric cases, as we assign them the same similarity, i.e. we assign the vector pair (x,y) the same score as (y,x)) to its similarity score \(bi\_sim_{inst_{ab}}\) (given below the pair):

Note that as mentioned earlier, the inst vector does not have to be 3-dimensional but can also be 0- to 2-dimensional. In those cases, we treat the vector as if it was a 3-dimensional one where we fill up the rest of the vector with entries that do not match with the entries of the other vector it gets compared to, e.g. if we have \(inst_a = \begin{pmatrix} inst_1 \end{pmatrix}\) and \(inst_b = \begin{pmatrix} inst_2 \\ inst_1 \end{pmatrix},\) we would assign it similarity score \(\frac{8}{20}\).

Matching Models

We settled on 4 different matching models: a default model that requires no training, and three models that learn from data to output more accurate similarity scores \({uni\_sim_b}_a\), namely an artificial neural network, a genetic algorithm, and a multinomial logistic regression model. The input to a model for requesting user a and a candidate b is: \({inp_b}_a = \begin{pmatrix} bi\_sim_{ab} \\ pp_a \end{pmatrix}\) where \(bi\_sim_{ab} = \begin{pmatrix} bi\_sim_{wtc_{ab}} \\ bi\_sim_{ls_{ab}} \\ bi\_sim_{inst_{ab}} \\ bi\_sim_{pai_{ab}} \end{pmatrix}\) and \(pp_a = \begin{pmatrix} {wtc_{pp}}_a \\ {ls_{pp}}_a \\ {inst_{pp}}_a \\ {pai_{pp}}_a \end{pmatrix}\). For training, we use anonymized pairs of input data of the same form as \({inp_b}_a\) and the corresponding normalized feedback (see Feedback for Matching), which we will now refer to simply as the “label.” The label can take on values of \(\{0, 0.25, 0.5, 0.75, 1\}\), where 0 corresponds to the worst feedback and 1 to the best. We will denote such a pair as \({inp}_{train}\) and \(y_{train}\). We will refer to the predicted unidirectional similarity score, based on the corresponding model, with input \({inp}_{train}\) as \({uni\_sim}_{train}\). For each trainable model, we also implement the functionality to fine-tune with respect to new data after the model has already been fit. How and when fitting and fine-tuning are applied is discussed in the section Updating the Matching Model. Model-specific details are given in dedicated subsections of each model section.

Default Model

The default model, as the name implies, is the matching model used by default. Unlike the other three models, the default model calculates the unidirectional similarity score for a given user pair based on a pre-determined, fixed formula, namely \({uni\_sim_b}_a = \frac{bi\_sim_{wtc_{ab}} + bi\_sim_{ls_{ab}} + bi\_sim_{inst_{ab}} + bi\_sim_{pai_{ab}} + pref\_mean({inp_b}_a)}{5}\) where \(pref\_mean({inp_b}_a) = \frac{\sum_{i=1}^4 {bi\_sim_{ab}}_i \cdot {pp_a}_i}{\sum_{i=1}^4 {pp_a}_i}\). In plain words, \(pref\_mean\) calculates the mean of the similarity values indicated by a’s partner preferences, e.g. if a only indicated having preferences for a buddy with the same personal academic interests and whose learning style is similar, the calculation would result in \(pref\_mean({inp_b}_a) = \frac{bi\_sim_{pai_{ab}} + bi\_sim_{ls_{ab}}}{2}\). In the formula for \({uni\_sim_b}_a\), we calculate the average of all the individual bidirectional similarity values and \(pref\_mean({inp_b}_a)\), the last of which we include to adjust the similarity score according to a’s partner preferences.
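
A small sketch of this formula in code (illustrative; the fallback for a user without any indicated preferences is our assumption, following the convention mentioned in the Multinomial Logistic Regression section):

```python
# Sketch of the default matching model. bi_sim and pp are 4-element lists
# ordered as (wtc, ls, inst, pai); pp entries are 0 or 1.
def pref_mean(bi_sim: list[float], pp: list[int]) -> float:
    """Mean of the similarity values indicated by the partner preferences."""
    if sum(pp) == 0:
        # Assumption: treat "no preferences" like all preferences.
        pp = [1, 1, 1, 1]
    return sum(s * p for s, p in zip(bi_sim, pp)) / sum(pp)


def default_uni_sim(bi_sim: list[float], pp: list[int]) -> float:
    """Average of the four bidirectional similarities and pref_mean."""
    return (sum(bi_sim) + pref_mean(bi_sim, pp)) / 5
```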

Artificial Neural Network

While there are different kinds of artificial neural networks, the one we use is called a fully connected feedforward neural network, also known as a multilayer perceptron. These networks consist of layers. The input to the network is represented by the first layer’s output. After the first layer, subsequent layers involve the network multiplying a matrix with the output of the previous layer to get a vector. It then adds another summand vector to that vector and applies a non-linear function (i.e. a function that cannot be described via matrix multiplication) to the summed vector, resulting in the output of the next layer. The network can have arbitrarily many layers, with the output of the last layer being regarded as the final output of the comprehensive model. In our case, concretely, our network looks as in the following image:

The output of the first layer (corresponding to the 8 circles on the left) denotes the input \({inp_b}_a = \begin{pmatrix} bi\_sim_{wtc_{ab}} \\ bi\_sim_{ls_{ab}} \\ bi\_sim_{inst_{ab}} \\ bi\_sim_{pai_{ab}} \\ {wtc_{pp}}_a \\ {ls_{pp}}_a \\ {inst_{pp}}_a \\ {pai_{pp}}_a \end{pmatrix}\). We then multiply it with a \(8 \times 4\) matrix (corresponding to the lines originating from the 8 circles on the left), add a summand vector of dimension 4 and apply the ReLU function to obtain the output vector of dimension 4 (corresponding to the 4 circles in the middle). We then repeat the same process with a \(4 \times 1\) matrix, a summand scalar and the sigmoid function to get our final scalar output, which is the predicted similarity score \({uni\_sim_b}_a\) of the neural network. We used TensorFlow for implementation.

Fitting

As mentioned, our artificial neural network uses matrices and vectors for its calculations. The network learns by adjusting those matrices and vectors in a way that the error between its predictions and the correct labels gets minimized. In our case, we calculate the mean squared error between \({uni\_sim}_{train}\) and \(y_{train}\) for a batch of data at once and apply the Adam [11] optimizer to adjust the matrices and vectors. We perform this procedure across the entire training data set 10 times.
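
A minimal TensorFlow/Keras sketch of this architecture and training setup could look as follows (illustrative, not the actual SmartUni code; the learning rate value is an assumption, Adam's default):

```python
# Hypothetical sketch of the 8-4-1 network and its fitting procedure.
import tensorflow as tf

LEARNING_RATE = 0.001  # assumed value


def build_model() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(8,)),                      # (bi_sim_ab, pp_a)
        tf.keras.layers.Dense(4, activation="relu"),     # hidden layer of size 4
        tf.keras.layers.Dense(1, activation="sigmoid"),  # uni_sim in [0, 1]
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE), loss="mse")
    return model


# Fitting: 10 passes over the whole training set.
#   model.fit(inp_train, y_train, epochs=10)
# Fine-tuning on new data at a tenth of the original learning rate:
#   model.optimizer.learning_rate.assign(LEARNING_RATE / 10)
#   model.fit(inp_new, y_new, epochs=10)
```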

Fine-Tuning

Our network fine-tunes on unseen data in the same way it performs the initial fit, except that it does so at a tenth of the original learning rate. The learning rate controls how much the network adjusts its parameters during training steps.

Genetic Algorithm

Following the typical genetic algorithm structure, the algorithm used in StudyBuddyMatch consists of seven major parts:

  • Gene: A float between 0 and 1. In the beginning, it is randomly initialized. There is a gene for each input entry and the value of the gene determines how much the input entry should influence the overall score: 1 = it should be the only factor influencing the overall score, 0 = it should have no influence at all. Genes are the constituents of the individuals (see picture below).
  • Individual: A vector of five genes, which we will call I: one gene for each entry of \(bi\_sim_{ab}\) and one for \(pref\_mean({inp_b}_a)\) (see the Default Model section for the definition). We constrain I such that its genes sum up to 1. An individual can be used to calculate \({uni\_sim_b}_a\) for \({inp_b}_a\) as follows: \({uni\_sim_b}_a = bi\_sim_{wtc_{ab}} \cdot I_1 + bi\_sim_{ls_{ab}} \cdot I_2 + bi\_sim_{inst_{ab}} \cdot I_3 + bi\_sim_{pai_{ab}} \cdot I_4 + pref\_mean({inp_b}_a) \cdot I_5\). Individuals of the population compete against each other for reproduction.
  • Population: An adjustable number of generated individuals (we start with 100 individuals, but this can be increased for more variety and a lower risk of ending up in a local optimum; decreasing the population size saves runtime).
  • Fitness Function: A quality measure for the individual: An individual is considered fit if the predicted output \({uni\_sim}_{train}\) is close to the label \(y_{train}\).
  • Tournament Selection: In each improvement step, the fitness is calculated for each individual in the population. This is done by applying the fitness function. The best two individuals are used for reproduction, and the worst two are deleted from the population.
  • Recombination: The children of the best two individuals are computed as follows: Take about half of the genes from the first parent, and take the rest from the second one.
  • Mutation: With a probability of around 5%, one randomly selected gene is replaced by a random value.

In the end, to produce a prediction using the genetic algorithm, we use the best individual \(I_{best}\) found during the fitting process (below) and predict \({uni\_sim_b}_a\) for input \({inp_b}_a\) as follows: \({uni\_sim_b}_a = bi\_sim_{wtc_{ab}} \cdot {I_{best}}_1 + bi\_sim_{ls_{ab}} \cdot {I_{best}}_2 + bi\_sim_{inst_{ab}} \cdot {I_{best}}_3 + bi\_sim_{pai_{ab}} \cdot {I_{best}}_4 + pref\_mean({inp_b}_a) \cdot {I_{best}}_5\)

Fitting

In the beginning, a population of size n (here 100, see Population above) is initialised by creating n random individuals; the only constraint is that each individual’s genes have to sum up to 1. Then tournament selection is conducted: replacing the two individuals whose outputs are furthest from the labels with the children of the best two individuals leads to a general improvement of the population. After that, with a small probability, a mutation may occur. The whole process is repeated k times, after which the best individual is stored. Its genes are then used as weights for calculating the similarity between the user and a potential match.
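
A hedged sketch of this fitting loop (population size, mutation probability and the tournament scheme follow the description above; the number of generations, the error measure and all names are illustrative; pref_mean is the helper sketched in the Default Model section):

```python
# Hypothetical sketch of the genetic algorithm fitting loop.
# `samples` is a list of (bi_sim, pp, label) tuples.
import random


def normalize(ind):
    total = sum(ind)
    return [g / total for g in ind]


def predict(ind, bi_sim, pp):
    """Weighted sum of the four similarities and pref_mean (see Default Model)."""
    inputs = list(bi_sim) + [pref_mean(bi_sim, pp)]
    return sum(w * x for w, x in zip(ind, inputs))


def fitness(ind, samples):
    """Negative mean absolute error; higher is better."""
    return -sum(abs(predict(ind, b, p) - y) for b, p, y in samples) / len(samples)


def fit(samples, pop_size=100, generations=1000, mutation_rate=0.05):
    population = [normalize([random.random() for _ in range(5)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, samples), reverse=True)
        best, second = population[0], population[1]
        cut = len(best) // 2                       # roughly half the genes
        children = [best[:cut] + second[cut:],
                    second[:cut] + best[cut:]]
        for child in children:
            if random.random() < mutation_rate:    # mutate one random gene
                child[random.randrange(len(child))] = random.random()
        population[-2:] = [normalize(c) for c in children]  # replace the worst two
    return max(population, key=lambda ind: fitness(ind, samples))
```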

Fine-Tuning

The model can be improved further by providing more training data. Labeled matches are those for which feedback from the user is available. The weights learned from the new data should of course not fully replace those learned from previous data. Therefore, a parameter m (which has the same function as the learning rate mentioned in the Artificial Neural Network section) can be introduced. When m is high, the newly learned matches have a high impact (more exploration); when m is low, they have a low impact (more exploitation). We used m = 0.1, meaning new matches do not have a big impact, but this value can be freely chosen and tested later on.

Multinomial Logistic Regression

There are 2 key differences in how our multinomial logistic regression models work compared to our genetic algorithm and neural network:

  1. We employ 15 different regression models, one for each possible realization of \(pp_a\) (note that we treat \(pp_a = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}\) as \(pp_a = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}\)). Therefore, depending on the value of \(pp_a\) of \({inp_b}_a\), we select the corresponding model from the 15. Also, we only use \(bi\_sim_{ab}\) instead of \({inp_b}_a\) as the actual model input. The motivation for having a different model for each \(pp_a\) value is that if we only had one multinomial logistic regression model for all the data, it would be weighting all entries of \(bi\_sim_{ab}\) the same way, irrespective of the partner preferences. By generating a model for each \(pp_a\) value, we can learn different weightings for \(bi\_sim_{ab}\) depending on \(pp_a\).
  2. Our multinomial logistic regression models do not compute \({uni\_sim_b}_a\) directly but instead predict the probability of a giving each of the possible feedback scores to a match with b if b was recommended to a as a study buddy (see Feedback for Matching). We sum the normalized feedback scores weighted by their predicted probabilities to attain \({uni\_sim_b}_a\). In detail, the mentioned steps are achieved as follows:

    The architecture of our multinomial logistic regression models can be viewed as that of a 2-layer fully connected feedforward network with \(bi\_sim_{ab}\) as the input, softmax as the activation function and a 5-dimensional vector of the probabilities for each of the feedback scores as the final output, which we call \({feedback\_probs_b}_a\):

    Then, we use \({feedback\_probs_b}_a\) to calculate \({uni\_sim_b}_a\) by summing the normalized feedback scores weighted by their predicted probabilities: \({uni\_sim_b}_a = \sum_{i=1}^{5} \frac{i - 1}{5} \cdot {{feedback\_probs_b}_a}_i\).

We used the Python package scikit-learn [12] to implement the multinomial logistic regression models.
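
A minimal scikit-learn sketch of one such model and the weighted-sum step (illustrative; feedback classes are assumed to be encoded as 0-4, and only the solver and the softmax behaviour follow the text, everything else uses library defaults):

```python
# Hypothetical sketch of one of the 15 multinomial logistic regression models.
import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver="lbfgs")  # multinomial (softmax) for >2 classes

# Fitting on the 4-dimensional bi_sim vectors and the feedback classes 0..4:
#   model.fit(bi_sim_train, feedback_class_train)


def uni_sim(bi_sim: np.ndarray) -> float:
    """Sum of normalized feedback scores weighted by predicted probabilities
    (assumes the model has already been fitted)."""
    probs = model.predict_proba(bi_sim.reshape(1, -1))[0]  # 5 class probabilities
    weights = model.classes_ / 5.0                         # (i - 1) / 5 for i = 1..5
    return float(np.dot(weights, probs))
```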

Fitting

For each of our regression models, we aim to minimize the sum over the whole training dataset of the following error: \(-\ln(p_{train}) + \frac{1}{2} \| W \|^2_F\), with \(p_{train}\) being the predicted probability that the normalized feedback score of \({inp}_{train}\) equals \(y_{train}\), and \(W\) being a concatenation of the learnable matrix and summand vector. For adjusting \(W\), we use the Limited-Memory BFGS algorithm [13].

Fine-Tuning

For fine-tuning, we fit the regression models to the whole data (i.e. including the unseen data) from scratch.

Updating the Matching Model

As previously mentioned, we have four kinds of matching models: the default model, which calculates the unidirectional matching score for a user pair based on a fixed formula, and three models that learn to output good matching scores (a neural network, a genetic algorithm, and a multinomial logistic regression model). At the beginning of our matching process, the default model is used, as we do not yet have any data to train the other models. However, at 00:00 UTC every day, our code checks whether we have gathered enough user feedback on the proposed matches to potentially pick an alternative model to perform the matching. If we do not select an alternative model, we instead check whether we have enough feedback to improve the current model.

Picking a New Matching Model

For the first run, we defined “enough” data as at least 100 data points. For all subsequent runs, this was defined as at least twice the amount of data we had at the last run. The more often we select an alternative matching model, the more additional data is therefore required. This is intentional: we assume that the more data we have, the more likely the currently best-performing model type is to remain the best even with additional data. Furthermore, selecting an alternative matching model is costly:

Every time the code looks for a new matching model, it splits the available data into 10 roughly equally sized pieces. We then choose one of the pieces as the validation set and use the other 9 as training data, on which we fit each of the 3 trainable models. The error of these 3 models and of the default model is then calculated on the validation set. We repeat this process for each of the pieces, yielding 10 validation sets and the corresponding model errors. We then calculate each model’s average error across the 10 validation sets and select the model with the lowest average error as our new matching model. The picked model is fit anew from scratch on the whole, unsplit data and used as the new matching model. In effect, each of the 3 trainable models is fit on the equivalent of 9 times the total dataset (10 folds of 90% each), plus one final fit of the selected model on the whole dataset, which can become quite computationally expensive.
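
A rough sketch of this selection procedure, assuming each candidate model exposes hypothetical fit() and error() helpers:

```python
import numpy as np
from sklearn.model_selection import KFold

def pick_matching_model(trainable_models, default_model, X, y):
    """Select the matching model with the lowest average validation error.

    `trainable_models` are the neural network, genetic algorithm and
    multinomial logistic regression; `default_model` uses the fixed formula
    and is only evaluated, never fit. The fit()/error() helpers are
    hypothetical stand-ins for the real training and evaluation code.
    """
    all_models = list(trainable_models) + [default_model]
    errors = {model: [] for model in all_models}

    for train_idx, val_idx in KFold(n_splits=10).split(X):
        for model in trainable_models:
            model.fit(X[train_idx], y[train_idx])
        for model in all_models:
            errors[model].append(model.error(X[val_idx], y[val_idx]))

    best = min(all_models, key=lambda model: np.mean(errors[model]))
    if best is not default_model:
        best.fit(X, y)  # refit the winner from scratch on the whole, unsplit data
    return best
```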

Fine-Tuning the Matching Model

Again at 00:00 UTC every day, we perform fine-tuning if certain conditions are met. Firstly, the procedure for picking a new model must not have been triggered already that day. If so, the data must also have grown since the last model-picking or fine-tuning: by at least 10% if we had fewer than 1000 data points, or otherwise by at least 100 data points. If those conditions are met, we pass the old and new training data to our model and call its fine-tuning function. For details on how each of the models fine-tunes, see the Fine-Tuning sections of the respective models.
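
The condition check could look roughly like the following sketch; the function and argument names are hypothetical:

```python
def should_fine_tune(picked_new_model_today: bool,
                     points_at_last_update: int,
                     points_now: int) -> bool:
    """Daily 00:00 UTC check deciding whether to fine-tune the current model.

    Mirrors the rules described above. `points_at_last_update` counts the
    data points at the last model-picking or fine-tuning run.
    """
    if picked_new_model_today:
        return False
    growth = points_now - points_at_last_update
    if points_at_last_update < 1000:
        return growth >= 0.1 * points_at_last_update
    return growth >= 100
```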

Improving SBM’s Models with Survey Data

In order to test and further develop our algorithms, we wanted to obtain at least a small sample of realistic user data which we could feed into our database. To do this, we set up an online survey to gather initial data and feedback. Secondary goals of this initial survey included examining the demographic data of our sample, to get an idea of the metadata of our potential users, and collecting opinions from our potential user base.

Survey Structure and Recruitment

We used LimeSurvey to run our online questionnaire, which had a completion time of < 30 minutes. We distributed the link to our survey using the e-mail lists of the departments of Cognitive Science and Psychology, as well as internal study-program Telegram and WhatsApp groups. As compensation for participation, we provided 0.5 VP hours and the opportunity to try out the matching process of our final product. We used the following text to recruit participants:

Hi everyone,
We are currently working on an app called Study Buddy Match which is supposed to match you with a study partner based on your personal preferences, communication style, academic interests and more. This is part of the SmartUni study project where we design new apps to make online university a better experience.
We need your help to develop this app. If you fill out the survey (< 30 mins), you will provide us with anonymous data which we will use to train our matching algorithm. But that’s not all! You can also sign up for the testing phase and receive a real life match based on your submission. In the second phase, you will be able to try out getting matched.
You may also earn VP hours for all of this: 1/2 for the survey alone. Link to our survey: https://survey.academiccloud.de/index.php/331742?lang=en (information provided to us will only be used for the development of the study project, and will not be misused elsewhere)
Best regards,
The StudyBuddyMatch Team @ smartuni

We decided to run two consecutive surveys, the first one to collect data for our app and the second one to collect data on who signed up for VP hours and who signed up to participate in the final matching process. We did this to sever the connection between anonymous data collection in the first survey and identifying personal contact information provided by volunteering participants in the second survey.

On the landing page of our first survey, users had to provide their consent to our terms and conditions. They were instructed on how to fill out the survey and informed about our data policy and that data would be collected anonymously. On the second page, users were asked to create a personal anonymized identification code composed of the following elements (a small sketch of this rule follows the list):

  1. The first letter of their birth month
  2. The last two letters of their first name
  3. The first two letters of their mother’s name
  4. The last two letters of their birthplace
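
For illustration, this composition rule corresponds to a small helper like the following; the function name and example values are made up:

```python
def make_survey_code(birth_month: str, first_name: str,
                     mother_name: str, birthplace: str) -> str:
    """Build the anonymized identification code from the four elements above.

    Example: ("March", "Anna", "Maria", "Berlin") -> "MnaMain".
    """
    return birth_month[0] + first_name[-2:] + mother_name[:2] + birthplace[-2:]
```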

Participants were then instructed to store their code and use it for potential future studies. Next, we asked participants to provide demographic data: their age, home department, home institute, study program, semester, and level of studies. In the following sections, we ran participants through the questionnaires that also feature in our app: the personal academic interests questionnaire, the willingness to communicate questionnaire, the learning styles questionnaire, and the partner preferences questionnaire. We also asked which courses they had found beneficial and what their respective grades in those courses were. Participants were then asked what time they would prefer to meet a study buddy (this item on study times was later discarded in the final app). We further asked for their preferred communication media, whether they preferred a study buddy from the same study program, and one optional question about which course(s) they had problems with. Lastly, we included two items asking for general feedback. All questions were mandatory except for the feedback items and the optional problem-course question (to review all items, see Appendix A).

We discuss the findings of this survey below, with the exception of the feedback items, whose findings are discussed on the Discussion and Outlook page of this site.

Descriptive Analysis of Sample

We saved and stored the data sets for all participants, even when participants did not complete all required items. However, we excluded incomplete data sets from the descriptive analyses of our sample. In total, data from N = 34 participants remained for analysis. Free-text items were excluded from the quantitative analysis.

Table 1 shows the percentages of answers given to the main survey items. Unsurprisingly, our sample consisted for the most part of Cognitive Science students at the end of their Master’s degree, with an average age of 24. WTC scores all fell within the norm on average, with the average total scores being on the lower end of the normal spectrum. Participants had a positive attitude towards the prospect of using our application in their real lives.

Table 2 shows the percentages of answers given to the multiple choice questions. Notably, the majority marked themselves to be visual and reading/writing learners. Tuesday and Thursday were the preferred days to meet up for our sample candidates. Preferences for study partners were relatively well balanced, with “a buddy who has similar academic interests” taking the lead, followed by “a buddy who is flexible in terms of working time.” Lastly, we saw that participants preferred communication via chat apps.

Table 3 shows the percentages of answers given to the items of the second survey. As expected, we gathered substantially fewer complete data sets for this survey, leaving us with N = 23. We again ended up with almost exclusively Cognitive Science students in the sample, 14 of whom signed up for VP hours and 18 of whom signed up to be contacted via e-mail to participate in a later Study Buddy Match testing phase, in which they could try out the matching procedure based on the information they provided in the first survey in exchange for providing brief feedback.

Table 1
Table 2
Table 3

Personal Academic Interests

Figure 1 presents all single categorical specifications from the free text input along with their absolute frequencies, i.e. the number of participants who entered each one.

The 61 categories are related to our Cognitive Science study program, for the recruitment reasons covered in the introduction above.

Fig. 1: Absolute frequencies of free-entered academic interests, in clockwise direction.

Of all 136 words entered, more than 4 participants each shared academic interests corresponding to the following basic modules of the study program: Artificial Intelligence, Cognitive Neuropsychology, Computer Science, Deep Learning, Linguistics, Machine Learning, Neuroscience, Philosophy and Ethics, and Programming.

12 participants matched in their interest in AI, the highest number of any category. App Development, Computational Linguistics, Education, Mathematics, Psycholinguistics, Psychology, and Virtual Reality were in second place, with 3 survey participants each. 32 out of the 61 categories were only included by one participant each, which is unsurprising as these were mainly specific fields, e.g. Chatbots or Interaction Design. The remaining fields had at least two participants interested in them.

This analysis reveals that users enrolled in the same institute share a range of academic interests, which facilitates matching. Given that the majority of survey participants preferred a study buddy with similar academic interests (see Table 2), this is particularly interesting, even if about half of the categories were only mentioned once. However, this analysis does not calculate the semantic resemblance between inputs; that is done by the NLP module for the final matching score.

Course History

In regard to the Partner Preference option of looking for a study buddy who has taken similar courses, the analysis of the Courses and Grades questionnaire results revealed that, out of the 100 different course title inputs, the 37 courses shown in Figure 2 were taken by more than one participant. The remaining 63% of the overall 100 were only taken by one participant each, as shown in Figure 3 below.

Fig. 2: Absolute frequencies of similar courses on the Courses and Grades survey questionnaire, in clockwise direction.
Fig. 3: Courses entered only once on the Courses and Grades survey questionnaire.

With respect to Figure 2, the courses most frequently represented across participants, with more than 4 and up to 7 enrollments each, are mostly basic courses, such as Deep Learning for Natural Language Processing, Einführung in Algorithmen und Datenstrukturen, Human-Robot Interaction or Human-Computer Interaction, Implementing ANNs with Tensorflow, Introduction to Artificial Intelligence, Introduction to Neuropsychology, Introduction to Logic, Machine Learning, Neuroinformatics, and Statistics with Data Analysis.

All inputs were analyzed by frequency, not by the bidirectional similarity between the individual lists of the requested 5 course entries. This analysis implies that test users share basic (and some specific) courses, which could lead to possible matches.

Course Problems

For our Partner Preferences option asking for “a course the user’s potential study buddy should already have taken,” survey participants could optionally “specify which course(s) they have or have had problems with.”

Figure 4 shows the 15 different courses with which 10 out of the 38 participants indicated having or having had problems. The majority of the participants did not indicate problems with any courses.

Fig. 4: Absolute frequencies of optional course problems the buddy should help with.

Mathematik für Anwender I was deemed problematic by the highest number of survey participants (3), followed by Einführung in Algorithmen und Datenstrukturen with 2 participants. The rest of the courses were entered only once, i.e. they were only deemed problematic by one survey participant.

As shown in Figure 2 above, Einführung in Algorithmen und Datenstrukturen was entered by 7 participants in their course history, and Mathematik für Anwender I was entered by 2 participants, meaning that there are potential matches for students who need help with these courses.

A comparison between the course problems and the course history in Figure 2 reveals more potential partners, e.g. with Action and Cognition, Advanced Topics in Deep Learning, Einführung in die Theoretische Informatik, Foundations of Logic, Introduction to AI, and Machine Learning. There are also similarities with the courses entered only once in Figure 3 (Functional Neuroanatomy and Sensory Physiology).

This analysis reveals a range of matches between entered course problems and the course history of other users who may like to help, at least for test users enrolled in the same institute.

That being said, the majority of survey participants did not prefer a buddy who has already taken a specific course.

Survey Data Migration to Database

To train our algorithms and to test our functions using realistic user data, we decided to make the data we gathered from the survey available inside our own database. To migrate the data, we first performed some rudimentary cleaning steps, including deleting columns containing free-text commentary and columns containing optional feedback. We also deleted 3 duplicate entries, in each case dropping the entry with the least amount of data. To maximally exploit what we collected, we kept and processed incomplete data sets.

The goal was to make the survey data queryable through our database. We therefore created UserQuestionnaire objects for the four main questionnaires: Willingness to Communicate, Learning Styles, Personal Academic Interests, and Partner Preferences. For each survey participant, we created a new user, using the personalized abbreviations (identification codes) as user names and generating a place-holder e-mail address for each. We then created a UserQuestionnaire object per questionnaire and participant, essentially making the data provided for each item available through the user’s name and email attributes.
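
A heavily simplified sketch of this migration step follows; the import path and the UserQuestionnaire field names are assumptions for illustration only, and the real model may look different:

```python
from django.contrib.auth.models import User

# Hypothetical import path and fields; the real UserQuestionnaire model
# lives in our app and may differ.
from matching.models import UserQuestionnaire

def migrate_participant(code: str, answers_by_questionnaire: dict) -> None:
    """Create a user for one survey participant and attach their questionnaire data."""
    user = User.objects.create_user(
        username=code,                        # personal anonymized identification code
        email=f"{code}@placeholder.invalid",  # generated place-holder address
    )
    for questionnaire_name, answers in answers_by_questionnaire.items():
        UserQuestionnaire.objects.create(
            user=user,
            questionnaire=questionnaire_name,  # e.g. "Willingness to Communicate"
            answers=answers,
        )
```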

Our code also covered edge cases where the provided data contained complexities such as multiple-choice answers or variations in the spelling of numerical data.

We used the same database infrastructure as the one we use to save our live-recorded answers. This allowed us to query particular responses to particular items. Lastly, we discarded demographic data about the institute of enrollment, level of studies, age, semester of enrollment, and the least favorite course, because these items have no direct representation in our database.

Natural Language Processing (NLP) in StudyBuddyMatch

The first questionnaire in our application, namely the Personal Academic Interests (PAI) questionnaire, relies on natural text in its answers. Each user types in their academic interests with no specific template in mind, meaning that some users submit full sentences while others submit lists of keywords.

The idea of the PAI questionnaire was for each user to freely express the academic fields they are interested in. We did not want to limit users with a pre-determined list of academic fields, because this would have restricted the matching options of users interested in more niche fields. Thus, we planned to analyze the replies of the PAI questionnaire using natural language processing (NLP) methods.

This section describes the pipeline we developed, as well as the decisions we made, in order to come up with an NLP algorithm for use in processing questionnaire data. The goal of the algorithm is to compare free text and output a similarity score for each user pair in our database.

Different NLP Libraries

Lack of training data was a major obstacle in the development of our NLP and our overall application. We did not have a set of sample answers for our questionnaires except for answers provided by our own team, and these answers were not sufficient for training a model, and particularly not for training a language model. Because of this, we had to resort to using pre-trained NLP libraries that can take text as input, convert it to vector embeddings, compute the similarities of these vectors, and output a similarity score.

To do this, we prepared a small sample corpus of answers for the PAI questionnaire, which we wanted to use in order to compare different NLP libraries that can perform the abovementioned similarity task. We divided this task among a small team of four members: each person chose a library and ran a similarity algorithm on the same sample corpus, which allowed us to compare the results of the different libraries.

The corpus consisted of about 20 PAI sample answers (some examples from the corpus: ‘My academic interests are mainly revolved around humanities and literature.’, ‘Deep Learning applications in language’, ‘robots, human computer interaction’). The libraries we tested were sBERT, Doc2Vec, DeBERTa, and spaCy.

After comparing the results manually, we concluded that although spaCy uses a basic architecture (i.e. word2vec, rather than the more advanced transformer-based deep learning architecture), it resulted in the best similarity scores. This was illustrated in some examples, such as ‘VR and computer science’ and ‘computer science, VR’ - two PAI answers that are semantically identical but which had a lower cosine similarity score of about 0.90 in other libraries compared to 0.96 in spaCy. Another example from our corpus dealt with opposite sentences, e.g. ‘AI, linguistics, nlp, pysychology, neuroscience….’ and ‘I like anything but AI and NLP’; the only library that took the negation ‘but’ in the second sentence into account was spaCy, and thus the cosine similarity score between these two sentences was relatively low (0.4) with spaCy in comparison to other libraries.

Besides the good performance shown by spaCy to calculate similarity scores, we also considered the low dimensionality of its embeddings compared to, for instance, the 1536 dimensional embeddings generated by models such as DeBERTa. We favored low-dimensional over high-dimensional vectors since the latter increase the computational complexity when operations are performed with them [1].

SpaCy and Word2vec

SpaCy is an open-source natural language processing library developed by Explosion AI for Python and Cython. It works by converting a sentence into an NLP object, which is a Python object that can then be used to perform several common NLP functions.

Word embeddings in spaCy are computed from a large corpus of general English collected via web crawling. A neural network is trained on this corpus to identify word associations and construct embeddings from the linguistic context of each word. Each word is thus converted into a 300-dimensional vector embedding (i.e. word2vec) which represents its meaning. Similarity scores are then computed by measuring the distance between two vectors, using e.g. the cosine similarity measure.

Limitations of Pre-Trained Models

As mentioned in the section above, in spaCy and other pre-trained models, the corpus used for training is based on general English and is taken from random web pages. This means that some words used by users of our application might be out-of-vocabulary (OOV), because they are very infrequent or even non-existent in the corpus used by spaCy. When a certain word is OOV, its embedding is a 300-dimensional vector made up of zeros, which means it does not have a semantic representation.

Since the texts we are dealing with are mostly academic, it might be problematic to use a general language model, since some specific academic terminology might not exist in the model. Usually, the best practice would be to train a model based on a corpus of sample answers for our purpose. However, as we have already stated, there was no training data to work with, so this could not be done.

Implementation

We developed an algorithm that calculates similarity scores between two users’ PAI responses. These scores are bidirectional, i.e., the similarity value between, say, answer_1 and answer_2 is the same as that between answer_2 and answer_1. Thus, given a set of n (PAI) responses and r = 2 (since we pick 2 PAI responses from the answer set to compute each score), the NLP pipeline outputs C(n, r) similarity scores, one per possible combination:

\[ C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n - r)!} \]

More precisely, the algorithm generates ten similarity scores assuming, for instance, five PAI answers in the database. This computation takes the following steps: first, the system calculates a similarity score for responses 1 and 2, also denoted “1,2”. Then it does the same for 1,3 … 1,5. Afterward, in a second iteration, the algorithm calculates similarity scores between answers 2 and 3 and then for 2,4 and 2,5 (the system repeats the same process for the remaining responses).
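
A minimal sketch of this pair enumeration using Python’s itertools; the response names are placeholders:

```python
from itertools import combinations
from math import comb

# Five hypothetical PAI responses
responses = ["resp_1", "resp_2", "resp_3", "resp_4", "resp_5"]

# Each unordered pair is scored exactly once, since the measure is symmetric
pairs = list(combinations(responses, 2))
print(pairs[:4])                            # (resp_1, resp_2), (resp_1, resp_3), ...
print(len(pairs), comb(len(responses), 2))  # 10 10
```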

Pipeline Description

In order to compute the similarity scores described above, we designed a system that queries the (Django) database model TextQuestionAnswer and extracts the PAI responses for each user. These responses are then preprocessed: the system deletes punctuation and lowercases all characters. Even though deleting stop words (e.g., determiners, coordinating conjunctions, prepositions) is a standard procedure in most natural language processing pipelines, we did not implement this step. We made this decision after conducting experiments (see Different NLP Libraries) that showed that, for our specific data, stop words play an essential role in determining the meaning of sentences, and hence the values of the embedding vector dimensions. The sentence ‘I like anything but Artificial Intelligence and NLP’ provides a good example: deleting the stop word ‘but’ could cause this user to be matched with someone who stated, for instance, ‘I like Artificial Intelligence and NLP’, even though these two users actually provided opposite PAI responses.

Once all PAI responses are in the format described above, we feed them to the pre-trained spaCy model, which outputs an object of type spacy.tokens.doc.Doc (from now on called an NLP Object). This object contains a vector attribute (which one can use to access the text embedding) and a built-in similarity method we used to compute the bidirectional similarity scores. Once the similarity scores are calculated via the NLP Object, this information is stored in the database using the PAISimilarity model.
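
A condensed sketch of the pipeline described above, assuming a spaCy model with word vectors (e.g. en_core_web_md; the exact model used in production may differ) and leaving the database write as a comment:

```python
import string

import spacy

nlp = spacy.load("en_core_web_md")  # any spaCy model that ships word vectors

def preprocess(text: str) -> str:
    """Lowercase and strip punctuation; stop words are intentionally kept."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))

answer_a = preprocess("I like anything but Artificial Intelligence and NLP!")
answer_b = preprocess("I like Artificial Intelligence and NLP.")

doc_a, doc_b = nlp(answer_a), nlp(answer_b)   # spacy.tokens.doc.Doc objects
score = doc_a.similarity(doc_b)               # bidirectional similarity score
# In the real pipeline this score would be stored via the PAISimilarity model.
print(round(score, 3))
```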

Data for Building, Testing and Debugging the NLP Pipeline

In order to build, test, and debug the different functionalities we implemented, we needed to populate the database with some ‘dummy’ data. These data allowed us to simulate possible scenarios we will have to deal with (e.g., users changing or deleting PAI responses, new users’ answers being added to the database, and users deleting their accounts, among others). Besides, these data also helped us optimize the NLP pipeline algorithm (presented in more detail in the upcoming section) since we could recreate scenarios with up to 5452 data points (PAI answers). We considered the following criteria to choose the dataset:

  1. The data should resemble the length of the expected PAI answers (we expect users to write short sentences, e.g., ‘I am interested in NLP and AI’). Having a database that contains short pieces of text allowed us to better estimate the time it would take the system to compute the similarity scores when operating on actual data.
  2. The data should be natural language text instead of meaningless random sentences like the ones generated by, for example, the Python package lorem (which yields random text that looks like Latin). This is an important criterion since working only with random data could cause spaCy to mostly yield embedding vectors made up of only zeros, as the words fed to the embedding algorithm would most likely be out-of-vocabulary.

Thus, we used the TensorFlow dataset TREC to populate the database with some natural text. TREC contains sentences with an average of 10 words that deal with various topics. As we mentioned before, we did not find any datasets with sentences that contained the kind of academic language we will be processing once the SmartUni website is released. Nevertheless, this should not be a problem since the purpose of these data was only to help in the development process (building, testing, and debugging code), as opposed to assessing the accuracy and usefulness of the embeddings yielded by spaCy to compute similarity scores.

Optimization

The system saves the spaCy-generated NLP objects to the database in order to reduce memory consumption and speed up the NLP pipeline. This way, we avoid generating these objects anew every time a similarity score has to be determined. NLP Objects are created only once per user unless the PAI answer is modified. In this case, the NLP object is replaced based on the new text given by the user (new text implies a change in the vector embedding, and hence a new NLP Object needs to be generated). We used the built-in method .to_bytes() provided by spaCy to serialize the NLP Object. Methods and attributes of the NLP object are transformed such that they can be saved as a byte string. The serialized object is subsequently saved to the database using the NLPObject (Django) model. It is important to note that the NLP object comes with some built-in methods and attributes that are unnecessary for the computation of the similarity scores (e.g., sentiment, user_data, user_data_keys, user_data_values, tensor), and these were therefore excluded from the object serialization process (and hence not saved to the database) in order to optimize the deserialization procedure described below.

To compute the similarity score between two users’ PAI responses, we must deserialize the same NLP Objects multiple times (deserialization is accomplished using the spaCy built-in method from_bytes). Thus, we used memoization to speed up the execution of the NLP pipeline as well as to avoid running the deserialization function every time the same NLP object was passed as an argument. The memoization technique we implemented allows for caching of the deserialized NLP Objects and subsequently helps decrease the load on computing resources. We used Python’s @lru_cache decorator from the functools module to develop this optimization method.
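
A compact sketch combining the serialization and the memoized deserialization described above; the model name is an assumption, and persisting the byte string via the NLPObject model is only indicated in a comment:

```python
from functools import lru_cache

import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_md")  # any spaCy model with word vectors

# Serialize, excluding attributes we never need for similarity scores
doc_bytes = nlp("I am interested in NLP and AI").to_bytes(
    exclude=["sentiment", "tensor", "user_data",
             "user_data_keys", "user_data_values"])
# The byte string would then be saved via the NLPObject (Django) model.

@lru_cache(maxsize=None)
def deserialize_nlp(serialized: bytes) -> Doc:
    """Rebuild a Doc from its byte string; repeated calls with the same
    argument are served from the cache instead of re-deserializing."""
    return Doc(nlp.vocab).from_bytes(serialized)

doc = deserialize_nlp(doc_bytes)        # deserialized once
doc_again = deserialize_nlp(doc_bytes)  # cache hit, no second from_bytes call
```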

To test the effectiveness of this solution, we ran the NLP pipeline with a total of 100 user responses. We observed that when memoization was applied, the function in charge of the deserialization (deserialize_nlp()) was called 99 times with a cumulative time (time spent) of 0.047 seconds (cumtime). In contrast, when we left out the memoization strategy, the same function was called 9900 times with a cumulative time of 3.358 seconds. Thus, we managed to boost the performance of the NLP pipeline (statistics were collected using the cProfile library and the code was run on an Apple M1 chip).

NLP Data Enhancements

The values of the embedding vector dimensions can be affected if the input (PAI answer) contains spelling mistakes. For instance, the vector embedding for the (wrongly spelled) sentence ‘I like atificial inteligence’ is not the same as the embedding spaCy yields for the sentence ‘I like artificial intelligence.’ One would expect these two sentences to be assigned a similarity score of 1, as they convey the same meaning; instead, they are given a similarity score of 0.792. Thus, to improve the quality of the data fed to spaCy, we activated the spellcheck option on the HTML input field that collects the PAI responses. Further, to ensure that users do not provide random answers (i.e., a set of random characters like ‘adfasdfasdf’ and the like), we implemented a function that detects these types of responses and asks the user to provide valid data (e.g. real natural language sentences). Such values are easy to detect since they produce embedding vectors containing zeros in all dimensions. With this strategy, we aim to feed meaningful data to the spaCy algorithm.
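
A minimal sketch of such a zero-vector check, assuming a spaCy model with word vectors; the function name is made up:

```python
import spacy

nlp = spacy.load("en_core_web_md")

def looks_like_gibberish(text: str) -> bool:
    """Reject inputs whose tokens are all out-of-vocabulary, such as 'adfasdfasdf'.

    For such inputs every token gets an all-zero vector, so the Doc's
    vector norm is zero and no meaningful similarity can be computed.
    """
    return nlp(text).vector_norm == 0.0

print(looks_like_gibberish("adfasdfasdf"))              # True: ask the user to rephrase
print(looks_like_gibberish("artificial intelligence"))  # False
```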

This random-character detection strategy is also subject to the limitations described in the opening of this chapter: some legitimate words will be out of vocabulary just like the random characters we are trying to detect, although this is unlikely because the underlying corpus is large. However, to increase the chances of being matched, users must be encouraged to type in words that spaCy knows.

These two data enhancements are expected to improve the similarity scores’ accuracies, increasing the possibility that users who provide similar answers are actually matched. It is important to note that the real value of these data improvements can only be assessed once the application is launched, and we have collected enough feedback from our users regarding how accurate the matching was.

References

[1] Feuchtmüller, Sven 2018, On high-Dimensional Transformations Vectors, Uppsala University, accessed 01.20.2022, https://www.diva-portal.org/smash/get/diva2:1214427/FULLTEXT01.pdfi

[2] M. Abadi et al., “TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems,” 2015, arXiv:1603.04467

[3] Hichang Cho, Geri Gay, Barry Davidson, and Anthony Ingraffea. 2007. Social networks, communication styles, and learning performance in a CSCL community. Computers & Education 49, 2 (2007), 309–329.

[4] Enrique Alfonseca, Rosa M. Carro, Estefanía Martín, Alvaro Ortigosa, and Pedro Paredes. 2006. The impact of learning styles on student grouping for collaborative learning: a case study. User Modeling and User-Adapted Interaction 16, 3 (2006), 377–401.

[5] Pashler, H., McDaniel, M., Rohrer, D. and Bjork, R., 2008. Learning styles: Concepts and evidence. Psychological science in the public interest, 9(3), pp.105-119.

[6] Thanh, T.N., Morgan, M., Butler, M. and Marriott, K., 2019, February. Perfect match: facilitating study partner matching. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education (pp. 1102-1108).

[7] James C. McCroskey. 1992. Reliability and validity of the willingness to communicate scale. Communication Quarterly 40, 1 (1992), 16–25.

[8] MacIntyre, P. D. 2007. “Willingness to Communicate in the Second Language: Understanding the Decision to Speak as a Volitional Process.” The Modern Language Journal 91 (4): 564–576. doi:10.1111/j.1540-4781.2007.00623.x.

[9] MacIntyre, P. D. 2020. “Expanding the Theoretical Base for the Dynamics of Willingness to Communicate.” Studies in Second Language Learning and Teaching 10 (1): 111–131. doi:10.14746/ssllt.2020.10.1.6.

[10] McCroskey, J. C., & Richmond, V. P. (1990). Willingness to communicate: A cognitive view. Journal of Social Behavior & Personality.

[11] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” presented at the International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 7-9, 2015.

[12] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research (JMLR), vol. 12, no. 85, pp. 2825-2830, 2011

[13] D.C. Liu and J. Nocedal, “On the limited memory BFGS method for large scale optimization,” Mathematical Programming, vol. 45, pp. 503–528, 1989

[14] Chiriac, Eva. (2014). Group work as an incentive for learning – students’ experiences of group work. Frontiers in Psychology, 5, 558. doi:10.3389/fpsyg.2014.00558.