Discussion and Outlook

Contents

  1. The Future of the Core
  2. The Future of the SmartPlanner
  3. The Future of StudyBuddyMatch
  4. What We Learned During SmartUni

The Future of the Core

In the last phase of our development, we implemented a notification system. For this, we initially planned to also implement a pop-up system for the notifications to expand the use-cases for our notification system. However, due to some problems during the implementation and due to time constraints, we had to drop this functionality. We therefore consider implementing such a pop-up system a viable future expansion of our app.

Additionally, the notification system currently only works one-way: users can only receive notifications of certain events from the server and interact with them to be directed to the origin of the notification. A possible future expansion could therefore be the implementation of a chat system in which users can interact with each other. This would allow users to communicate, for example, with other users whom the StudyBuddyMatch service has identified as good potential study partners, without immediately having to share phone numbers and switch to messaging apps.

The possibility to answer these messages directly from the notification panel or the pop-up could then also be implemented. The chat function could be used by the StudyBuddyMatch or SmartPlanner applications, allowing users to, e.g., contact their study buddy or plan study events. A friending system or general contact infrastructure would also be required for this implementation. Consequently, a public profile page for each user would need to be designed and implemented.

As another possible expansion, the landing page could be changed to a dynamic page that shows all relevant information for the current day or week, including current SmartPlanner events and news from the StudyBuddyMatch application. This page might also allot space to further widgets that could be designed and implemented later.

Furthermore, a multitude of other settings could be added that were omitted due to time constraints and implementation complexity. These are worth mentioning here as well, to inspire possible future work.

  • Website Language: The ability to switch the website's language from English to other languages could be provided by storing the interface strings in a database and swapping them out based on the selected language.

  • Dark Mode: This would adjust the colour scheme of the website to a darker-toned alternative, which has multiple advantages such as lower battery consumption and reduced eye strain. In addition, a colour-blind mode could be created.

  • Interface Scaling: Especially for users with decreased eyesight, it could be useful to increase font and image sizes automatically.
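As an illustration of the string-lookup approach to language switching from the first point above, a minimal sketch (the translation table and key names are hypothetical, not from the actual app):

```python
# Minimal sketch of database-backed UI string lookup for language switching.
# The translation table and keys are illustrative placeholders.
TRANSLATIONS = {
    "en": {"welcome": "Welcome to SmartUni", "search": "Search"},
    "de": {"welcome": "Willkommen bei SmartUni", "search": "Suchen"},
}

def translate(key: str, lang: str = "en") -> str:
    """Look up a UI string for the selected language, falling back to English
    for unknown languages or missing keys."""
    return TRANSLATIONS.get(lang, TRANSLATIONS["en"]).get(key, TRANSLATIONS["en"][key])

print(translate("welcome", "de"))  # → Willkommen bei SmartUni
```

In the real app, the inner dictionaries would be rows in a database table keyed by string identifier and language code.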

Possible Improvements to the AI Search Function

Because of the various limitations mentioned, there are several aspects of the smart search that can be improved. We identified three main possible improvements: the NLP model, dealing with misspelled words, and performance.

Because we used a pre-trained model, our results are not as accurate as they could be. For the future, the creation of a custom model is a viable improvement option. Data for training the model on program names could be obtained from Learning Management Systems like Stud.IP or Moodle, both of which are already in use at many universities in Germany. Data for training the model on academic interests could be obtained from a first deployment and use of SmartUni.

While testing the smart search, we found that spaCy could not handle misspelled words: the similarity scores dropped dramatically when words were spelled incorrectly, and the response was an empty result page. This could be mitigated in the future by using word-suggestion libraries. Another solution would be to normalize the final similarity scores so that not all results are removed, though word suggestions would yield more accurate results.
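One lightweight way to provide word suggestions is Python's standard-library difflib; a sketch against a hypothetical vocabulary (a real deployment would draw its terms from the search index itself):

```python
import difflib

# Hypothetical vocabulary; in practice this would come from the indexed
# program names and academic interests.
VOCABULARY = ["linguistics", "neuroscience", "philosophy", "psychology"]

def suggest(query: str, cutoff: float = 0.8) -> str:
    """Return the closest known term for a (possibly misspelled) query,
    or the query itself if nothing is similar enough."""
    matches = difflib.get_close_matches(query, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else query

print(suggest("lingustics"))  # misspelled input → "linguistics"
```

The corrected term could then be fed into the spaCy similarity pipeline instead of the raw query, avoiding the empty result page.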

Our final and biggest issue turned out to be computing performance. When a user searches for a term, the result calculation takes about 2-3 seconds. Unfortunately, because of time limitations, we could not implement a more efficient way of calculating and presenting the results. Performance could be improved either by caching the search result object or by moving the pagination function to the frontend, thereby loading the results only once instead of on every page.
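As an illustration of the caching idea, a minimal memoizing sketch (in the app itself, Django's caching framework would be the natural fit; the function names here are hypothetical and the result computation is a stand-in):

```python
from functools import lru_cache

def _compute_results(term: str) -> list[str]:
    # Stand-in for the expensive spaCy similarity scoring (~2-3 s in practice).
    return sorted(f"{term}-result-{i}" for i in range(3))

@lru_cache(maxsize=128)
def search(term: str) -> tuple[str, ...]:
    """Repeated searches for the same term hit the cache instead of
    recomputing; tuples are returned because cached values must be immutable."""
    return tuple(_compute_results(term))

first = search("linguistics")   # computed
second = search("linguistics")  # served from the cache
assert first == second
```

With the full result object cached like this, paginating through result pages would no longer trigger a recomputation per page.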

The Future of the SmartPlanner

Break Practice

In order to have a good work-life balance, scheduling and taking breaks is of high importance. To assist the user with that, we wanted to find out when it is best to take a break, how long the break should last, and what should (or should not) be done during a break.

As there is no one-size-fits-all answer to these questions, we decided to create a pop-up with a reminder to take a break. The pop-up would have shown up after a certain amount of time that could be set individually by the user. Further, we wanted to inspire the user with suggestions for how to take the break, ranging from the idea of dancing to a song or going for a walk to relaxing one’s eye muscles by looking into the distance, and many more ideas.

The image on the left shows an example of a picture with a short phrase that is meant to inspire a user. The image on the right shows an example with a longer text, in this case a fitting quotation.

By clicking on the arrows inside the button at the top of the picture, the user could switch to the next or the previous inspiration. By clicking on the buttons at the bottom, the user could either mark the inspiration as done or cancel it.

Due to lack of time, the Break Practice functionality has not been implemented yet, but would be an interesting aspect to consider in further development.

Fig. 1: Both designs for break reminder notifications. The one on the left shows a picture with a corresponding phrase; on the right, a quotation that fits the topic is added.

Motivational Reminders

While studying, it is important to stay steadily motivated so that one can keep on learning and stay on top of events scheduled both for personal time and for outside obligations. With our Planner, we wanted to support users with getting and staying motivated. Unfortunately, due to the lack of time, the following ideas were not implemented or considered any further. They could, however, be interesting to implement if SmartUni were to be further developed and refined in the future.

For assisting users in staying motivated, we considered the option of motivational reminders: short statements designed to provide a motivational message. We researched several aspects of such reminders when considering them for the Planner. One was the question of how sentences should be formulated to be motivational. Further, we distinguished between intrinsic and extrinsic motivation: intrinsic motivation comes from personal enjoyment, whereas extrinsic motivation is driven by outside factors.

We also wanted to find out what time is best to send a motivational reminder - before, in between, or after a task. For that, we distinguished between motivational reminders and motivational feedback. Motivational reminders remind you of what you still have to do and what you can do, e.g.:

‘The hardest part is starting, once you get that out of the way, you’ll find the rest of the journey much easier’ - Simon Sinek

whereas motivational feedback reminds you of what you have already achieved, e.g.:

‘Great job, you already achieved 70% of your set goals today!’

In addition to that, there is a difference between a motivational reminder to fulfill a specific task and a general motivational statement that does not point to a certain task.

To foster even higher motivation, we thought of providing gamification, a competition method, or a reward system where the user could engage more with their tasks by approaching them playfully.

Furthermore, we also discussed wanting to know in which areas a user most needs motivational reminders or motivational feedback, and how we could find that out. We did not pursue this further, but using AI to fit the motivational reminders to a user's personal needs would be an interesting addition to a hypothetical later edition of the Planner.

SMART Goals

Originally, we also wanted to afford the user the opportunity to set goals in accordance with SMART goal setting. SMART is an acronym in which every letter stands for a specific aspect of a goal: Specific, Measurable, Achievable, Relevant, and Time-bound (Fig. 2). With this, we intended to assist users in phrasing their goals in a way that makes them more likely to stay motivated and follow through. We designed a SMART goals functionality in which the user formulates their goals with the help of the five SMART categories. The tool assists them in creating goals in a way that will hopefully make the goals more realistic.

A first draft of a possible design of this feature can be seen below:

Fig. 2: On the top of the pop-up, there is a short introduction on SMART goals and why this system of goal setting can be helpful. On the left, the five different categories are indicated by a corresponding symbol, and the category name with a short explanation. In the time-bound field, the user can set the time frame in which they want to accomplish the goal. The white fields on the right are the input fields. The questions provided are the default values for the text area field, meaning they can be removed and overwritten by the user with their answers. When the input fields are filled out, the user can click on the ‘Start Planning’ button and have the SMART goal included in the task section.

The SMART goal functionality is not implemented yet. However, for further work, it would be interesting to integrate not only the option to plan a task according to the five SMART categories, but also to apply the SmartPlanning process to those tasks. More precise information regarding the workload of bigger projects and the approximate time for completing them would be needed for this full integration. With a 30-hour project, for example, one would need to know how to split the project up and plan smaller timeslots to cover the entirety of the work.
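The kind of splitting described above could start from something very simple, such as dividing an estimated workload into fixed-length sessions; a sketch (the two-hour default slot length is an assumption, not a value from the Planner):

```python
def split_project(total_hours: float, slot_hours: float = 2.0) -> list[float]:
    """Split an estimated workload into equal planning slots,
    with a shorter final slot covering the remainder."""
    slots = []
    remaining = total_hours
    while remaining > 0:
        slots.append(min(slot_hours, remaining))
        remaining -= slot_hours
    return slots

print(split_project(30))    # fifteen 2-hour slots
print(split_project(7, 2))  # → [2, 2, 2, 1]
```

A full integration would of course also need to place these slots into free calendar time, which is what the SmartPlanning process itself does.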

The Future of StudyBuddyMatch

Indirect Testing of Skills

In our web application, we often ask users to rate themselves. An example is the Learning Style questionnaire, in which users state what kind of learner they are (auditory, visual, etc.).

The problem with self-rating is that it reflects the user's opinion of themselves, which might not match reality. Some people tend to under- or overestimate themselves, and this could cause problems in our app. Another case could be that a user claims to be a reading learner but is in reality a writing learner. As a consequence, the similarity measures would be correct in theory, but the match would not be good in practice.

It could also happen that a user claims to be interested in programming, implying some knowledge of it, when in reality they are not interested in programming at all; they just thought it would sound good to a potential matching partner if their personal academic interests included programming. If the two later meet up, the other user might find that they are not as similar in their personal interests as expected.

One suggestion that could solve this problem is not letting users rate themselves but instead using an objective evaluation. This could be a standardized online test measuring to what extent the user is each kind of learner. An already existing online test could be integrated into our app, and its results could then be used by the matching algorithm.

Another idea is to include an online math or coding test if those skills are relevant for the user searching for a study buddy. We already collect course history data, but looking at the course history alone for coding classes could be inaccurate, as courses range from basic to advanced. These levels of difficulty are currently not distinguished in our app; we only look at the institute a course comes from. Each institute offers both basic and advanced courses, and the hard filter (see the Hard Filters section of the StudyBuddyMatch Architecture page) only requires enrollment in 10% of all possible courses to consider a user as coming from that institute. Those 10% may be the easiest courses of that institute, so attributing a user to a certain institute might not give an accurate impression of their actual skills.
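For illustration, the institute hard filter described above might look like the following sketch (function and variable names are hypothetical; only the 10% threshold comes from the actual filter):

```python
def belongs_to_institute(enrolled: set[str], institute_courses: set[str],
                         threshold: float = 0.10) -> bool:
    """A user counts as coming from an institute if they are enrolled in at
    least `threshold` of that institute's courses - regardless of whether
    those courses are basic or advanced."""
    if not institute_courses:
        return False
    share = len(enrolled & institute_courses) / len(institute_courses)
    return share >= threshold

# 2 of 20 hypothetical courses taken → exactly the 10% cutoff
courses = {f"cogsci-{i}" for i in range(20)}
print(belongs_to_institute({"cogsci-0", "cogsci-1"}, courses))  # → True
```

The sketch makes the limitation visible: the two enrolled courses could both be introductory, yet the filter treats the user as coming from the institute all the same.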

For preventing such false ratings/evaluations, we could introduce math, coding, or language tests depending on what is relevant for the user searching for a study buddy. This could give a better impression of the study buddy’s skills and therefore improve our matching.

Dimensionality Reduction: NLP Package

We did some research on dimensionality reduction (e.g., singular value decomposition, principal component analysis) to scale down the high dimensionality of the vector embeddings yielded by the spaCy model (see the natural-language-processing (NLP) section in the StudyBuddyMatch Architecture section). The main objective was to decrease the number of dimensions of the feature space while preserving the essential information contained in the 300-dimensional embedding vectors. Implementing this technique could improve the performance of the NLP pipeline, as the similarity scores would be calculated on smaller vectors. This, in turn, would translate into reduced consumption of computational resources. As Raunak et al. (2019) [1] note, “a major issue related to word embeddings is their size, e.g. loading a word embedding matrix of 2.5 M tokens takes up to 6 GB memory (for 300-dimensional vectors, on a 64-bit system).”
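As a hedged sketch of this idea, truncated SVD over stand-in vectors in NumPy (the random embeddings and the target of 50 dimensions are purely illustrative; real input would be the spaCy vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for 100 users' 300-dimensional spaCy embedding vectors.
embeddings = rng.standard_normal((100, 300))

def reduce_dim(X: np.ndarray, k: int = 50) -> np.ndarray:
    """Project the row vectors onto their top-k right singular directions
    (equivalent to PCA, since X is mean-centered first)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

reduced = reduce_dim(embeddings)
print(reduced.shape)  # → (100, 50)
```

Similarity scores would then be computed on the 50-dimensional projections instead of the full 300-dimensional vectors, at the cost of whatever semantic information the discarded directions carried.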

We did not have any data available (see the NLP section in the StudyBuddyMatch Architecture section) to evaluate how a lower dimensionality would affect (positively or negatively) the quality of the embeddings in terms of the meaning they capture. For example, a loss in semantics could lead to poor similarity scores and hence useless matching recommendations. We therefore decided not to proceed with this methodology at this time. Future work can be done in this regard once the SmartUni application is launched and data are acquired to test the effectiveness of this approach.

Text Clustering in NLP

The idea of collecting users' Personal Academic Interests (PAI) was based on the assumption that people who have similar academic interests may work well together. The problem we had to solve was how to automatically identify users with similar interests based on their text inputs. Besides calculating similarities between pairs of users directly, we also tested an alternative: text clustering.

Clusters are simply groups that contain similar objects. By applying clustering, we ideally want to put users into clusters that represent different interest groups, e.g. Philosophy or Neuropsychology. In other words, we try to group our text documents based on the most important terms they contain. For this purpose, we used K-means combined with TF-IDF vectorization.

K-means is one of the best-known clustering methods. It is an unsupervised algorithm that assigns each object to one of k possible clusters. Before applying K-means, we needed to do some preprocessing and convert the unstructured text submissions into numerical vectors. In our example, we used the TF-IDF vectorizer from the Scikit-learn library. TF-IDF stands for term frequency-inverse document frequency, a weighting scheme commonly used to measure how important a word is to a document in a corpus or collection of texts (for more about TF-IDF, see sklearn.feature_extraction.text.TfidfTransformer).

The steps of our clustering procedure were as follows:

  1. Apply preprocessing to remove stop words (i.e. the most frequent words that have little lexical meaning on their own, such as “the,” “be,” and “to”) and symbols that carry no meaning and thus add no value to our model.
  2. Use TF-IDF to turn the data into vectors.
  3. Apply K-means to group the data.
  4. Reduce the dimensionality using PCA and visualize the result.
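The steps above could be sketched with Scikit-learn, the library we used for TF-IDF (the toy documents are invented for illustration; real input would be the users' PAI texts):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Toy PAI submissions standing in for real user input.
docs = [
    "philosophy of mind and ethics",
    "ethics, logic and moral philosophy",
    "neural networks and brain imaging",
    "brain imaging and neuropsychology",
]

# Steps 1+2: stop-word removal and TF-IDF vectorization.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Step 3: K-means with k=2 clusters (ideally philosophy vs. neuroscience).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Step 4: reduce to 2 dimensions for visualization.
points = PCA(n_components=2).fit_transform(vectors.toarray())

print(labels)        # the two philosophy texts should share one label
print(points.shape)  # → (4, 2)
```

On real data, the 2-D PCA projection would then be plotted to inspect how well the interest groups separate.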

One important issue with applying K-means is that we need to determine the optimal number of clusters k. The most popular method is the Elbow Method, a visual technique for spotting the so-called “elbow” point in the curve where the explained variation slows down after changing rapidly for a small number of clusters. However, a visual method is not sensible in our use case, since our database will change dynamically and we cannot draw a plot every time we have new data points. A more promising approach is the Silhouette Score. The Silhouette Coefficient measures how similar a data point is to its own cluster compared to other clusters. It ranges from -1 to 1, where a high score indicates that the data point is well matched to its own cluster and far away from the other clusters. We can select a range of candidate k values and train a K-means clustering for each of them. For each K-means model, we compute the silhouette coefficient of every point, average over all samples, and then pick the k that yields the highest silhouette score.
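Selecting k via the silhouette score could look like this in Scikit-learn (the synthetic blob data stands in for real user vectors):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic data: three well-separated blobs in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0, 5, 10)])

def best_k(X, candidates=range(2, 7)):
    """Fit K-means for each candidate k and keep the k with the
    highest mean silhouette coefficient over all samples."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)

print(best_k(X))  # → 3
```

Because this selection is fully automatic, it could be rerun whenever the database changes, which is exactly what rules out the visual Elbow Method for our use case.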

The text clustering is not implemented at this time since we do not have a considerable amount of data. The small example dataset that we tested on does not suffice to demonstrate whether such an approach would work well on actual dynamic data.

Automatic Language Recognition

After our second usability test, we realized a potential problem with the NLP processing of PAI answers. The model we used in the pipeline is built for English text, but we might still get German input from users, even though our website and its instructions are in English and we specifically state that answers must be written in English.

To deal with this language inconsistency, we did some research on language detection tools. We tested langdetect and the spaCy language detector on some example cases. Both worked quite well for long English sentences. However, langdetect had problems recognizing single-word abbreviations such as “NLP” and “AI.” The spaCy language detector performed better; for further improvement of the NLP pipeline, it could be applied, and a translation model could even be considered.
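The underlying idea of such detectors can be illustrated with a toy stop-word heuristic. This is a deliberately simplified stand-in, not the actual langdetect or spaCy implementation (which use richer statistics), and it shares their weakness on short inputs such as single abbreviations:

```python
# Toy language guesser based on stop-word overlap; real detectors such as
# langdetect use character n-gram statistics instead.
STOPWORDS = {
    "en": {"the", "and", "is", "in", "of", "to", "my"},
    "de": {"der", "die", "das", "und", "ist", "in", "zu"},
}

def guess_language(text: str) -> str:
    """Return the language whose stop words overlap most with the text,
    defaulting to English on a tie (e.g. for single abbreviations)."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    return max(("en", "de"), key=lambda lang: scores[lang])

print(guess_language("I am interested in the philosophy of language"))       # → en
print(guess_language("Ich interessiere mich für die Sprache und das Gehirn"))  # → de
```

In the pipeline, a detected non-English answer could either be rejected with a prompt to the user or passed to a translation model before embedding.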

Other Possible Data Collection

Concerning additional data collection, participants in our test-data survey had the opportunity to suggest module improvements. One idea was a further Partner Preferences or Learning Styles option for group work in general, i.e. with more than one suggested buddy, or even for spontaneous group meetings on a particular topic that a single user or a matched pair could suggest at a given time and place. Other users could then send a request to join and, if the initiators confirm, get in contact.

We think group work would be a good further option. Spontaneous meetings cannot be implemented in the matching algorithm, but could instead be realized as an additional function on the matching page.

Furthermore, survey participants expressed interest in more questions, such as:

  • Do you prefer to befriend your buddy or to separate that from “business”?
  • Do you prefer a buddy who speaks your mother tongue?
  • Does your buddy need to be patient, or do you want to work at high speed?
  • Do you prefer a buddy who goes through every detail or just skims everything?

If users prefer to befriend their buddy, participants in the survey suggested adding questions about non-academic interests, such as their favorite cuisine, movies, books, and personality preferences.

In our opinion, the question of a friendly or purely work-related match preference is also important, since so far there is no guarantee that a proposed buddy who has taken a specific course and could help is actually willing to do so, if not for friendly reasons. Regarding the other three proposed preferences and learning styles, we also think these would be good additions, improving the matching result by covering more of the student users' needs.

What We Learned During SmartUni

Every member of the SmartUni team came away from the project with new skills and strengths, both technical and personal. Below is a (necessarily incomplete) list of what we’ve gained personally in our year of work:

  • Project management skills
  • Teamwork skills
  • Git version control skills
  • GitLab experience
  • Scrum/Agile development skills
  • CSS
  • HTML
  • JavaScript
  • Django
  • SQL
  • Writing skills
  • Communication skills
  • Database management and design skills
  • PyCharm
  • Confidence
  • Creativity

References

[1] Vikas Raunak, Vivek Gupta, and Florian Metze. 2019. Effective Dimensionality Reduction for Word Embeddings. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pages 235–243, Florence, Italy. Association for Computational Linguistics.