The Core

Contents

  1. Custom User Profile
  2. Database
  3. Notification Framework
  4. Logging
  5. Bug Reporting
  6. Mobile View Optimization
  7. Settings Page
  8. AI Search

What is the Core?

The Core module forms the base of SmartUni and provides basic information and functionality for the website. This ranges from the user profile and associated authentication processes to the notification framework. The Core further handles more developer-directed features like the logging framework and the bug report and review system, which are very important for administrators of the website. The following sections will detail the elements of the Core and how they were realized by the team during development.

Custom User Profile

The Core stores basic information about the user and provides this information to StudyBuddyMatch and the SmartPlanner. Early during the development process, we had to decide where to include this information in the database. After some research, we were presented with three options on how to include this information in our user model:

Option 1: Extending the Existing User Model with a One-to-One Link

For this option, a new Django model is created that stores the additional information and relates to the standard user model via a one-to-one link. This strategy results in additional queries or joins whenever we retrieve the related data: any time we access, change, or create it, Django has to execute additional queries. In the following example, we store the additional information in the Profile model.

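from django.contrib.auth.models import User
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver

class Profile(models.Model):
  user = models.OneToOneField(User, on_delete=models.CASCADE)
  # put additional info as fields here

# this function creates the profile model upon registration/account creation
# (sender must be the User class itself, not an instance)
@receiver(post_save, sender=User)
def create_user_profile(sender, instance, created, **kwargs):
  if created:
    Profile.objects.create(user=instance)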

Option 2: Creating a Custom User Model by Extending the AbstractBaseUser Class

For this option, we would be extending Django’s AbstractBaseUser class and adding the desired information as fields to the class. We can further specify which name should be used as the username field and which fields are required. Additionally, we would also have to define our own authentication functionality by creating a custom UserManager class. Here is an example:

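from django.contrib.auth.models import AbstractBaseUser
from django.db import models

class User(AbstractBaseUser):
  email = models.EmailField(unique=True)
  username = models.CharField(max_length=150)
  first_name = models.CharField(max_length=150)
  avatar = models.ImageField(upload_to='avatars/', null=True, blank=True)
  # ...

  objects = UserManager()  # custom manager class, defined elsewhere

  USERNAME_FIELD = "email"  # must not be repeated in REQUIRED_FIELDS
  REQUIRED_FIELDS = ["username", "first_name"]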

Option 3: Creating a Custom User Model by Extending the AbstractUser Class

The third option is very similar to the second option. Here we are also including the additional information by extending an already existing class. In contrast to the AbstractBaseUser class, however, the AbstractUser class already provides the authentication functionality. Therefore, this option is a perfect fit if you are looking to include more information in the user model and you don’t want to change any authentication functionality that Django already provides. For example:

from django.contrib.auth.models import AbstractUser
from django.db import models

class User(AbstractUser):
    bio = models.TextField(max_length=500, blank=True)
    location = models.CharField(max_length=30, blank=True)
    birth_date = models.DateField(null=True, blank=True)

Options 2 and 3 both require that the custom user model is specified in settings.py:

# example if new user model is located in core/models.py
AUTH_USER_MODEL = "core.User"

We ultimately decided to use Option 3 since we wanted to include the additional student/user information in the user model itself, but didn’t need to change the way Django handles authentication.

The custom user model now stores the following information:

  • From AbstractUser
    • Username (username)
    • First Name (first_name)
    • Last Name (last_name)
    • Email (email)
  • Custom Information
    • Profile Picture (pic)
    • Institution of Higher Education (institutions)
    • Study Program (programs)
    • (Notification) Settings (settings, as a ManyToMany field)
    • On-site or Remote Study (on_site)
    • Bio (bio)
    • Phone Number (phone)
    • Messenger Used (messenger)
    • Languages Spoken (language)
    • Timezone (timezone)

Much of this information is displayed on a user detail page reached through the link /buddy/<pk>, where pk is the primary key of the user in the database. We included this detail page so that users can learn about and contact each other. The image below displays an example of the user detail page.

Database

The database is one of the first components we focused on in the early design phase because a properly designed database is crucial when it comes to accessing up-to-date, accurate information with any application.

To paint a picture of how a database works, think about when you shop online and want to have a look at a specific product. Typing in keywords such as “black dress” causes all the black dresses stored on the website to appear in the browser you are looking at because the information “black” and “dress” is stored in the database entries for each dress.

In general, a database is a collection of data and information that is stored in an organized manner for easy retrieval and can be accessed by multiple users with optimal speed and minimal processing expense. It is used to store user information, session data, and other application data.

The database is the central repository for all of this data. Web applications use a variety of storage options, such as flat files, relational databases, object-relational databases, and NoSQL databases. Each type has its own advantages and disadvantages when it comes to storing and retrieving data. For our web application, we use the relational database MySQL.

Relational databases allow you to store data in groups (known as tables) through their ability to link records together. They use indexes and keys, which are added to data to locate information fields stored in the database, enabling you to retrieve information quickly.

Our goal was to have a database that met our needs and could easily accommodate change. In order to do that, we took the following 3 steps:

First, we asked ourselves what information we needed to store. Then, we divided that information into the appropriate tables and columns, where a table represents a class that we need to store information about (like a user) and columns represent the attributes for the class (like a username). Last but not least, we defined how those tables relate to each other.

The final database structure, aka the “database schema,” can be seen in the figure below.

Fig. 1: Entity Relationship Diagram (ERD)

The above image is an Entity Relationship Diagram (ERD) that illustrates the tables, fields, and relationships between different tables.

Database Terms to Know

A “primary key” is a special relational database table column (or combination of columns) designated to uniquely identify each table row. Each table can have only one primary key.

Each row is called a “record” and each column a “field.” A record is a meaningful and consistent way to combine information about something. A field is a single item of information that is present for each record. One can think of a field as an attribute of what the table represents. In our SmartUser table, for example, the primary key is the ID of the user and the table contains information such as the user’s name, date of birth, email, and so on in fields which are filled out for each user in their individual record in that table (i.e. each user has one row in this table which contains their ID, name, etc.).

Some columns are called foreign keys. A foreign key is a column or set of columns in one table that refers to primary key columns in other tables.

Together, the two types of keys link an entity in one table, identified by its primary key, to a related entity in a different table, which stores that key as a foreign key.

You can represent three types of relationships with the help of primary and foreign keys:

1) One-to-One

A one-to-one relationship is a relationship between two tables where each table can have only one matching row in the other table. A real-world example would be Social Security numbers, which can only be assigned to one person at a time. This case does not occur in our database.

2) One-to-Many

One-to-many is the most common relationship, in which one entity in one table can correspond to multiple records in another table, but not the other way around. This case is used in SmartUni’s database. For example, a user can provide multiple feedback submissions, but each feedback submission relates to only one user. The one-to-many relationship is similar to the one-to-one relationship except that it allows multiple matching rows in one of the tables.

3) Many-to-Many

In a many-to-many relationship, each side of the relationship can contain multiple rows. This case is also used in SmartUni’s database. For example, each institution has many users enrolled, and each user could be enrolled in more than one institution at a time.

The only way to maintain a many-to-many relationship between two tables in a database is to have a third table that holds references to both of them. This table is called a “through” table, and each entry in it connects a row of the source table (e.g. SmartUser) to a row of the target table (e.g. Institution).

In the SmartUni database, the relationship between the institutions and the programs they contain is also many-to-many, as the same program could belong to more than one institution and institutions can have more than one program. Since our “through” table connects all three (users, institutions, and programs), we called it UserInstitutionProgram.

Django helps abstract out all of this complexity and provides a simple interface to the person writing code.
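To make the through-table idea concrete outside of Django, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table and column names (smart_user, institution, user_institution) and the sample rows are assumptions chosen for illustration and do not reflect SmartUni's actual schema.

```python
import sqlite3

# Illustrative schema only; names and data are assumptions, not SmartUni's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smart_user  (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE institution (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The through table: one row per (user, institution) pair.
    -- Its primary key is a combination of two foreign-key columns.
    CREATE TABLE user_institution (
        user_id        INTEGER NOT NULL REFERENCES smart_user(id),
        institution_id INTEGER NOT NULL REFERENCES institution(id),
        PRIMARY KEY (user_id, institution_id)
    );
""")
conn.executemany("INSERT INTO smart_user VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO institution VALUES (?, ?)", [(1, "Uni A"), (2, "Uni B")])
conn.executemany("INSERT INTO user_institution VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])


def institutions_of(username):
    """Return the names of all institutions a user is enrolled in."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM institution AS i
        JOIN user_institution AS ui ON ui.institution_id = i.id
        JOIN smart_user AS u ON u.id = ui.user_id
        WHERE u.username = ?
        ORDER BY i.name
        """,
        (username,),
    ).fetchall()
    return [name for (name,) in rows]


print(institutions_of("alice"))
```

With a Django ManyToManyField, the equivalent joins are generated automatically, which is the complexity the framework hides from the developer.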

Relationships in Our Database

In figure 1, each box corresponds to a table in our database and the edges between them are the relationships.

A black node on one end of the edge denotes a one-to-many relationship, two black nodes on both ends denote a many-to-many relationship, and the arrows represent inheritance, which means we extended and customized a table that Django offers out-of-the-box.

As shown in the ERD, we did that twice, once from AbstractUser with our custom table SmartUser and a second time from AbstractNotification with our custom table Notification.

As mentioned earlier, a many-to-many relationship should always be broken by a third table (the through table). When such a table is not created explicitly, Django will produce it on its own. In this case, the through table will not show up on the ERD.

Here is a closer look at the relationships in our database:

Tables with a One-to-Many Relationship:

Notification and module:

Each notification belongs to one module, but a module can have many notifications.

The same goes for the following relationships:

SmartUser and notification

SmartUser and feedback

SmartUser and bug

Feedback and page

Bug and page

Tables with a Many-to-Many Relationship:

SmartUser and language

SmartUser and messenger

Institution and program

In some cases, you need to store extra data about the relationship between two model instances linked by a many-to-many relationship. This is why we created our own through table, UserSettings, for the User and Settings tables: it stores the user's preference for each setting. For example, users can choose how they get notified from four options: in-app, via e-mail, both, or not at all.

Since the UserSettings through table is manually created, it is visible in the ERD.

We also created another through table manually, namely UserInstitutionProgram which we discussed above, because we had a many-to-many relationship between three tables - SmartUser, Institution, and Program.

Models

All of the tables are created programmatically in the models.py file of a Django-based application (apart from built-in tables handled by Django itself in the background). Generally, each model in that file maps to a single database table.

The Core includes multiple models governing the functionalities, aspects, and data structure of parts of the application ranging from the user profiles to the in-app notifications. The following is an accounting of the models used and a (brief) description of their key features and functions:

SmartUser – This is SmartUni’s basic representation of a user. It includes the standard information such as the user’s name, date of birth, email and preferred messaging app, and profile picture. It also includes more specialized information needed to best adapt the SmartUni service to them, such as the courses they’ve taken, the institutions and degree programs they have attended or are currently enrolled in, the languages they speak, and their timezone (particularly important considering remote learners may be based in numerous different timezones). Note that this model makes use of several supporting models targeted, as their names suggest, at specific data points in the user model – Messenger, Language, and Timezone.

Settings and UserSettings – These models are primarily used to allow the user to input their desired settings for the basic website (including collecting the information contained in the SmartUser model), for the SmartPlanner, and for StudyBuddyMatch.

Institution – This model contains information about the institutions users have attended or are currently attending. Each instance of this model can contain the institution’s name and the programs available there.

Program – This model contains information about the degree programs users have completed or are currently completing. Each instance of this model can contain the program’s name and the institution where it is available.

UserInstitutionProgram – This (through) model stores the connection between the user and the chosen institution and program combination for this user.

Page – This model contains information about the pages of the SmartUni website and their URLs. It serves as a central reference for the site’s pages, which can be used, for example, to keep track of which page a user was previously on in order to auto-fill certain forms like the feedback form.

Module – This model functions as a simple list of SmartUni’s three principal components – its Core or base platform, the SmartPlanner, and StudyBuddyMatch. This information is primarily used to populate a module list in forms such as the feedback form.

Feedback – This model collects data related to users’ feedback submissions about SmartUni. This includes the user providing the feedback, the module they are providing feedback on, the submission time of the feedback, its status in the review process, and of course the feedback itself.

Bug – This model collects data related to users’ reports about bugs in the SmartUni website or applications. This includes the user reporting the bug, the module the bug was found in, when the bug report was created and resolved, and of course the description of the bug itself.

Notification – This model is the basic representation of the in-app notifications used in SmartUni. It uses the same fields as the AbstractNotification model from the django-notifications-hq package and adds some necessary custom fields, such as one for the SmartUni module the notification is coming from. The fields it inherits from AbstractNotification include the content of the notification and its creation time. Further detail on the notification framework of SmartUni can be found in the Notification Framework section.

Notification Framework

For the notification framework, we decided (after thorough discussion regarding different possibilities) to use the django-notifications-hq package [1], as it included all the necessary components and options we wanted to use for our notifications.

After reading through the documentation, we started by adding a notification button to the right side of the search bar. The button carries a numeric indicator that updates live every 15 seconds. For this, we used the API call provided by the package.

As we wanted to be able to interact with the notifications instead of only displaying them, we further implemented and designed a dropdown menu that appears when clicking on the above-mentioned button. We followed our global design principles here, while also providing a visual indication of the content and the origin of the notification. We decided to use the module.id and the description fields of the data entry for each notification for that purpose, presenting the origin of the notification using the module ID and its contents using its description.

In order to be able to interact with the notifications, we added links to the individual notification elements in the dropdown-menu.

Once a user clicks on a notification, they get redirected to the origin of the notification.

The notification is set to “read” in the process to prevent it from being displayed again on the main page.

Additionally, to interact with all notifications, we created a more detailed overview page for the notifications that displays the module ID, time of the notification, and its read status. Further, we added a button that redirects to the origin, similar to clicking on the notification in the dropdown-menu. This also triggers the change of the unread status, if it was not set to read already.

Pop-Up

Furthermore, we wanted to create a pop-up notification that gets displayed at the bottom right of the screen once for every unread notification. The idea is for the notification to be displayed for 5 seconds, after which it would disappear, and after a current set minimum of 0.5 seconds the next unread notification would be displayed.

This was done to be able to draw the attention of the user to the new pop-up notification, as without such a pause, the notification would not ‘pop-out’ as much and would be less noticeable.

In order to only display every notification once and to get the corresponding correct notification information, we tried to implement an AJAX-call in the function, as we cannot access the database from JavaScript itself. This call would then be triggered every 15 seconds and would check if there are notifications in the database which have not been presented yet.

The AJAX call would then change the value of verb, a string field of the notification that we repurposed as a pop-up flag, to 1, meaning that the pop-up had been displayed for that particular notification.

However, this did not work reliably, and reading the field back to decide whether the pop-up should be displayed at all did not work consistently either.

The database entry did not change, even though the pop-up message had been displayed correctly.

Therefore, the pop-up would either display constantly or not at all.

Because of the above-mentioned issues and hurdles, we had to drop that feature. However, as it would be a nice feature to have for our website, we would like to add this in the future to our project.

Notification Settings

In order for the user to control the behavior of the notifications, we created a settings page for the notifications where the user can select if they want to receive notifications via e-mail, in-app, both, or not at all. This was created as a subsection in the general settings page.
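The four choices can be thought of as a mapping from a preference to a set of delivery channels. The following is a minimal sketch of that mapping; the names NotificationPreference and delivery_channels are hypothetical and are not taken from the SmartUni code base.

```python
from enum import Enum


class NotificationPreference(Enum):
    """The four options offered on the settings page (names are illustrative)."""
    IN_APP = "in_app"
    EMAIL = "email"
    BOTH = "both"
    NONE = "none"


def delivery_channels(preference):
    """Return the set of channels a notification should be sent through."""
    channels = set()
    if preference in (NotificationPreference.IN_APP, NotificationPreference.BOTH):
        channels.add("in_app")
    if preference in (NotificationPreference.EMAIL, NotificationPreference.BOTH):
        channels.add("email")
    return channels


print(delivery_channels(NotificationPreference.BOTH))
```

Code that creates a notification could consult such a function once and then dispatch to each channel in the returned set.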

E-mail Notifications

We had to configure and add new code to give users the option to receive notifications via e-mail. Specifically, this required configuring the Django SMTP connections, setting up an app password for the e-mail provider, and writing the code to send the actual e-mails containing the notifications through Django.

First we tried Gmail as a provider, but we kept getting an authentication error, so we switched the e-mail provider to UOS (uni-osnabrueck).

We can send e-mails using Django’s built-in send_mail() function. It can send plain text e-mails and HTML e-mails. For the HTML e-mail, we can use the same template language that Django utilizes for web pages. This way, we can customize the email to the user we send it to. The send_mail() function sends both the plain text and HTML version of the email, as is now standard. Usually the HTML version is displayed to the user since most email providers and programs display this version of the email by default. The design of the HTML email is based on the general design of the website.
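An e-mail that carries both versions is a multipart/alternative message with a plain-text part and an HTML part. The sketch below builds such a message with Python's standard email package instead of Django, purely to show the structure; in Django, passing the html_message argument to send_mail() produces a message of this shape. The function name, subject line, sender address, and message wording are assumptions made for illustration.

```python
from email.message import EmailMessage
from html import escape


def build_notification_email(recipient, first_name, description):
    """Build a message with a plain-text body and an HTML alternative."""
    msg = EmailMessage()
    msg["Subject"] = "New SmartUni notification"  # illustrative subject
    msg["From"] = "noreply@example.org"           # placeholder sender address
    msg["To"] = recipient
    # Plain-text version, shown by clients that do not render HTML.
    msg.set_content(f"Hi {first_name},\n\n{description}\n")
    # HTML version; user-supplied values are escaped before being embedded.
    msg.add_alternative(
        f"<p>Hi {escape(first_name)},</p><p>{escape(description)}</p>",
        subtype="html",
    )
    return msg


message = build_notification_email("ada@example.org", "Ada", "You have a new match.")
print(message.get_content_type())
```

In the Django version, the HTML string would typically come from rendering a template with the user's data, which is how the e-mail can be customized per recipient.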

Logging

SmartUni Logging:

We decided to implement logging for two main reasons: debugging, and monitoring the application. Based on Python’s logging module, Django offers its own module, which provides default configurations. However, the default configuration writes any log messages to the console/terminal, where targeted search is very difficult. In order to be able to access the log messages even days later and debug or analyze the execution of our application, we needed the log messages to be saved in a more permanent way, and to be easily searchable.

Consequently, we customized the logging configuration in the SmartUni application settings via the logging dictionary provided by Django. The dictionary follows the dictConfig format of Python’s logging module: nested key-value pairs, similar in structure to JSON, that configure loggers, handlers, formatters, and filters. More on Django’s extended logging module can be found in Django’s documentation.

SmartUni Loggers and Handlers

We set up a general SmartUni logger and three general handlers so that all log messages that aren’t handled by specific loggers and handlers are still picked up and processed.

The fileInfo handler writes log messages of the info level and higher to the specified log file, while the fileDebug handler does the same for messages of the debug level and higher. Since debug is the lowest log level, the fileDebug handler logs every message.

For log messages of the critical level, we set up the AdminEmailHandler from the Django logging module. With this, any critical log records are both recorded to a file and sent to the application admins. Admins can be specified in the application settings as tuples in the form of ('name', 'email') in a list called ADMINS.

SmartUni Formatters

Log records are formatted in two different ways: verbose and simple, which are also the names of our formatters. Simple log messages have the following structure:

INFO 2022-05-05 10:31:46,608 New user with id 1 successfully registered.

The parts of the log record are as follows: log level, date and time, log message, and stack trace if available. In this example, the log level is INFO, the date and time 2022-05-05 10:31:46,608, the log message New user with id 1 successfully registered., and it has an empty stack trace.

Verbose log records add the logger name and module name to the log record as follows: log level, date and time, logger name, module name, log message, and stack trace. An example can look like this:

ERROR 2022-04-22 10:33:55,160 django.request log Internal Server Error: /bugs/1
Traceback (most recent call last):
[...]

In this example, django.request is the logger name and log is the name of the module from which the logging call was made.

For our application, the simple formatter is used for info level log messages, whereas the verbose formatter is used for debug level log messages to provide more details during debugging.
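The two record layouts can be reproduced with Python's logging.Formatter. The format strings below are assumptions reconstructed from the example records in this section, not copied from the SmartUni settings. In Python's logging module, {name} is the logger name and {module} is the name of the source file that issued the logging call.

```python
import logging

# Assumed format strings, reconstructed from the example log records.
SIMPLE_FORMAT = "{levelname} {asctime} {message}"
VERBOSE_FORMAT = "{levelname} {asctime} {name} {module} {message}"

simple = logging.Formatter(SIMPLE_FORMAT, style="{")
verbose = logging.Formatter(VERBOSE_FORMAT, style="{")

# A hand-built record that mimics a call made from a file named log.py
# through the django.request logger.
record = logging.LogRecord(
    name="django.request",
    level=logging.ERROR,
    pathname="log.py",
    lineno=1,
    msg="Internal Server Error: %s",
    args=("/bugs/1",),
    exc_info=None,
)

print(simple.format(record))
print(verbose.format(record))
```

When a record carries exception information, the formatter appends the traceback after the message, which is where the stack trace in the verbose example comes from.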

SmartUni Filters

Regarding the filters, Django provides the require_debug_true filter. With this filter applied, log messages are handled by the debugging handler only if the DEBUG variable is set to true in the application settings. While we were developing the SmartUni website, DEBUG was set to true in our development environments. For the deployed SmartUni website, it is set to false. Because a website displays detailed error messages, performs worse, and lists information about its settings when DEBUG is set to true, a deployed website should not be in this mode.

Timed Creation of New Log Files

Because log files can grow to a very large size, which can make loading the file and searching for specific entries slow, we decided to use the TimedRotatingFileHandler. This handler is part of Python’s standard logging.handlers module and can be used directly in Django’s logging configuration. It creates a new log file automatically at specified time intervals. For SmartUni, we decided to create a new file at midnight every day.
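A configuration along these lines could look like the following sketch. It uses only Python's standard library so that it can run outside of Django; the file names, the format strings, and the helper build_logging_config are assumptions, and the Django-specific AdminEmailHandler and require_debug_true filter are left out.

```python
import logging
import logging.config
from pathlib import Path


def build_logging_config(log_dir):
    """Return a dictConfig-style dictionary with log files rotated daily at midnight."""
    log_dir = Path(log_dir)
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {"format": "{levelname} {asctime} {message}", "style": "{"},
            "verbose": {
                "format": "{levelname} {asctime} {name} {module} {message}",
                "style": "{",
            },
        },
        "handlers": {
            # Info level and higher, simple format.
            "fileInfo": {
                "class": "logging.handlers.TimedRotatingFileHandler",
                "filename": str(log_dir / "info.log"),  # assumed file name
                "when": "midnight",
                "level": "INFO",
                "formatter": "simple",
            },
            # Debug level and higher (every message), verbose format.
            "fileDebug": {
                "class": "logging.handlers.TimedRotatingFileHandler",
                "filename": str(log_dir / "debug.log"),  # assumed file name
                "when": "midnight",
                "level": "DEBUG",
                "formatter": "verbose",
            },
        },
        # Catch-all: records not handled elsewhere end up in both files.
        "root": {"handlers": ["fileInfo", "fileDebug"], "level": "DEBUG"},
    }
```

In a Django project, the returned dictionary would be assigned to LOGGING in settings.py; outside of Django it can be applied with logging.config.dictConfig(). The target directory must already exist.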

StudyBuddyMatch Logging:

As more modules and apps were added to the SmartUni page, tracking log messages during the development and debugging phase became more difficult since all logging data of the website modules were exported to the same files. Further, it is suggested in the Python Documentation to create different loggers for different modules in the application. Hence, we implemented a separate logger so that all StudyBuddyMatch (SBM) logging records were saved to a specific file: sbm.log.

The SBM logger has its own handler, file_sbm, that uses a separate formatter, sbm_formmater. Each LogRecord has the same structure as the example below:

ERROR 2022-07-21 13:35:34,708 sbm signals 10 Similarity score is not within the range [0,1]. Saving failed while attempting to save, similarity = -0.1422656759485355, user_a: ddavis, user_b: testuser

The LogRecord above starts with the level name (e.g., ERROR), followed by the time when the LogRecord was created (e.g., 2022-07-21 13:35:34,708), name of the logger (e.g., sbm), module name (i.e., python file from which the logging call was made e.g., signals), line number (in the example, the logging call was made from line 10) and the message.
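A dedicated logger of this kind can be sketched as follows. The format string is an assumption reconstructed from the example record, a StreamHandler writing to an in-memory buffer stands in for the file handler on sbm.log, and disabling propagation is an assumed design choice to keep SBM records out of the general log files.

```python
import io
import logging

# Assumed format string, reconstructed from the example record.
SBM_FORMAT = "{levelname} {asctime} {name} {module} {lineno} {message}"


def make_sbm_logger(stream):
    """Create an 'sbm' logger with its own handler and formatter."""
    logger = logging.getLogger("sbm")
    handler = logging.StreamHandler(stream)  # stand-in for a file handler on sbm.log
    handler.setFormatter(logging.Formatter(SBM_FORMAT, style="{"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # assumption: do not pass records on to the general handlers
    return logger


buffer = io.StringIO()
sbm_logger = make_sbm_logger(buffer)
sbm_logger.error("Similarity score is not within the range [0,1].")
print(buffer.getvalue(), end="")
```

Modules inside StudyBuddyMatch would then obtain the same logger with logging.getLogger("sbm"), so all of their records share one destination and one layout.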

Bug Reporting

For our web application SmartUni, we developed a bug reporting module. Such a module is an important communication bridge between the website’s users, developers, and testers. It helps keep the web application as free of bugs as possible and offer the best experience for our users.

Before going deep into SmartUni’s bug reporting module, we’ll first review a few basic concepts in bug reporting systems and tools.

What is a Bug?

In software, a bug is any fault in the design, specification, code, or requirements of a website that creates issues and prevents tasks from running correctly.

Bug Reporting Module

The act of reporting the specific error that you encounter in a piece of software is called “bug reporting.” Bug reports are what developers use to record an issue, replicate it, troubleshoot it, and document how to fix it.

Elements of an Effective Bug Report

Good bug reports tell developers exactly what needs to be fixed and help them get it fixed faster. As the chances of the bug being fixed promptly and correctly are directly related to the quality of the bug report, a bug report should be clear and concise without any missing key points.

SmartUni Bug Reporting Module

The SmartUni bug reporting module encompasses three pages:

  1. Report a Bug
  2. Bugs List View
  3. Bug Detail View

The visibility of these pages depends on the role the user has. All users can access the Report a Bug page, but the list and detail bug pages are only visible to admins.

Report a Bug

This is a fairly simple page that contains only two fields and a button. The first field is a dropdown list of all the pages and modules, allowing the user to indicate the service affected by the bug. The second field is a text area to explain what the bug is and when it occurs.

Fig. 2: Screenshot of the "Report a Bug" Page

If a user was on one of the modules’ pages when they clicked on the “Report a Bug” link, then the first field will be automatically filled in for them, as we assume they want to report a bug found in the respective module.

After filling in the fields, they hit the Submit button and the bug is reported for the admins to address later.

Bugs List View

This page is meant for the admins of our web application and contains a list of all reported bugs. The main and only component in this page is a table of four columns with a button at the end of each row.

The columns in the table are ID, State, Module, and Description:

  • ID: automatically generated
  • State: represents the state of the report, which can be one of three options (open, in progress, or closed)
  • Module and Description: entered by the user

The module and description, since they are the values that the user submits in the report, remain unchanged. The only value that is changed during the process of addressing a bug is the state of the report. The button “Details” which is found at the end of each line links to the Bug Detail View page.

Fig. 3: Screenshot of the "Bugs List View" Page

Bug Detail View

When an admin presses on the “Details” button for one of the bugs listed on the bug list view page, they will be redirected to a new web page which includes all the bug’s details. Both the bug reports themselves and the bug detail view are visible only for admins. As mentioned in the introduction, the details of the bug reports should be clear and concise without any missing key points. The chance of the bug being fixed well is directly related to the quality of those details.

Elements of the Bug Detail View

  1. Module: the name of the module where the bug came from.
  2. State: one of three states:
    • Open
    • In Progress
    • Closed
  3. Submitted by: the username of the user who reported the bug.
  4. Submitted at: the time when the bug was reported.
  5. Description: the bug description written by the user.
  6. Related log file: the log file uploaded automatically by the system when the user reported the bug. The log file contains client-side logs, i.e. logs created on the user’s local machine. The admin can press the download button to download the related log file.

Enabling and Viewing Client-Side Log Uploads

We enabled client-side logging and automatic upload of end user log files. The log files of the users are pushed automatically to the system. Then, admins can view the uploaded files through the SmartUni admin dashboard. Enabling and viewing client-side log uploads keeps track of every event that happens in SmartUni from the minute the end user starts running it to the second they stop it. Any calls the user makes to third party APIs or any scripts that run in the background will have a record here. This is an essential source of information about everything that happens behind the scenes of the SmartUni application and is invaluable in tracking down where reported problems originate from.

Mobile View Optimization

As we wanted to allow users with different screen sizes to access and use our site, we had to optimize our website for different screen sizes. For this, we thought of using a grid system, which reorders and resizes elements based on certain thresholds of pixels on the screen of the user. We decided to use three different categories for this: a mobile, a tablet, and a desktop view size.

The determining factor for the threshold is the horizontal number of pixels. Using this grid system, the page is divided into different columns. Each category has a defined number and size of columns and the content gets adjusted accordingly. In addition to the basic build-up of the website, some elements like the header or the notification-dropdown menu are automatically adjusted based on the same grid system. CSS files were used to apply the mobile optimization to some elements which do not automatically get adjusted with the grid system.

Grid System

For the grid system, we used the built-in implementation of the Bootstrap grid system [1]. This system includes five different element types, of which we used sm (small), md (medium), and lg (large) for our three views respectively. The elements are aligned in a column system where up to twelve elements can be placed in a maximum of twelve columns, dependent on their size. For ease of use, there are already some predefined classes which we used to size the elements in this system. An automatic line break in the elements in the grid system is applied once one of the below-mentioned view types is displayed.

Mobile View

The mobile view is displayed if the number of horizontal pixels is less than 768. In the mobile view, the sidebar gets collapsed into a hamburger-style menu and the top bar is reorganized. The sidebar can then be opened by clicking on the hamburger menu icon and collapsed by clicking on the icon again. The search bar and the login button are put under the header. Additionally, the elements are reorganized based on the grid system to make touch interactions easier.

Fig. 4: SmartUni Mobile View

Tablet View

The tablet view is presented when the number of horizontal pixels is between 768 and 1035. In this view, a line break is added in some areas to improve usability. One example is the headline of the start page, visible in Fig. 2. Here also, the elements get adjusted based on the grid system.

Fig. 5: SmartUni Tablet View

Desktop View

The desktop view is presented when the browser window or screen size is bigger than 1035 pixels horizontally. In this view, the website is presented without any additional mobile optimizations. There is no additional line break and the elements of the grid system are presented unaltered.

Fig. 6: SmartUni Desktop View
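
The three thresholds described above can be summarized in a few lines. The following is only an illustrative Python sketch of the categories; on the website itself, the switching is done by the grid system and CSS, and the function name view_for_width is hypothetical.

```python
def view_for_width(width_px: int) -> str:
    """Map a horizontal pixel count to the view category described in the text."""
    if width_px < 768:
        return "mobile"   # fewer than 768 pixels
    if width_px <= 1035:
        return "tablet"   # between 768 and 1035 pixels
    return "desktop"      # more than 1035 pixels
```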

Settings Page

A platform like SmartUni has many options that can and should be adjusted to the users’ needs and wants. The Settings page provides the user with these options. It can be accessed by clicking the ⚙ symbol in the top right corner of the webpage.

The Settings themselves are separated into two segments, the first being the profile information and the second the notification settings.

Profile Information

The profile information can be separated into multiple subgroups:

Personal Information

The users can edit how they are portrayed on the website by editing their name, profile picture, and age. Other personal information like their date of birth and e-mail address can be edited here as well.

Study Information

In this section, the users can select the course of study they are currently enrolled in. The list of courses depends on the selected university and has been collected from the universities' websites.

The universities that have been included so far are:

  • FH-Westküste

  • Universität Osnabrück

  • Universität Oldenburg

  • Universität Duisburg-Essen

  • Universität Heidelberg

  • Universität Flensburg

The user can also indicate whether they study mostly on-site, meaning on campus, or mostly remotely.

Contact Data

A phone number can be added to make the first contact between students matched via the StudyBuddyMatch service easier. This phone number is checked by the regular expression ^[(+?\d{2})|0]\s?[\d\s]{8,16}$, to see if it matches the typical patterns.
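
To illustrate what this check accepts, here is a minimal Python sketch that applies the pattern exactly as quoted above. The helper name is_valid_phone is hypothetical. Note that the square brackets at the start of the pattern form a character class, so the first position matches any single character from that set (for example a digit, "+", or "("). The second pattern, GROUPED_PATTERN, is not taken from SmartUni; it is an assumed variant that uses a group instead, shown only for comparison.

```python
import re

# Pattern exactly as quoted in the text above.
PHONE_PATTERN = re.compile(r"^[(+?\d{2})|0]\s?[\d\s]{8,16}$")

# Hypothetical variant (an assumption, not the SmartUni code): a group that
# requires either an optional "+" followed by two digits, or a leading "0".
GROUPED_PATTERN = re.compile(r"^(\+?\d{2}|0)\s?[\d\s]{8,16}$")


def is_valid_phone(number: str, pattern: re.Pattern = PHONE_PATTERN) -> bool:
    """Return True if the whole string matches the given phone pattern."""
    return pattern.match(number) is not None


# Typical numbers pass both patterns.
print(is_valid_phone("+49 151 12345678"))
print(is_valid_phone("0151 12345678"))
# A leading "(" passes the quoted pattern but not the grouped variant.
print(is_valid_phone("(12345678"), is_valid_phone("(12345678", GROUPED_PATTERN))
```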

Additionally, the user can input which messaging services they regularly use. This is later shown to other users to provide a common platform of exchange.

Biography

In this section, the user is asked for a small biography. We decided on a text field for this purpose, where the user can input any text they want.

Language

Since not all students have German as their primary or only language, they can also indicate which other languages they speak. This ensures the possibility of communication between matched users.

Timezone

With 320,000 international students in Germany alone, we found it important to account for the timezones they might be in when staying with their family back home. In this setting, a user can select the timezone they are currently in. This adjusts the timestamps of admin messages and of feedback and bug report submissions. Unfortunately, adjusting the schedule created by the SmartPlanner module to match the newly selected timezone does not yet work.
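
How SmartUni converts timestamps internally is not described here. As a minimal sketch of the general idea, assuming timestamps are stored in UTC and the user's selection is an IANA timezone name, the conversion could look like this with Python's standard library. The function name to_user_timezone and the example values are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_user_timezone(timestamp_utc: datetime, tz_name: str) -> datetime:
    """Convert a timezone-aware UTC timestamp to the user's selected timezone."""
    return timestamp_utc.astimezone(ZoneInfo(tz_name))


# Hypothetical example: a bug report submitted at 12:00 UTC, viewed by a
# user who selected "Asia/Kolkata" (UTC+5:30) in the settings.
submitted = datetime(2022, 9, 17, 12, 0, tzinfo=timezone.utc)
local = to_user_timezone(submitted, "Asia/Kolkata")
print(local.strftime("%Y-%m-%d %H:%M %Z"))
```

The converted value still refers to the same instant; only its displayed wall-clock time changes.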

Notification Settings

In this section of the Settings, the users can adjust which module can send them messages and how they receive those messages.

Each module can send in-app notifications, e-mails, both of these, or no message at all.
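
The four choices per module can be thought of as two independent switches. The following sketch models them with a flag enumeration; it is only an illustration, and the names Channel and delivers are hypothetical and not taken from the SmartUni models.

```python
from enum import Flag


class Channel(Flag):
    """Delivery channels a module may use for one user (hypothetical names)."""
    NONE = 0
    IN_APP = 1
    EMAIL = 2
    BOTH = IN_APP | EMAIL


def delivers(setting: Channel, channel: Channel) -> bool:
    """Return True if the user's setting includes the given channel."""
    return bool(setting & channel)


# A user who chose "both" receives e-mails; a user who chose "none" does not.
print(delivers(Channel.BOTH, Channel.EMAIL))
print(delivers(Channel.NONE, Channel.EMAIL))
```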

The Backend

All these Settings have to be saved somewhere. This is done in the SmartUser and the Settings models. For the notification settings, a second model, the UserSettings, has been created. This makes it easier to add more modules, like the SmartPlanner, later. See the Database section for more detail on these and other models used in the Core.

AI Search

We wanted to offer a way to search for SmartUni content, for example registered users or institutions, that does not rely on the user entering an exact word, phrase, or academic term. However, the search results should still be relevant to the user’s query. The solution to this problem was to implement a smart search function that calculates similarities and returns search results that are similar to the query.

Considerations

Searchable Content. First, we asked ourselves what exactly users should be able to search for. To answer this, we considered what information a user of SmartUni might want to find. Since one of the main functions of SmartUni is the matching of study buddies, one of the elements we identified as potentially interesting to users was the users themselves, based on their names and academic interests. More information about the questionnaire on academic interests can be found in the natural language processing subsections in the ‘StudyBuddyMatch Architecture’ section.

Another main feature of SmartUni is the event and task planner. We thought that searching for public events could also be of interest, but soon discarded that idea because public events were not implemented in the end.

We identified two more elements as potentially relevant to SmartUni users: institutions and study programs. While we concluded that searching for programs based on similarity to the search query could be helpful, searching for institutions that way wouldn’t make sense because they are saved as proper nouns. For example, calculating the similarity between ‘Universität Osnabrück’ and ‘Universität Flensburg’ would not help a user who is searching for a specific university.

With all of this in mind, we ultimately decided to implement a smart search for users and programs, and a simple keyword search for institutions.

Smart Search. Next, we asked ourselves what kind of smart search to implement. Searching for users by academic interest and name, and for programs by program name, based on a user-entered search query, means working with textual data. Therefore, we chose natural language processing (NLP) methods.

Because of time constraints and other obstacles further described in the ‘StudyBuddyMatch Architecture’ section, we could not train our models on self-generated, domain-specific data to get more accurate results. This is why we decided to use spaCy, a Python library that comes with pre-trained models for multiple different languages. We chose both the English and the German models because while the questionnaire on academic interests asks for answers in English, the program names entered into our database are mostly in German (with a few in English).

For the academic interest questionnaire, it was possible to give negative answers such as “I don’t like AI,” or “I like literature but not analysis.” Analysis using spaCy embeddings does not take negative expressions of that sort into account but rather focuses on the word count. For example, the sentences “I like analysis and AI but not psychology and neuroscience” and “I like analysis and AI and also psychology and neuroscience” result in a similarity score of 0.98, which is an almost perfect score. On the other hand, the first sentence and “I like analysis and AI” result in a similarity score of 0.91, which is still high but lower than the previous score. Given the negation, the order should be reversed, or the two scores should at least be equal, since in both comparisons the sentences share two matching topics.
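
One way to see why negations have so little effect: by default, spaCy represents a text as the average of its word vectors and compares two texts by the cosine similarity of these averages. A word like “not” is then just one more vector in the average. The sketch below reproduces this mechanism with invented three-dimensional toy vectors; the numbers are made up for illustration and do not come from any spaCy model.

```python
import math

# Invented toy vectors (assumption); real spaCy vectors have hundreds of dimensions.
TOY_VECTORS = {
    "analysis": (1.0, 0.2, 0.0),
    "ai": (0.9, 0.4, 0.1),
    "psychology": (0.1, 1.0, 0.3),
    "not": (0.2, 0.2, 0.2),
    "also": (0.2, 0.2, 0.3),
}


def doc_vector(words):
    """Average the word vectors, as a stand-in for a document vector."""
    vectors = [TOY_VECTORS[word] for word in words]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


negated = doc_vector(["analysis", "ai", "not", "psychology"])
affirmed = doc_vector(["analysis", "ai", "also", "psychology"])
shorter = doc_vector(["analysis", "ai"])

# The negated and the affirmed text end up closer to each other than the
# negated text and the shorter text that omits the rejected topic.
print(cosine(negated, affirmed) > cosine(negated, shorter))
```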

This is why we also decided to include a sentiment analysis. A sentiment analyzer takes a document, in our case a phrase, and determines if the phrase has a positive or a negative polarity. Based on the analysis of tools found on investigate.ai, we settled on nltk’s SentimentIntensityAnalyzer. nltk is the Natural Language Toolkit, a suite of Python libraries for natural language processing. We made this decision because nltk seems to process sentences with negations such as “not” and “but” the best, and our data includes exactly these kinds of sentences.

Data Processing. An important consideration was how to handle the data processing in the code. Using the QuerySets offered by Django was discarded as an option because it is not possible to attach information, in this case similarity scores, to an existing QuerySet. However, this was a necessity because we needed to be able to link the element ID to the specific similarity score. Since this meant we had to handle data tables, we decided to use pandas, a Python library optimized for handling tabular data.

When it comes to processing language data, preprocessing is almost always the first step. Even though spaCy already handles most preprocessing steps automatically, we still had to handle punctuation and stopwords, which is why we implemented a method that removes these.

When testing the AISearch class, we also found that spaCy isn’t able to process certain German compound words, for example “Wirtschaftspsychologie.” This is most likely due to the data used to train the model. For cases like this, we decided to use the HanoverTagger (HanTa on GitHub). The HanoverTagger is a lemmatizer and tagger that we chose to split German compound words and retrieve the word stems. Then, the similarity scores were re-calculated for the sequence of stems.

Finally, since we realized that spaCy takes the length of the text data into account when calculating similarity, and does not produce as accurate similarity scores when longer sentences are involved, we decided to reduce the word count. As a rule, every text containing more than 3 words goes through another processing step that removes all types of words but nouns. This results in a sequence of keywords that are compared to the query for similarity.

Custom Stopwords. Stopwords are words that don’t add meaning to a document. For example, in the sentence “I like AI,” “I” is a stopword. The words “like” and “AI,” on the other hand, help both in the sentiment and the similarity analysis. However, there is no set list of stopwords, only commonly used ones, for example the list of stopwords built into nltk. For our smart search, we needed a list that does not include negations so we could use our sentiment analysis. All lists that we found, however, also included negations. For that reason, we decided to copy nltk’s list, remove the negations, and hard-code a function that returns our custom list of stopwords.

nltk’s stopword list only contains English stopwords, which is why we also searched for a German stopword list. We settled on the list found on Ranks NL because it was simple and manageable. As with the English stopword list, we removed the negations from the German list and hard-coded that custom list as well. Based on the language passed to the AISearch class, the respective stopword list is used.

Another consideration was the list of programs including words that don’t necessarily add meaning. For example, “Biologie - 2-Fächer-Bachelor” and “Deutsch/Germanistik - 2-Fächer-Bachelor” contain the compound word “2-Fächer-Bachelor,” which would result in a higher similarity score, even though the topics are very different. Therefore, we also added our own custom stopwords to calculate more precise similarity scores.
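
The idea behind the custom lists can be sketched as follows. The word sets below are a tiny illustrative subset chosen for this example, not the actual lists used in SmartUni, and the function names custom_stopwords and preprocess are hypothetical.

```python
import string

# Tiny illustrative subset (assumption); the real lists are much longer.
BASE_STOPWORDS = {"i", "a", "the", "and", "but", "not", "no", "don't"}
NEGATIONS = {"but", "not", "no", "don't"}
DOMAIN_STOPWORDS = {"2-fächer-bachelor"}


def custom_stopwords(add_domain: bool = False) -> set:
    """Return the stopword list without negations, optionally with domain terms."""
    words = BASE_STOPWORDS - NEGATIONS
    if add_domain:
        words |= DOMAIN_STOPWORDS
    return words


def preprocess(text: str, stopwords: set) -> list:
    """Lowercase, strip punctuation at token edges, and drop stopwords."""
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    return [token for token in tokens if token and token not in stopwords]


# Negations survive, so a later sentiment analysis can still see them.
print(preprocess("I like literature, but not analysis.", custom_stopwords()))
# Domain-specific stopwords remove the shared degree label.
print(preprocess("Biologie - 2-Fächer-Bachelor", custom_stopwords(add_domain=True)))
```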

AI Implementation

AISearch Class. We implemented the AISearch class to gather all methods related to the smart search in one place. The class takes five arguments: a pandas DataFrame object for the data that should be filtered for similar records, a string for the user’s search query, a string for the language that should be used for the model, a string for whether additional domain-specific stopwords should be added to the existing ones, and an argument indicating whether each record contains a list of data that should be compared separately.

First, the received variables are processed by the AISearch class. The data that the AISearch class receives should have a column named ‘documents,’ so that no blind assumptions about the structure of the DataFrame have to be made. The query string is saved both as a string and as a spaCy NLP object, a so-called Doc object that has been processed by spaCy and can be used to calculate similarity scores. The language argument determines which model is loaded by spaCy. Because loading a model is resource-intensive, loading only the model for the requested language is more efficient than loading both.

The last argument, whether each record contains a list of data, determines if this document is further split during processing. This is the case for the academic interest questionnaire, where a user enters a list of different interests that need to be compared to the entered query separately. Although it is possible for a user to enter a query consisting of multiple interests, we decided to favor separate similarity scores because we found it more likely that a user will enter just one keyword or a phrase containing just one keyword.

Preprocessing and Similarity Analysis. The data is preprocessed in several steps. First, the documents are filtered for positive sentiment. The sentiment analysis calculates a score between -1 and 1, where negative values denote negative sentiment and positive values denote positive sentiment. Search results should only contain documents that express interest in a topic, which is why documents with negative sentiment are discarded. After filtering the documents for sentiment, punctuation and stopwords are removed. At this stage, the documents are ready for a similarity analysis.

If the user is searching for other users, one more step is added to the preprocessing, namely splitting each document by comma into a list of separate values. Similarity scores for each separate value are then calculated. To find the most similar user, the mean score of all similarity scores is calculated and saved as the final similarity score of the user associated with the document.
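
The split-and-average step can be sketched in plain Python. Note the loud assumption: toy_similarity is a purely character-based stand-in from the standard library so that the example runs without a language model; the actual class uses spaCy similarity scores. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def toy_similarity(a: str, b: str) -> float:
    """Stand-in for an embedding-based similarity (assumption, not spaCy)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def user_score(document: str, query: str) -> float:
    """Split a comma-separated document and average the per-value scores."""
    interests = [part.strip() for part in document.split(",") if part.strip()]
    if not interests:
        return 0.0
    return mean(toy_similarity(interest, query) for interest in interests)


# Each interest is compared to the query separately, then averaged.
print(user_score("psychology, history", "psychology"))
```

One consequence of averaging is that a user with one perfectly matching interest and several unrelated ones receives a lower final score than a user who listed only the matching interest.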

If the similarity score is 0.0 and the language is set to German, the HanoverTagger is used to extract the stems from German compound words in order to re-calculate the similarity score based on the sequence of stems.

Following this, records below a cutoff value specified in the AISearch class itself are removed from the results. This final data table is then sorted by descending similarity scores. The cutoff value was chosen somewhat arbitrarily as 0.4 because we did not want to restrict the number of results too much, but also didn’t want to return all results.
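
As a minimal plain-Python sketch of this last step (the class itself works on a pandas data table), filtering and ranking could look like the following. The function name filter_and_rank and the example scores are hypothetical; only the value 0.4 comes from the text.

```python
CUTOFF = 0.4  # cutoff value named in the text


def filter_and_rank(scored, cutoff=CUTOFF):
    """Drop records scoring below the cutoff, then sort by descending similarity.

    `scored` is a list of (record_id, similarity_score) pairs.
    """
    kept = [(record_id, score) for record_id, score in scored if score >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for four records.
results = filter_and_rank([(1, 0.35), (2, 0.9), (3, 0.4), (4, 0.62)])
print(results)
```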

Smart Search User Interface

Compared to the backend calculations, the user interface is rather simple. The user can enter a search term in the search bar at the top of the page and is then redirected to the results page (Fig. 7).

Fig. 7: Search bar at the top of the page with entered search term "psychology" (left), and search result page (right).

On the initial search result page, the user can now choose which elements to search more thoroughly. Clicking on one of the “Search all […]” buttons redirects the user to a search result page specific to the chosen element (Fig. 8). The results are displayed as a table with multiple pages if there are more than 15 results.

Fig. 8: Search result page specific to the chosen element, here programs.

References

[1] Bootstrap Grid System https://getbootstrap.com/docs/4.0/layout/grid/ [last accessed: 17.09.2022]