Student Performance Dataset
A dataset of results for candidates who sat the GCE AL (University Entrance Examination), containing more than 300,000 records.
Kaggle Link
This dataset contains information on the performance of students in the GCE Advanced Level (AL) exam in Sri Lanka in 2020. It was collected by Sasika Amarasinghe and is available on Kaggle.
I have removed some columns from the original dataset for ethical reasons. Here is a sample of what the original data returns for a search query.
Given the name of a school candidate, the search returns all of that candidate's details, including the birthday (which is not published on the original exam result sheet). This works for pretty much anyone from the 2020 AL batch.
Dataset Characteristics
- The dataset consists of over 300,000 records of student performance in the GCE AL exam in Sri Lanka.
- The data includes information on student identification, school, district, medium of instruction, stream, and their scores in different subjects.
- The data also includes the overall Z-score of each student, which is a standard score that indicates the number of standard deviations by which the student's exam results are above or below the mean.
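To make the Z-score definition above concrete, here is a minimal sketch of a standard score computed over a list of marks. The marks are made-up illustration values, the function name `z_scores` is hypothetical, and the use of the population standard deviation is an assumption; how the official overall Z-score is derived from individual subjects is not described here.

```python
from statistics import mean, pstdev


def z_scores(marks):
    """Return the standard score of each mark relative to the whole list."""
    mu = mean(marks)
    sigma = pstdev(marks)  # population standard deviation (an assumption)
    if sigma == 0:
        # All marks are identical, so nobody is above or below the mean.
        return [0.0 for _ in marks]
    return [(m - mu) / sigma for m in marks]


# Made-up marks for illustration only; these are not values from the dataset.
marks = [40, 50, 60]
scores = z_scores(marks)
print(scores)
```

A positive value means the mark is above the mean of the group, a negative value means it is below, and the magnitude is the distance in standard deviations.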
Variables
- Index: A unique identifier for each student
- School ID: Identification number of the school
- District: District where the school is located
- Stream: Science, Arts, or Commerce stream of the student
- Medium: Sinhala or English medium of instruction
- Subjects: The scores of the student in each of the subjects (Mathematics, Science, English, Buddhism, and History)
- Z-Score: The overall Z-score of the student
Use Cases
- This dataset can be used to study the performance of students in different subjects and in different streams, medium of instruction, and districts.
- The data can also be used to study the relationship between student performance and demographic factors such as medium of instruction and district.
- This dataset can be used to identify the factors that contribute to the performance of students in the GCE AL exam and to make recommendations for improving student performance in the future.
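As a starting point for the comparisons described above, the sketch below averages the Z-score per group using only the Python standard library. The column names `District`, `Stream`, and `Z-Score` follow the variable list in this document, but the actual CSV headers may differ, so treat them as assumptions. The rows are made-up illustration values, and `mean_z_by` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def mean_z_by(rows, key):
    """Group rows by the given column and return the mean Z-score per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row["Z-Score"]))
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows for illustration only; real rows would come from the Kaggle CSV,
# for example via csv.DictReader, which also yields string values like these.
rows = [
    {"District": "Colombo", "Stream": "Science", "Z-Score": "1.0"},
    {"District": "Kandy", "Stream": "Science", "Z-Score": "0.5"},
    {"District": "Colombo", "Stream": "Arts", "Z-Score": "-0.5"},
]

by_stream = mean_z_by(rows, "Stream")
by_district = mean_z_by(rows, "District")
print(by_stream)
print(by_district)
```

The same helper works for any categorical column, such as `Medium`, which makes it easy to compare groups before moving on to a fuller analysis.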
Usability
- 9.41 / 10
Sources
- Data were collected from https://www.doenets.lk/examresults, which is the exam results site in Sri Lanka.
Collection Methodology
- Data were scraped by a Python script written by the author, using the index number as the key.
- Later, the national identity card numbers were decoded to extract the applicants' birthdays and genders.
- For privacy reasons, "Full name", "National Identity Card no", and "Index no" were removed, but the birthdays and genders have been added to the dataset.
- AWS instances were used to collect the data in parallel, reducing collection time and data usage.
I got a bronze medal 🥉 for this dataset, with 36 upvotes in the Kaggle community and very good feedback from community members.
Testimonials for the dataset
"This can be actually used to look after the academic likelihoods and whereabouts of Sri Lankan students' academics! Great job!" - VISHESH THAKUR, Datasets Expert
"This data could be used for EDA, visualization and even model development! Good work and great dataset!" - RAVI RAMAKRISHNAN, Notebooks Grandmaster
Github Link
I haven't published the code for the data scraping, the data preprocessing, or the student search queries. It is not available for ethical reasons.