Introduction
On March 18, 2023, R-Ladies São Paulo held the event “Open Data Analysis with R - Open Data Day.” It took place on a Saturday, across the morning and afternoon, with 6 hours of activities.
Insper, a non-profit institution dedicated to teaching and research, once again supported the group by providing the space for the event. Curso-R also supported the event by providing two teaching assistants to help participants. The main objective was to explain what open data is and why it matters. We also wanted to offer a hands-on activity in which participants explored open data and assessed the current state of public data access across different levels of government and topics.
What is Open Data Day?
Open Data Day is an annual celebration of open data worldwide, organized and supported by the Open Knowledge Foundation. If you want to know more, check the official Open Data Day website and the project page on the Open Knowledge Brasil website.
Main activity
About 40 people attended, and we structured the event in four blocks. The first block was introductory and consisted of a sequence of brief presentations: what the R-Ladies São Paulo community is, what Open Data Day is, and what open data is.
The objective was to make people feel comfortable working with open data, so we split the class into small working groups. Each group worked on a specific dataset alongside a Teaching Assistant with experience analyzing it, who guided the group in examining and understanding the data. The second block therefore consisted of the following activities:
The explanation of how this activity would happen;
A brief speech (about 5 minutes) by each Teaching Assistant on the topic of their expertise and on the dataset that the group would work on, to facilitate the identification of participants with the subject;
And the separation of participants into groups according to their interests - everyone could choose the topic that interested them most, and there was no need to resize or redistribute groups.
The topics and respective teaching assistants were:
In addition, teaching assistants were available to help with questions about R:
We created a Google Document so the groups could take and share notes.
Then, in the third block, the groups worked on importing, understanding, and starting to explore the open data for their respective themes. Because many people said they had no experience creating data visualizations and were interested in getting started, co-organizer Beatriz Milz gave a short live coding presentation on using the esquisse package to generate data visualizations with ggplot2.
In the fourth and last block, the groups presented some of their difficulties, lessons learned, and results (some even showed the visualizations they had made!). Several presentations also echoed the topics raised in the initial talks on open data - for example, some groups reported dealing with an outdated database, and others pointed out that the data was aggregated and could be made available at a greater level of detail.
The experience was outstanding, and both organizers and participants made it clear that they enjoyed the activity a lot (especially the children!). Each group had between two and six people in addition to the teaching assistants, which allowed for more individualized support in answering questions.
Strengthening the community
Two interesting points to highlight are the collaborative coffee and the Gugudadados Space.
Collaborative coffee
We set up a collaborative coffee table with items purchased by the organizing group (with the money received from the scholarship offered by OKBR) and food brought by participants, so people could get coffee and something to eat at any time during the event. Keeping the coffee available throughout the event works well for three reasons: (i) it respects the pace of the groups, who can take breaks as their work progresses; (ii) it welcomes participants who, for health reasons, cannot go many hours without eating; and (iii) it welcomes participants who, due to socio-economic conditions, are unable to have a meal during the lunch break. Given the nature of the R-Ladies group, providing a welcoming environment matters, so that everyone can enjoy the event without worrying about having something to eat during the day. The collaborative coffee is also a way to encourage integration between people!
Gugudadados Space
We created the Gugudadados Space to make it easier for people responsible for kids and babies (e.g., mothers, fathers, and caregivers) to participate. One baby and four children between 7 and 10 years old took part in this event. With the money from the scholarship offered by OKBR, it was possible to hire a recreational teacher for the Gugudadados Space (a room next to the event room, on the same floor) throughout the activity. The R-Ladies organizers also brought toys, drawings, markers, games, and temporary tattoos to entertain the children.
Results from the groups during the event
Electoral data (T.A. Cecília do Lago):
This group explored electoral data for the 2022 elections, made available by the TSE. The group wanted to find out how many candidates received only one vote or no votes at all in the 2022 election.
The data used can be obtained from the TSE’s Open Data Portal.
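The question the group asked can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the table below is made-up toy data, and the column names (candidate, zone, votes) are hypothetical rather than the actual layout of the TSE files. It assumes votes are reported per candidate and per voting zone, so they are summed before filtering.

```r
# Toy stand-in for a results table; values and column names are hypothetical.
zone_votes <- data.frame(
  candidate = c("A", "A", "B", "B", "C", "D"),
  zone      = c(1, 2, 1, 2, 1, 1),
  votes     = c(0, 0, 1, 0, 250, 2)
)

# Sum the votes of each candidate across all zones.
totals <- aggregate(votes ~ candidate, data = zone_votes, FUN = sum)

# Keep only candidates whose total is zero or one vote.
low_votes <- totals[totals$votes <= 1, ]

# Number of candidates with at most one vote.
n_low <- nrow(low_votes)
```

With real data, the same steps would apply after importing the downloaded file and replacing the hypothetical column names with the ones it actually uses.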
Environmental data on Fires (T.A. Bianca Muniz):
This group explored data on fires in INPE’s BDQueimadas system. This system allows anyone to download up to one year of data at a time. The T.A. prepared a presentation on how to download and import this dataset. The group exported data from 2020 to 2022, and the graph below shows the number of fires per month according to the biome where the fires occurred. The graph shows that the biomes with the largest number of fires are the Amazon and the Cerrado, and that both follow seasonal patterns. For example, the highest peak of fires in the Amazon between 2020 and 2022 occurred in the second half of 2022.
The data used can be obtained from the INPE - BD QUEIMADAS website.
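Since each download covers a limited period, analyzing 2020 to 2022 means stacking several exports before counting fires per month and biome. Below is a minimal base R sketch of that idea. The two small tables are made up and stand in for separate exports; the column names (date, biome) are assumptions, not the real BDQueimadas schema.

```r
# Two tiny made-up tables standing in for two separate exports.
# Column names (date, biome) are assumptions, not the real BDQueimadas schema.
export_2020 <- data.frame(
  date  = as.Date(c("2020-08-03", "2020-08-17", "2020-09-01")),
  biome = c("Amazon", "Cerrado", "Amazon")
)
export_2021 <- data.frame(
  date  = as.Date(c("2021-09-12", "2021-09-30")),
  biome = c("Amazon", "Cerrado")
)

# Stack the exports into a single table.
fires <- rbind(export_2020, export_2021)

# Derive a year-month label for each record.
fires$month <- format(fires$date, "%Y-%m")

# Count records for every month and biome combination (zero counts included).
counts <- as.data.frame(
  table(month = fires$month, biome = fires$biome),
  responseName = "n_fires"
)
```

A table shaped like counts (one row per month and biome) is the kind of input that can then be passed to ggplot2 or explored with esquisse to draw one line per biome.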
Prison data (T.A. Thandara Santos):
This group explored data from SISDEPEN - Statistical Data of the Brazilian Penitentiary System. These data come from the Prison Information Form, answered electronically every six months by government employees.
One of the difficulties raised by the group is that the data are aggregated by prison unit rather than by individual, which makes several fundamental analyses of the prison population in Brazil impossible. Another area for improvement is the lack of standardization in the answers recorded in the database, which lowers the reliability of the data.
The data can be obtained from the National Secretariat for Penal Policies website.
Violence data (T.A. Fernanda Peres):
This group explored public data on violence from SINAN (Sistema de Informação de Agravos de Notificação, the Notifiable Diseases Information System), keeping only occurrences involving adults and removing self-inflicted violence (for example, suicide). The group found that the public data on violence in DataSUS were outdated. The dataset for 2020 was incomplete, so the group explored data for 2019. The group generated a series of graphs, such as the one below, showing that most victims are women. In addition, it is noteworthy that the author of the aggression is most often male (whether the victim is a woman or a man).
Another thing pointed out by the group is that when the victims were women, the aggressor tended to be someone they knew, such as a spouse, ex-spouse, boyfriend, or ex-boyfriend. On the other hand, among men, the most common aggressor is a stranger.
This data can be downloaded from DATASUS. The T.A. gave a presentation on how to get this data.
Employment data (T.A. Ana Paula):
The T.A. gave a presentation on how to import this data. The group explored two series from Caged (General Register of Employed and Unemployed), obtained from the Central Bank: the total number of jobs from 2000 to 2023, and the number of jobs in the manufacturing industry (any raw material that is processed) from 2000 to 2023. The data can be obtained with the GetBCBData package, which makes it possible to retrieve up-to-date data aggregated by month/year using the time series ID.
Rows: 37
Columns: 3
$ ref.date <date> 2020-01-01, 2020-02-01, 2020-03-01, 2020-04-01, 2020…
$ NCaged <dbl> 37938640, 38155900, 37860843, 36879150, 36480704, 364…
$ NCaged_IndTransf <dbl> 6924768, 6961283, 6918309, 6705220, 6600422, 6593132,…
To learn more about this database, consult the Central Bank of Brazil Time Series Management System website.
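A table shaped like the output above (one date column plus one column per series) could be requested roughly as follows. This is a hedged sketch, not the code used at the event: fetch_bcb_wide is a hypothetical helper, the numeric ids are placeholders that must be replaced with the real codes found in the Time Series Management System, and the dates are chosen only to match the first ref.date and the 37 monthly rows shown above. It relies on gbcbd_get_series() from GetBCBData with format.data = "wide", where the names of the id vector are expected to become the column names.

```r
# Hypothetical helper wrapping GetBCBData::gbcbd_get_series().
# It only builds the request; it needs the GetBCBData package and
# internet access when it is actually called.
fetch_bcb_wide <- function(ids, first_date, last_date) {
  GetBCBData::gbcbd_get_series(
    id          = ids,         # named vector: names label the series
    first.date  = first_date,
    last.date   = last_date,
    format.data = "wide"       # one column per series
  )
}

# Placeholder ids (NOT real series codes): look them up before running.
series_ids <- c(NCaged = 11111, NCaged_IndTransf = 22222)

# Left commented out because it requires the package, a connection,
# and valid series ids:
# caged <- fetch_bcb_wide(series_ids,
#                         as.Date("2020-01-01"),
#                         as.Date("2023-01-01"))
# str(caged)
```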
Difficulties
The main difficulty in organizing the event was the short time available for publicizing it, since the date and place were defined only six days in advance. Although the room could hold 100 people, only 40 had enough time to organize themselves and register as participants.
Even so, the turnout of more than 40 people, including the organizers, was enough to carry out the activity, and those who came were genuinely interested in the topic.
Support
OKBR’s financial support was essential: it paid for the coffee break items, the stickers, and the recreational teacher.
The rooms offered by Insper were crucial for the event to take place. The building is easily accessible by public transport. The meeting took place in a large space with internet access, tables, comfortable chairs, and easy access to a restaurant for lunch. The toboggan that is part of the building’s facilities is also a hit, and it is one of the most joyful experiences for the children who stay at Gugudadados.
Curso-R also supported the event, providing two R teachers to assist in the activity and being available to help with questions from the participants.
Team
This event was only possible with the collaboration of several people. Therefore, here is a list of people who participated in the various stages of organizing the event:
The event would not be the same without your collaboration - we greatly appreciate your participation!
In addition, we also thank everyone who participated!
Next events
This was the first time we held an event built around working in groups, and the format clearly worked well (the participants said they preferred it to expository lectures). We intend to organize other events in this format!
The next R-Ladies São Paulo event is scheduled for May, with a theme yet to be defined. If you are interested in participating, we recommend following our social media!