The first experimental session in Amman, which included 10 teachers

The first version of the “Reading Wikipedia in the Classroom” program was successfully implemented in Jordan on May 25, 2024. Following its launch, the program’s development gained momentum through collaborative partnerships. The initiative was first proposed by Mohammed Hajeer during a Wikimedia Levant meeting on the 19th Arabic Wikipedia Day. Mervat Salman emerged as a key supporter, driving its implementation forward. To facilitate the program, I translated the teacher’s guide into Arabic, updating previous versions into a standard Arabic format so that it met the needs of Arabic-speaking educators. Financial and logistical support from the Wikimedia Levant group further enabled the program’s launch.

Initial Challenges and Progress 

The program faced significant hurdles, particularly in gaining access to teachers and convincing them to participate. While 25 teachers initially registered, only 10 actively participated, representing three different governorates in Jordan. Despite the modest attendance, encouragement from the Wikimedia Levant group and individuals like Melissa Guadalupe—who remarked, “It’s certainly commendable to start small and continue the growth from the learnings that proximity to teachers bring”—kept the momentum alive. 

The participants’ reactions during the honoring ceremony on the 20th Arabic Wikipedia Day were particularly inspiring. Teachers expressed enthusiasm for contributing more to Wikipedia, even creating articles to mark the occasion. One teacher, in particular, displayed a keen interest in expanding the program to her school, Al-Durra Al-Sharifa School in Amman. With further support from Wikimedia Levant, preparations for a second session at this school were initiated. This event, scheduled for November 30, 2024, demonstrates the rapid progress between the first and second sessions. Additionally, I suggested engaging Miss Salam alSarayrah as a trainer for a program segment, aiming to strengthen the team and ensure continuity. 

Closing ceremony during the 20th Wikipedia Day, featuring participating teachers and veteran Wikipedians.

Scaling Up: Government Support and Future Goals

While planning the second session, I received a significant endorsement from the Ministry of Education in Jordan, where I am employed. They approved the program’s expansion into schools in Zarqa city, marking a pivotal moment in the program’s development. This governmental recognition not only validates the program but also paves the way for broader implementation. 

In just under six months, the program has transformed from a small pilot into an initiative with national potential. This success is not solely due to individual efforts but to the collective belief and support from a dedicated community. Our vision is to institutionalize the program within the Ministry of Education, expand it to additional regions in Jordan, and empower teachers to become program ambassadors in their schools. 

The journey has just begun, but the foundation has been laid for sustainable growth and lasting educational impact. 

I am Emmanuelle Guebo (username: Manouka), a Wikimedian and general coordinator of Wikimedia Côte d’Ivoire. After Agadir (Morocco) in 2023, I took part in Wiki Indaba for the second year running, attending the 2024 edition (4–6 October) in Johannesburg (South Africa) thanks to a Code for Africa (CfA) grant.

Like most Wikipedians in Africa, I applied for a Wikimedia Foundation (WMF) grant to take part in WikiIndaba 2024. After this stage, I received a hopeful reply informing me that I was on a waiting list, only to be turned down, much to my dismay. 

Maryana Iskander, Dominique Yao and Emmanuelle Guebo at WikiIndaba 2024, CC BY-SA 4.0 – Yasield

WMF grant not obtained, CfA to the rescue

I was beginning to resign myself to following the opening of Wiki Indaba 2024 from afar, via the broadcast on Eventyay, when I received a call from Bukola James, the Code for Africa representative. It was a great surprise! She informed me that the African Wikipedian Alliance (AWA) would like to sponsor my participation in Wiki Indaba 2024 to encourage my contribution and support in facilitating the French-speaking community. 

It is with honor and eagerness that I accepted the CfA scholarship, in order to carry high the torch of AWA, of which I am an ambassador for the French-speaking community, and to contribute to the exchanges during Wiki Indaba 2024 on the theme ‘‘United in Diversity’’.

Code for Africa team at WikiIndaba 2024, CC BY-SA 4.0 – Yasield

Participation and activities at Wiki Indaba 2024

My presence in Johannesburg at this edition of Wiki Indaba was an opportunity to take part in several enriching sessions, to meet Wikipedians from different backgrounds and, above all, to share good practices with them, while discovering an environment marked by the presence of South African culture.

On behalf of the African Wikipedian Alliance, I took part in the various sessions run by Code for Africa, while at the same time encouraging members of my community on our local channels to take part in the online sessions run by Code for Africa via the Eventyay platform. These sessions were also an opportunity for me to provide logistical support to the CfA team: setting up and maintaining the CfA kakemono, providing the Wiki Indaba 2024 souvenir stand and distributing CfA gadgets to participants.

I also took part as a speaker in the session led by the DRC user group on ‘‘How to implement a capacity building programme for your community: lessons learned from WM DR Congo’’. I shared my experiences as a mentor and trainer during the 2024 campaign in the DRC.

Group photo of the WikiCiv delegation in Johannesburg for WikiIndaba 2024

As a wikimedian from Côte d’Ivoire, I also contributed to the enrichment of exchanges at Wiki Indaba 2024, by being a panellist at the session on October 4th, 2024 on the theme of ‘‘Managing Wikimedia User Groups’’ with the contribution of our Wikimedian friends from Morocco.

In my capacity as general coordinator of Wikimedia Côte d’Ivoire, I also worked on organizing the participation of two Ivorian wikipedians at Wiki Indaba 2024. Their expenses were fully covered by the Côte d’Ivoire user group, and my presence at Wiki Indaba helped to finalize the coordination work that had been underway since the beginning of July to ensure the effective participation of the Ivorian user group grant holders.

Emmanuelle and CfA, a growing partnership

My story with Code for Africa began when I met Bukola James at Wiki Indaba 2023 in Agadir. At that first meeting, Bukola enthusiastically explained to me the work being carried out by the Wikimedia movement in Africa and the need to find French-speaking people who could accompany them on this adventure. 

The idea immediately appealed to me: whenever it is about sharing and contributing to the development of communities, I am fully committed. I told her that I was willing to make a contribution and we kept in touch.

A month later, I was contacted by Code for Africa to be a panelist at the online meeting on December 6th, 2023 on the theme of ‘‘Establishing better collaboration between African Wikipedians’’. 

In March 2024, I was also invited to join the EmpowerHer 2024 Editathon campaign, from March 22nd to April 23rd, 2024, as a Wikipedia trainer. It was an exciting time of sharing knowledge with Wikipedians from different countries and contributing to the production of content on the theme of women in democracy, elections and governance in African Union (AU) countries.

In July 2024, in order to continue to contribute, while supporting AWA and the French-speaking members of the movement, I joined AWA, which is the network of African wikipedians who are agents of change and who work together to strengthen the integrity of information.

Family photo after Wiki Women Lunch at WikiIndaba 2024, CC BY-SA 4.0 – Yasield

Taking part in Wiki Indaba 2024 in Johannesburg as an AWA ambassador was an opportunity for me to understand the implications and scope of the work I do, and the CfA scholarship is recognition of my voluntary work and commitment.

I’d like to take this opportunity to thank the entire Code for Africa team, especially Amanda Strydom – Senior Programme Manager – whom I was delighted to meet at Wiki Indaba 2024, and the dynamic Bukola James – whom I met again in Johannesburg (South Africa). I would also like to express my gratitude to my user group, whose chairman is committed to working with other communities for the benefit of all.

Emmanuelle Guebo (u:Manouka)

From Challenges to Change-Making: Celebrating Chabota

Thursday, 21 November 2024 19:26 UTC

We often celebrate Wikipedia, and how it deepens our understanding of the world through sharing facts, lists (and lists of lists of lists!), articles and information about cultures, people, inventions and events from all over the world. But behind every article lies the dedication of individuals who shape and share the stories of their communities. One such individual is Chabota Isaac Kanguya (User:Icem4k), a passionate contributor from Zambia, whose journey through the Wikimedia movement embodies resilience, collaboration, and a commitment to representing underrepresented voices. Today, together with Africa Baraza, we Wikicelebrate Chabota and his contributions to the movement.

The rocky beginnings of a resilient Wikimedian

While some users integrate into the Wikimedia community seamlessly, and grasp the project’s policies and goals from their very first edits, Chabota’s first steps were somewhat challenging. “My journey as a Wikimedian began on November 7, 2011, while I was a high school student at Linda High School in Livingstone, Zambia. I started contributing to Wikimedia out of necessity for a research assignment that required information from Wikipedia. Fascinated by the platform’s collaborative nature and extensive knowledge base, I created my account and joined the global community,” Chabota recalls. But enthusiasm alone wasn’t enough. Early attempts to upload content, including personal photos and an autobiographical article, were quickly deleted due to a lack of what Wikipedians value deeply – verified sources. Worse still, a copyright misunderstanding led to a lifetime ban on Wikimedia Commons. “I treated Wikipedia like any other website,” Chabota admits.

This could have been the end of his Wikimedia journey, but Chabota was so fascinated by Wikimedia, and by the possibilities it offered to share knowledge and awareness about Zambian culture, that he was determined not to give up. He spent four years immersing himself in learning the dynamics of Wikimedia projects, carefully studying the guidelines, policies, and best practices that govern Wikipedia and its sister projects. This deliberate pause allowed him to gain a deeper understanding of the “dos and don’ts,” equipping him to navigate the platform more effectively. Once confident in his knowledge, Chabota decided to focus on writing articles about places in Zambia, ensuring they were well-sourced and aligned with Wikipedia’s standards. His first official contribution after the ban was an article about David Kaunda Technical High School. Seeing this article successfully published boosted his confidence and inspired him to revisit biographies, an area that had challenged him before.

Now he is a contributor with more than 27 000 edits in Wikimedia projects, including 11 000 edits in the English-language Wikipedia and over 2400 edits in the Chewa-language Wikipedia.

When reflecting on his initial challenges, Chabota adds a very practical one: access to equipment at a time when mobile editing was more challenging than it is now. “The funny thing about this whole journey is that I was using a very small, pocket smartphone for the first three articles. I know it sounds impossible, but I managed to get the most out of it,” he says. “My father even had to get me a bigger phone when he saw my efforts and that pushed me all the way to completing seven articles until I finally got my first laptop in 2018 through the Wikimedia hardware program.”

Elevating African narratives

From the start, Chabota was strongly motivated by the Wikimedia mission of sharing free knowledge with the world, especially when it comes to sharing knowledge about his local community and culture. “Knowing that my efforts help bridge information gaps and ensure that Zambian voices are represented globally fuels my passion. I am also inspired by the reminder someone once shared with me: No one will write your story better unless you do it yourself. This drives me to take ownership of our narratives and represent them authentically,” he says.

Chabota knows that while the Wikipedia knowledge repository is huge, there is still so much more to do. “Despite Wikimedia’s global reach, many regions, particularly in the Global South, remain underrepresented in terms of both content and contributors. This means that the knowledge shared on platforms like Wikipedia often lacks the perspectives, histories, and cultures of these regions. For example, in Zambia and other African countries, there is still a need to increase local participation and create content that accurately reflects our diverse cultures, languages, and histories” – he reflects. 

He puts this mission into action by creating content about Zambia himself. Among his many articles, he is especially proud of his contributions to the article about B’Flow – a Zambian singer and songwriter known for his impactful music and activism.

Addressing challenges through collaboration

When asked about the challenges that Wikimedia projects face, he says that addressing them “requires ongoing efforts to recruit and train new contributors, improve access to technology, and ensure that our platforms are more inclusive and reflective of global knowledge”. He founded Wikimedia Community User Group Zambia, and participates in initiatives such as training new editors, organizing events, and supporting content creation that highlights Zambian and African narratives.

He is also looking forward to the Wikimedia Zambia Fan Clubs, a pilot project in schools to engage students and promote participation in Wikimedia initiatives which is set to start in 2025.

Belonging and connection 

A yellow barnstar leaving a yellow trail as it flies across a white background. The larger barnstar is surrounded by smaller stars.

In 2018 Chabota participated in his first Wikimania. This beloved global event, connecting Wikimedians from around the world in a joint celebration of a shared mission, did its usual magic: it showed Chabota that he is part of a huge community, it brought new friendships, connections, and most of all: fueled his sense of belonging. In his own words: this changed everything and fueled his motivation to do more. 

The connection to a collaborative movement is what keeps Chabota going: “The support and collaboration within the Wikimedia community further inspire me—it’s motivating to be part of a collective mission where each contribution plays a role in empowering people with accessible, free knowledge”, he says. 

Thank you Chabota for all that you are doing, we are celebrating you and your dedication to the movement!

Visualizing Wikibase connections, using wikibase.world

Thursday, 21 November 2024 16:11 UTC

Over the past week I have spent some time writing some code to start running a little bot on the wikibase.world project, aimed at expanding the number of Wikibases that are collected there, and automating collection of some of the data that can easily be automated.

Over the past week, the bot has imported 650 Wikibase installs, which increases the total to 784, of which 755 are active.

I mainly wanted to do this to try and visualize “federation” or rather, links between Wikibases that are currently occurring, hence creating P55 (links to Wikibase) and P56 (linked from Wikibase).

251 Wikibases seem to link to each other, and Wikidata is very clearly at the centre of that web.

Many Wikibases only link to Wikidata, but there are a few other notable clusters, including Wikimedia Commons (but see the improvements section below, as some of these may be false positives).

I’m not sure why Q2 didn’t render the label, but Q2 is Commons in the below image.

Others such as LexBib, MaRDi portal, PersonalData.io, Librarybase, R74n and more also seem to have multiple connections (more than one).

Here is a fairly nice SPARQL query that can get you these links in their current state, in a table…

PREFIX wwdt: <https://wikibase.world/prop/direct/>
PREFIX wwd: <https://wikibase.world/entity/>

SELECT ?wikibase ?wikibaseLabel ?linksTo ?linksToLabel
WHERE {
    ?wikibase wwdt:P3 wwd:Q10.
    ?wikibase wwdt:P13 wwd:Q54.
    ?wikibase wwdt:P55 ?linksTo.
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}   

Runnable here: https://tinyurl.com/28dor4qe
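As a rough illustration of what can be done with that table once it is downloaded, here is a small Python sketch. It assumes the result was saved in the standard SPARQL 1.1 JSON results format and uses the ?wikibase and ?linksTo variables from the query above. The function name and the example.org sample rows are made up for the example; this is not the code the bot runs.

```python
from collections import defaultdict


def build_link_maps(sparql_json):
    """Build outgoing and incoming link maps from a SPARQL JSON result.

    Expects the standard SPARQL 1.1 JSON results layout, with the
    ?wikibase and ?linksTo variables selected by the query.
    """
    links_to = defaultdict(set)     # outgoing links, like P55 (links to Wikibase)
    linked_from = defaultdict(set)  # the inverse, like P56 (linked from Wikibase)
    for row in sparql_json["results"]["bindings"]:
        source = row["wikibase"]["value"]
        target = row["linksTo"]["value"]
        links_to[source].add(target)
        linked_from[target].add(source)
    return dict(links_to), dict(linked_from)


def _uri(value):
    # Helper for building the hypothetical sample rows below.
    return {"type": "uri", "value": value}


# Hypothetical sample result; the example.org URIs are placeholders.
sample = {
    "results": {
        "bindings": [
            {"wikibase": _uri("https://example.org/entity/A"),
             "linksTo": _uri("https://example.org/entity/Hub")},
            {"wikibase": _uri("https://example.org/entity/B"),
             "linksTo": _uri("https://example.org/entity/Hub")},
            {"wikibase": _uri("https://example.org/entity/B"),
             "linksTo": _uri("https://example.org/entity/A")},
        ]
    }
}

links_to, linked_from = build_link_maps(sample)
# The item with the most incoming links sits at the centre of the web.
most_linked = max(linked_from, key=lambda item: len(linked_from[item]))
print(most_linked, len(linked_from[most_linked]))
```

With real data, the same incoming-link count is one simple way to see which Wikibase sits at the centre of the graph.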

The scripts

Very briefly, there are a collection of scripts that import Wikibases found via a variety of methods (I’m open to new ideas if you have them).

  • wikibase.cloud: which exposes an API of all currently active installations
  • wikibase-metadata.toolforge.org: which has some data collected about usage of “Wikibase Suite” installed elsewhere
  • google: with some painfully long, crafted search terms that match the few things identifying a Wikibase that might get indexed.

These scripts import a very bare-bones version of an Item, such as [1], [2], [3]…

Once the data is in wikibase.world, a separate process loads all currently active Wikibases, and tries to add and refine information.

  • Load the site and see if it is a 200
  • Try to normalize the URLs a bit if possible
  • Try to detect and record the host
  • Add an inception date, based on the first logged action by MediaWiki
  • Add entity types and tools used (sometimes)… (extensions to come soon?)
  • Add links to and from other Wikibases based on some External Identifiers, and all URL properties.
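The URL normalization and link detection steps could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names, the `known_hosts` mapping and its identifiers are hypothetical, and the real scripts (built on wikibase-edit and wikibase-sdk) may work quite differently.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Lower-case the scheme and host and drop any trailing slash.

    Assumes the input is a full URL including its scheme. Ports, query
    strings and fragments are dropped in this simplified version.
    """
    parts = urlsplit(url.strip())
    host = parts.hostname or ""
    return f"{parts.scheme}://{host}{parts.path.rstrip('/')}"


def find_linked_wikibases(urls, known_hosts):
    """Return the identifiers of known Wikibases that the URLs point at.

    known_hosts maps a host name to an identifier for that Wikibase.
    """
    found = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host in known_hosts:
            found.add(known_hosts[host])
    return found


# Hypothetical mapping; the identifiers are placeholders, not real item IDs.
known_hosts = {
    "www.wikidata.org": "wikidata",
    "commons.wikimedia.org": "commons",
}

urls = [
    "https://www.wikidata.org/wiki/Q42",
    "https://example.com/some/page",
]
print(normalize_url("HTTPS://Example.ORG/wiki/"))
print(find_linked_wikibases(urls, known_hosts))
```

Matching on the host alone is cheap, but it treats every URL on a known domain as a link to that Wikibase.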

The code makes use of wikibase-edit and wikibase-sdk written by maxlath. They were a pleasure to use, and really simplify Wikibase APIs down to basics, which is all I needed here.

Improvements

There are many other elements of data that could be added, and that also would be nice to be able to filter by across all Wikibases, such as number of entities, number of users, date of first Wikibase edit etc. I plan on slowly trying to tackle these parts moving forward.

There are also possibly a few issues with the current process:

  • Not all External Identifier properties are currently inspected. Only those that have a formatter URL property defined, and also that have that formatter URL property exposed via WikibaseManifest (so the WikibaseManifest extension is also a requirement)
  • All URLs are inspected for known domains, and these may link to NON Wikibase and NON entity pages. Such as a URL that just links to https://commons.wikimedia.org would currently appear as a link…
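One possible way to reduce those false positives would be to only count URLs whose path looks like an entity page. The Python sketch below is a heuristic under assumptions, not something the current scripts do: the path patterns (`/wiki/Q…`, `/wiki/Item:Q…`, `/entity/Q…`) are common Wikibase layouts, but individual installs may use others.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristic: paths ending in an entity ID, optionally inside
# an Item:, Property: or Lexeme: namespace, or under /entity/.
ENTITY_PATH = re.compile(
    r"^/(?:wiki/(?:(?:Item|Property|Lexeme):)?|entity/)[QPL]\d+$"
)


def looks_like_entity_url(url):
    """Return True if the URL path resembles a Wikibase entity page."""
    return bool(ENTITY_PATH.match(urlsplit(url).path))


for candidate in (
    "https://commons.wikimedia.org",
    "https://www.wikidata.org/wiki/Q42",
    "https://example.org/wiki/Item:Q5",
):
    print(candidate, looks_like_entity_url(candidate))
```

A bare domain such as https://commons.wikimedia.org fails this check, so it would no longer be recorded as a link.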

Currently, I have just been running the scripts locally, but I’ll aim to set them up on GitHub Actions so they run weekly perhaps?

And let’s pretend that I wrote the code in a nice tidy way, haha, naaah

That will come (if this all still seems like a good idea)

Wikipedia:Administrators' newsletter/2024/12

Thursday, 21 November 2024 12:16 UTC

News and updates for administrators from the past month (November 2024).

Administrator changes

added ·
readded
removed

Interface administrator changes

added
readded Pppery

CheckUser changes

readded

Guideline and policy news

Technical news

Arbitration

Miscellaneous



Reflecting on Wiki Indaba Conference 2024

Thursday, 21 November 2024 11:00 UTC

Themed “!ke e: /xarra //ke,” which means “United in diversity”, the 2024 Wiki Indaba Conference saw diverse African Wikimedians from different African regions arrive at O. R. Tambo International Airport on 1 and 2 October 2024. I was one of those Wikimedians, flying in from Nigeria thanks to the support and sponsorship of the Wiki Loves Monuments International Team. Though I embarked on this journey ‘solo’, before touching down in the beautiful city of Johannesburg I was already among the many vibrant Wikimedians attending Wiki Indaba 2024 – one could not overlook our energetic demeanour even after long hours in the air.

Fast-forward to Arrival

Arriving and checking in at the conference venue – The Capital on the Park – first helped me recover from the long hours of travel before convening for a mini welcome party with a dinner of finger food. The welcome party reawakened the African spirit and our diversity, giving us the opportunity to socialize and spark meaningful conversations that would sustain the tempo throughout the conference days. During the party, I had the opportunity to network and engage with fellow Wikimedians I have been working with online. It was a moment of reconnection and reintroduction, matching online faces to the faces on site. Our gathering was also an avenue for cultural exchange, enriching my experience and highlighting the diversity and richness of African cultures as I got to taste lots of South African delicacies and chops – I treated my taste buds to a variety of delicacies.

Wiki Indaba Conference Proper

The conference commenced with a Welcome Ceremony facilitated by Bobby Shabangu and later featured a keynote address from Professor Langa Khumalo (SADiLaR) on The importance of online Indigenous Language Preservation. As someone passionate about indigenous language advocacy, I found that it sparked a lot of interest and reactions in me and equipped me with vital information and findings that benefit my indigenous language advocacy goals. The plenary session, Conversation: Information integrity and the role of the Media, with Maryana Iskander (WMF), Anusha Alikhan (WMF) and Ferial Haffejee (Daily Maverick), then ushered in the various sessions of the conference, in which I took part as a presenter, co-presenter, panelist and participant.

One Conference, Different Roles!

The Wiki Indaba conference program was robust, with Lightning talks, Lectures, Workshops, Panels, Roundtables and even Hackathon sessions, all of them focused on bridging African content gaps, digital literacy initiatives, and policy advocacy for open access – demonstrating our strength even in our diversity. As a two-time Wiki Indaba participant, I embraced each of these sessions with an open mind, making sure I engaged with the free knowledge shared.

Hacking at Wiki Indaba

Co-facilitating with Purity during the Language sessions at Wiki Indaba Hackathon | CC BY 4.0, PWaigi-WMF

My first session as a co-presenter was a Hackathon session with Purity. Before my Hackathon experience at Wiki Indaba 2024, I had always assumed that a hackathon was ONLY about coding and programming languages. Coming on board as a co-facilitator with Purity for Hackathon sessions on Translatewiki, Content Translation and Section Translation helped me correct that narrative, and I came away with a lot of knowledge and skills on these topics. During these sessions, we showed participants how to make the best use of tools such as Translatewiki, Content Translation and Section Translation to carry out translation tasks on language Wikipedias. As someone passionate about indigenous language digitization, I was happy to equip participants with the technical know-how for these tools, so that they can use them well in translation projects within their language communities. Co-facilitating the Language sessions at the Wiki Indaba Hackathon gave me a sense of fulfilment, as I see it as a means through which I can continue my indigenous language advocacy goals.

Wiki Loves Monuments Lightning Talk

As a member of the international Wiki Loves Monuments team, I had the opportunity to facilitate a Lightning Talk on the Dynamics of Organizing WLM in the African Region, which I was meant to present with Ciell. During this 15-minute session, I learned about the challenges African Wikimedians face in participating in the global WLM campaign. Some of the problems mentioned were general, such as Freedom of Panorama, limited organizing skills and lack of funding, while others were specific to particular communities, such as internal management. Some were short-term challenges and others long-term. The session offered an incredible opportunity to learn about these difficulties and to plan possible ways of mitigating them, to ensure active participation in Wiki Loves Monuments 2025.

A Panelist on the Afrika Agenda Session

I represented the Afrika Baraza Working Group in the panel session The Afrika Agenda: Fostering Effective Collaboration & Inclusion for African Wikimedians, where I took part in discussing some of the challenges African Wikimedians face in the Wikimedia space. This session emphasized the need for African Wikimedians to take a seat at the table where administrative and governance topics are being discussed, so that they can tell their stories themselves.

De-Conferencing

The social activities are what round off the Wiki Indaba experience – I always look forward to them! They included a tour of the famous Soweto streets that once housed two great African figures – Nelson Mandela and Desmond Tutu. We explored the streets of Soweto and made a stop at Nelson Mandela’s house. It was also great fun cutting the Wiki Indaba 10th Anniversary cake. Wiki Indaba 2024 reinforced that even in our #Diversity, we are strongly #United! Wiki Indaba 2025 will usher in lots of possibilities – !ke e: /xarra //ke!

WikiAfrica Hour’s 39th episode, aired live on 18 October 2024, was dedicated to celebrating the 2024 Wiki Loves Africa International media contest winners. Guest host Ceslause Ogbonnaya engaged in a lively conversation with the ambassadors, local organizers, international jury members and the winners of the Wiki Loves Africa International media contest 2024.

In the moments preceding the conversation, Nonny and Donia shared the WikiAfrica Hour News. Then it was time for some serious conversation. After an eventful and successful Wiki Loves Africa international media contest, in which we recorded 14,000+ image, video and audio submissions under the theme Africa Creates, we met the organizers at the local level, the jury that selected the final winners, and the winners themselves! The Wiki Loves Africa 2024 international prize winners shared the creative journeys that led up to the win.

“It was my first time to be the winner internationally. And I think this achievement makes me even more passionate about my career in photography.” – Mutijima Germain from Rwanda (Joint 2nd photography prize winner).

“When I won the wiki of Africa Best Audio story last year, Mr. Wilson pushed me to learn it further. And I can see that sound audio storytelling is growing as a medium to tell better stories.” – Abdullahi Tanimu Abubakar from Nigeria (Best Audio prize winner).

Watch the full episode on YouTube

This WikiAfrica Hour episode’s guest host is Ceslause Ogbonnaya, who has been a Wikimedian since 2018 and has been contributing to Wikimedia projects under the Igbo Wikimedians User Group. 

He joined Wiki In Africa as an intern in 2021, and later transitioned into Host & Project manager for WikiAfrica Hour. In 2022, he joined Africa No Filter as Wikimedian-In-Residence for Africa Knowledge Initiative (AKI). He is currently the Wikimedian-In-Residence for AfroCreatives WikiProject, working to document the African creative industry on Wikipedia and its sister projects, and a member of the Afrika Baraza Working Group.

Our guests include  …

Hilary Ogali  

Wiki Loves Africa 2024 ambassador. 

Hilary is a Wikimedian from Nigeria who joined the Wikimedia movement in 2022 through the Igbo Wikimedians User Group. He joined the Wiki Loves Africa international team as an ambassador in 2023, where he supported the English-speaking communities to achieve a successful Wiki Loves Africa 2024 campaign under the theme “Africa Creates”.

Eric Chidom 

Wiki Loves Africa 2024 jury member.

Eric Chidom is a documentary photographer and filmmaker based in Abuja, Nigeria, with over four years of experience. He focuses on documenting culture in South-Eastern Nigeria and is a partner and Head of Productions at Ònye Ijé Productions, which specializes in research-based documentaries. He is also the Lead Creative at Dorick Productions, a media company excelling in event coverage, corporate videos, and commercial productions.

Eric has worked with international teams to produce documentaries in Nigeria, bringing a global perspective to local stories.

FUMEY Adjovi Essenam

Wiki Loves Africa 2024 organizer, Wikimedia Senegal.

Adjovi Essenam Fumey is an engineer in Documentary Information Sciences. She works as a media librarian and is involved in the Wikimedia movement within the Senegalese community, which she co-founded. Since 2020, she has contributed to various Wikimedia projects.

Adjovi is passionate about sharing free knowledge, addressing the gender gap, and promoting community development, motivations that drive her to contribute and encourage others to do the same.

Kano TCHAPI 

Wiki Loves Africa 2024 organizer, Wikimedia community user group Republic of Congo (Congo Brazzaville)

Kano Tchapi is a professional passionate about technology and committed to the development of developers in the Republic of Congo. A mechatronics engineer and graphic designer by training, he has solid experience in community and event organization.

Founder and active facilitator of Wikimedia User Group of the Republic of Congo, he has dedicated his time to promoting the sharing of free knowledge and creating a collaborative space for Congolese developers. His technical expertise and leadership have helped to unite a community around technological innovation.

Demmane Mohammed Fawzi 

WLA 2024 1st photography prize winner

Demmane Mohammed Fawzi is an Algerian who has been an amateur photographer since 2000. He has participated in many exhibitions in Algeria and Tunisia as a photographer.

Doaa Adel Ahmed Mohamed 

WLA 2024 Joint 2nd photography prize winner

Doaa Adel is an Egyptian photographer working in street and documentary photography for the past five years, aiming to document daily life in Egypt. He doesn’t capture people only; he also captures details that come together to form a compelling narrative.

Doaa’s notable awards include:

First place in the Al-Nadeem Foundation Photography Competition, second place in the Turthy Competition, second place in the UN Women Egypt Documentary Competition, third place in the photo story section for the best press photo in Egypt (2023), first place in the Smile Competition sponsored by Sync, and first place in the Egyptian Culture Discovery Competition sponsored by OPPO Egypt.

Doaa has participated in numerous local and international photographic exhibitions. His work has been published in prominent outlets, including Getty Images, AFP, AP, Al-Ahram newspaper, Akher Sa’a magazine, Sabah Alkhayr magazine, and Al-Shorouk newspaper.

Germain Mutijima 

WLA 2024 Joint 2nd photography prize winner

Germain Mutijima is a dedicated Wikimedian and community trainer from Rwanda. Since 2020, he has contributed to Wikimedia projects, empowering his community through training on Wikipedia, Wikimedia Commons, and Wikidata.

He has attended Wikimedia conferences like Wiki Indaba and Wikimania 2024, broadening his perspective and connecting with fellow Wikimedians. Through his work, he bridges the knowledge gap, fosters online communities, and inspires others to join the Wikimedia movement.

Mohamed Nageeb Nasr 

WLA 2024 3rd photography prize winner

Mohamed Nageeb Nasr is an Egyptian amateur photographer based in Qatar, with experience in travel and street photography since 2012. He holds the EFIAP and GPU Crown 2 distinctions, has won many international photography competitions with more than 60 awards, and has presented several workshops on street and travel photography. He is a member of the International Federation of Photography (FIAP) and the Global Photographers Union (GPU), and holds a BA in Media and Journalism.

Saalyha Eeman Oyiza 

WLA 2024 Best narrative video prize.

Eeman Oyiza is a Nigerian and African Muslim. Growing up, she accompanied her father, a media professional, to his studio and shoots, sparking her passion for photography and videography. At 12, she landed her first paid gig and received her first camera from her father.

While she studies Geography and Environmental Management at Ahmadu Bello University, Zaria, in Nigeria, photography remains her cherished hobby. With a growing portfolio and occasional job offers, she is now considering taking her skills to the next level.

Green Wilfred Somoni 

WLA 2024 reportage video prize winner

Green Wilfred Somoni, also known as Somoni, is a filmmaker from Bonny, Nigeria.

He has worked on several film, documentary, event, and commercial projects. On 12th April 2022, he filmed his first short film, Dream Home, which he entered in the Wiki Loves Africa 2022 photographic contest on Wikimedia Commons.

The video was later announced as the best video of the Wiki Loves Africa 2022 photographic contest. He also won the best video prize in Wiki Loves Africa 2023, and his Wiki Loves Africa 2024 best video prize makes it three wins in a row.

Abdullahi Tanimu Abubakar 

WLA 2024 Best Audio prize winner 

Abdullahi T.A is a Photojournalist and Documentary Filmmaker, originally from Sheshe in Kano, Nigeria but he was raised in the slums of Lagos, specifically Idiaraba. His multicultural upbringing informs his community-driven approach to storytelling. He’s received awards and recognition for his work, including Wiki Loves Africa 2023 Best Audio Prize.

Abdullahi collaborates with organizations to tell impactful stories, believing authentic narratives emerge from community engagement. A student at Dikan Art Center and alumnus of Bayero University Kano, he harnesses storytelling to enact change, challenge single narratives, and educate others.

This WikiAfrica Hour episode aired live: check out our Website or the WikiAfrica Hour meta page to watch this or previous episodes.  All WikiAfrica Hour episodes are available on YouTube.

Episode notes:

How Czech seniors enriched Wikipedia in 2023

Thursday, 21 November 2024 07:00 UTC

Seniors belong on Wikipedia! This has been confirmed for many years by the successful Seniors Write Wikipedia programme in the Czech Republic. In 2023, we welcomed new active seniors to Wikipedia, introduced several new features and had many successes.

Courses as a shop window

As every year, the senior activities in 2023 revolved primarily around the flagship of the entire Seniors Write Wikipedia programme, i.e. courses for seniors. The time-tested six-week format of courses for senior beginners took Wikipedia to Prague, Ostrava, Plzeň, České Budějovice, Olomouc and, through online courses, to other parts of the country. As every year, graduates of the beginners’ courses could further improve their skills, thanks to follow-up courses for advanced editors and for those who want to delve deeper into Wikimedia Commons.

In 2023, we opened 14 Seniors Write Wikipedia courses with 83 participants. Of these courses, ten were for beginners and four were for advanced learners. These courses were taught by a total of 10 tutors from the Wikimedia CZ tutoring team, to whom we are very grateful for their excellent work.

Participants of the course Seniors Write Wikipedia in České Budějovice with lecturer Jana Sekyrova, Author: Odrostlejsi, CC-BY-SA 4.0, via: Wikimedia Commons.

Would you like a Wikipedia Night School?

A senior Wikipedian can take a beginner’s course and two different follow-up courses. But what next? For the most active of the most active, we’ve launched a Wikipedia Night School course! It offers the opportunity to develop the specific skills that active Wikipedians need for their work, focusing on techniques and tweaks that can be used in editing practice and that are not covered in detail in Seniors Write Wikipedia courses. It is important to note, however, that the online Wikipedia Night School is popular not only among seniors.

Last year, Wikipedians had the chance to visit nine such Wikipedia Night Schools. The range of topics was varied and everyone could choose. Some of the Night Schools topics included infoboxes, uploading images to Wikimedia Commons, Wikicests, and basic information about Wikidata. However, editors and photographers were also in the mix.

Senior Wikipedians Day – Senior WikiTown!

The real reward for active senior Wikipedians is traditionally the September Senior WikiTown. With a group of seniors we visited Lysá nad Labem in 2023, where participants could enjoy sightseeing, excursions and Wikipedia work. Wikipedians took home valuable documentation of places, people and historical context from the town and its surroundings. They documented in detail the Lysá Castle, the historical archive, the Bon Repos Chateau or the power plant on the Elbe. 

On the basis of our detective work, we have published rare photographs of the interior of Benies’ villa, which document the unique movement of Czech architectural cubism and thus expanded the article about this monument. In total, 583 photographs that were taken during the Senior WikiTown in Lysá nad Labem were published under a free license. A dozen articles were created and another 32 were expanded or corrected.

One of the participants, Mrs. Eva Mžourková, was at the Senior WikiTown in Lysá nad Labem for the first time and she left excited.

“Thank you for opening up a space of Wikipedia that I had no idea about. I observe that there is a community of Wikipedians where there are specific rules – sympathetic. I hope to make progress on Wikipedia, it’s been fun so far.”

However, it’s important to note that it’s not just senior events where you can meet our senior Wikipedians. At every editathon, every reunion, and in Wikimedia Czech Republic itself, WikiSeniors are one of the most active and appreciative groups of Wikipedia users and contributors.

Senior WikiTown participants touring the Bon Repos castle, by Marie Dirillo, CC-BY-SA 4.0, via: Wikimedia Commons.

WikiSeniors were heard and seen

The Seniors Write Wikipedia programme is not just about working and enriching Wikipedia. The senior program is all about the people, their stories and valuable experiences that active seniors can pass on, not only by working on Wikipedia. It was with this message that we launched Wikipedia Senior Month last year. The whole month of December was dedicated to the most active WikiSeniors.

As every year, we also showed up with booths and the opportunity to try editing for yourself at the Prague Fresh Senior Festival and the Elpida Centre Open Day, where visitors could learn more about our activities for seniors, but also talk to our alumni. Our active WikiSeniors were able to chat not only with festival visitors but also with the editor of Czech Radio!

Conclusion

In 2023, we dedicated the year to the most active WikiSeniors, for whom we introduced a brand new format of education – Wikipedia Night School. Thanks to this, seniors can improve their skills on Wikipedia every month and become an even more prominent editorial group. We dedicated the whole month of December to this group. With the large number of courses, the Senior WikiTown and all the other events, we can say with a calm heart and full of joy that in 2023 the seniors were definitely not bored with us!

The 10th annual Wiki Indaba conference, held on October 4–6, 2024, in Johannesburg, South Africa, has left a lasting impression and reinforced the commitment to building sustainable Wikimedia communities across Africa and the diaspora. This year’s theme, “! ke e: /xarra //ke” (“United in diversity”), perfectly captured the spirit of collaboration and unity that defined this milestone event. Code for Africa (CfA) and the African Wikipedian Alliance (AWA) joined this year’s event with renewed purpose, with staff members Amanda Strydom and Bukola James bringing their expertise to deliver a keynote, conduct a workshop, and engage in vital discussions on misinformation.

Photo of Bukola James (Community Coordinator AWA), Emmanuelle G. Kakou (Coordonnatrice Générale, Wikimedia Community User Group of Côte d’Ivoire) and Amanda Strydom, CfA’s Senior Program Manager for CivicSignal

For AWA’s Francophone member Emmanuelle Kakou, who is also the Coordonnatrice Générale of the Wikimedia Community User Group of Côte d’Ivoire, this marked a unique milestone: she was the first volunteer from the AWA Francophone community to receive an all-expenses-paid sponsorship from CfA to attend the conference, in recognition of her commitment to promoting AWA’s mission and her unwavering support for the community’s growth. Emmanuelle also had the opportunity to bring what she learnt from the different conversations back to the AWA Francophone community, sharing her first-hand experience with the rest of the community.

In many ways, Code for Africa has grown to become the backbone of digital empowerment in Africa. As the continent’s largest civic tech and data journalism network, CfA supports groundbreaking initiatives, from AfricanDRONE and PesaCheck to AFCA and sensors.AFRICA. But beyond these programs, CfA is also the proud sponsor of AWA—a community initiative that has brought together over 308 African Wikipedians across both Anglophone and Francophone communities. Together, CfA and AWA represent a movement where people from all walks of life work to verify facts, close knowledge gaps, and stand as guardians against content manipulation on Wikipedia and its sister platforms.

Amanda Strydom, CfA’s Senior Program Manager for CivicSignal, delivering a keynote address at WikiIndaba 2024

On the second day of the conference, Amanda Strydom, CfA’s Senior Program Manager for CivicSignal, opened with a keynote that left a lasting impact on everyone in the room. Amanda spoke passionately about “The Value of Fact-Checking for Open Knowledge,” sharing insights on how CfA uses its resources to support community engagement and fight misinformation. The talk wasn’t just about CfA’s many projects but about how our communities, united by shared goals, are actively protecting and promoting African narratives. For Emmanuelle and other attendees, Amanda’s address reinforced the importance of this work and the sense of shared responsibility across African Wikimedia communities.

The commitment to accuracy came alive later that day when I led the AWA fact-checking workshop with Amanda. The session, titled “How to Tackle Mis/Disinformation on African Content on Wikipedia,” drew a diverse group of participants, eager to understand the inner workings of fact-checking on one of the world’s most visited websites. The workshop blended theory with practical exercises, guiding participants through scenarios on identifying and correcting false information on Wikipedia. Together, we tackled real issues, like debunking misinformation on African Wikipedia articles such as Chidimma Adetshina, while covering essential topics like:

  • Recognising information disorder and its effects
  • Tools and methods for fact-checking on Wikipedia
  • Practical application through scenario-based exercises

As we walked participants through the hands-on activities, I watched the group’s curiosity deepen. The questions grew more nuanced and the discussions more engaging, with everyone contributing fully to the questions generated by the workshop exercise and showing a determination that reflected their commitment to bringing these skills back to their communities. By the end, the session felt less like a workshop and more like a mission shared by everyone in the room. The participants left with the tools they needed and a new sense of how to contribute to Wikipedia in a way that would ensure the accuracy of African content for future generations.

Throughout the conference, feedback was overwhelmingly positive. Participants expressed appreciation for the workshop, with many describing it as a unique experience in their Wikipedia journeys. Some community leaders voiced their desire to host similar workshops in their own regions, opening the door for even more collaboration with AWA and CfA. The atmosphere was alive with the excitement of potential new partnerships, and I couldn’t help but feel that this event was just the beginning.

In the days following Wiki Indaba, as we all returned to our respective countries, the lessons we learned and the relationships we built continued to resonate. For AWA, the experience offered not just the chance to help African Wikimedians acquire fact-checking skills but to be part of something larger—a community where each contribution, each fact-checked article, adds to the rich, diverse story of Africa.

From left to right: Felix Nartey (Senior Regional Partnership Manager, Sub-Saharan Africa) with CfA staff and AWA WikiIndaba 2024 scholarship recipient

As AWA and CfA look ahead, we’re reminded that this journey is about more than just workshops or conferences. It’s about building a foundation that amplifies African voices and gives them the resources and support they need to make factual contributions to global knowledge. By fostering these connections and empowering these voices, we are building a future where Africa’s stories are told by those who live them, ensuring they are accurate, verified, and celebrated across the world. Join our growing community of fact-checking practitioners on Wikipedia and its sister projects by completing our membership form, and connect via our Anglophone and Francophone channels for instant updates. Also check out all of our past and present programs via Meta.

Every year Wikimedia Ukraine brings the Ukrainian community together for its annual Wikiconference. This year’s edition took place in October and early November, bringing together a record of more than 120 people across two cities and online.

Because of Russia’s ongoing war against Ukraine, we decided to hold three smaller conferences in one to include as many participants as possible. Here’s a brief look at our biggest event of the year.

Collage of the conference’s group photos (image by multiple authors in public domain)

The format: three conferences in one

Since 2021, Wikimedia Ukraine has had a tradition of dividing our annual Wikiconference into several separate events, both online and offline. At first this was caused by pandemic restrictions, and now by the ongoing Russian invasion of Ukraine.

Because of Russia’s war against Ukraine, millions of Ukrainians are abroad or unable to travel within Ukraine. Some are serving in the Armed Forces and can join online but cannot take leave to participate in person. However, we still want the conference to be inclusive of everyone.

Doing a truly hybrid event where online participants wouldn’t get an inferior experience is too difficult and expensive (note: live-streaming an offline event for people to tune in on YouTube is great, and we are also doing it, but it’s still an offline event). The solution for us is organizing a separate online-only event on a separate weekend, with its own program.

In 2024, our Wikiconference consisted of three events:

  • Online conference on October 19th-20th
  • The main offline conference in Kyiv on October 26th-27th
  • A smaller conference in Kharkiv on November 3rd

Both the online event and the Kyiv conference attracted over 60 people each, and the Kharkiv meetup gathered 25 people. Some people attended two or three events, but we had at least 124 confirmed unique participants overall (likely an undercount as we cannot capture all online participants).

Online conference: 60+ people, two days

The online edition took place on the weekend of October 19th and 20th. The conference featured 21 sessions, including six sessions from invited international speakers, which one organizing team member jokingly called our small “Wikimania at home”.

Participants rated the program highly, and the feedback form shows that the international program was particularly popular – especially two sessions about AI and Wikimedia from the Wikimedia Foundation’s Asaf Bartov and Wikimedia Polska’s Natalia Ćwik. Other notable sessions included a slate of sessions on the Ukrainian community during the war, interactive welcome sessions, and a review of Wikimedia Ukraine’s activities. 

  • See full program in English
  • How to make an online conference engaging and tackle Zoom fatigue? It’s a big question for us, one that we’re hoping to tackle in a separate Diff post.

Image by Anton Protsiuk & Iryna Boiko, public domain

Kyiv conference: 60+ people from across Ukraine

A week later we gathered 63 people in Kyiv, Ukraine’s capital city. It was the biggest offline event of the year with people coming from all over Ukraine and even abroad. 

The two-day event included sessions on traditional topics like the Wikipedia Education Program, news and trends of the international wiki movement, practical workshops and project updates. We also had one major innovation: an award ceremony for “Wikipedia 20”, a newly created award to distinguish people who’ve helped build Ukrainian Wikipedia over the 20 years of its existence.

Although this event was offline-first, we also had a high-quality online stream on YouTube, which helped more people tune in virtually. 

Kharkiv conference: a meetup in a bomb shelter for 25 people

Despite being located close to the Russian border and suffering from constant Russian attacks, Ukraine’s second-largest city of Kharkiv has a vibrant Wikimedia community. On the first Sunday of November it gathered in a bomb shelter for a day-long conference for Wikimedians from the city and the surrounding region. 

Participants had a packed program, featuring everything from the experience of implementing a wikischool for high-school students held in Bohodukhiv last year to ideas for engaging young people in Wikipedia.

In the words of the Kharkiv event’s lead organizer, Wikiconference 2024 in Kharkiv showed the potential of community development in a frontline city and region, even under constant security threats.

Image by Serhii Bobok, CC BY-SA 4.0



Wikimedia Australia is proud to announce its membership in the TAROCH Coalition, a global alliance dedicated to preserving, sharing and advocating for cultural heritage. By joining, we reaffirm our commitment to empowering communities to access, co-design, and contribute to celebrating and protecting cultural heritage. Wikimedia projects, including Wikimedia Commons, play a vital role in hosting and making public domain cultural heritage content accessible to all.  

The TAROCH Coalition, which stands for "Towards A Recommendation On Cultural Heritage," unites organisations and individuals passionate about humanity's diverse heritage. Its goal is to achieve the adoption of a UNESCO Recommendation on Open Cultural Heritage by 2029. This legal instrument will promote open solutions to remove barriers to accessing cultural heritage in the public domain, while respecting governance frameworks from local regions.

Joining the TAROCH Coalition aligns with our mission to empower communities across Australia and the wider ESEAP region to share knowledge and build connections across cultures. Wikimedia Australia can play a key role in international dialogue and be part of a national agenda advocating for the removal of barriers and the adoption of open access policies in and for the cultural heritage sector.

Through this partnership, Wikimedia Australia will:  

  • Support Coalition Goals: Advocate for a UNESCO Recommendation that recognises the essential role of cultural heritage in identity, education, and global understanding while addressing local and regional needs.
  • Champion Open Knowledge: Promote free and accessible information for all, ensuring cultural heritage is responsibly and ethically documented and shared.  
  • Collaborate with Stakeholders: Partner with cultural institutions, community leaders, and like-minded organisations to amplify and protect underrepresented voices in heritage conversations.  

We are excited to join other Wikimedia affiliates – including Wikimedia Indonesia, Wikimedia UK, and Wikimedia Deutschland – alongside significant organisations such as Creative Commons, Flickr, Communia, and the International Federation of Library Associations and Institutions (IFLA).

We look forward to contributing to the TAROCH Coalition's impactful work and invite our community and partners to support this vital initiative. Together, we can ensure cultural heritage remains accessible and celebrated for generations to come.  

Useful links:

Episode 170: Stephen Harrison

Tuesday, 19 November 2024 22:36 UTC

🕑 1 hour

Stephen Harrison is a tech lawyer and journalist who has been writing about Wikipedia since 2018, including dozens of articles for the online magazine Slate as part of his Source Notes column. He is the author of the 2024 novel The Editors, which is about a group of editors of the fictional user-editable online encyclopedia "Infopendium" who are drawn together by dramatic events.

Links for some of the topics discussed:

The report “Open Movement’s Common(s) Causes” maps the current threats and opportunities facing the open movement, based on the ongoing work of the organisations behind the Common(s) Cause event, which took place in Katowice, Poland, on 6 August 2024 as a pre-conference event for Wikimania 2024.

The meeting was organised by Creative Commons, Open Knowledge Foundation, Open Future, and Wikimedia Europe in collaboration with the Wikimedia Foundation. The goal of the meeting was to create links between different advocacy efforts so that a shared advocacy strategy for the Knowledge Commons can be created.

One of the calls that jumped out for us was a call for defining new open principles – principles that could clarify what openness means in the context of today’s digital space and ensure its pro-public, democratic potential. Formulating such principles could help against several challenges, e.g. open washing.

Another clear call is the one confirming the assumptions behind the Common(s) Cause project: it is the call for a shared advocacy agenda, which could help ensure that Knowledge Commons are treated and sustained as critical digital infrastructures.

The event gathered over 55 participants from 20 countries, most of whom travelled to Katowice to attend the Wikimania conference. The majority of attendees were from open advocacy communities. The event enabled the organizers to build stronger working ties not only with one another, but also with the many other organisations that were represented at the event.

Participants acknowledged that the power of the open movement is only as strong as the bonds of the people working to advance an open, equitable agenda, and collective impact can only be achieved through individuals from different organisations working closely together.

The report identifies a few common causes that can be found at the intersection of open movement organisations’ strategies, the socio-technological zeitgeist, and current policy opportunities, such as: 

  1. (Re)defining openness in a new technological era.
  2. Creation of a shared advocacy strategy and enhanced regional and thematic cooperation across the organisations.
  3. Developing and testing governance approaches for our digital commons.
  4. Advancing openness and sustainability for the technology, data, content, and governance of Digital Public Infrastructure.

This report is a starting point and serves as an invitation to the wider open community to join these causes as well as to formulate their own, which could then be backed by other organisations. The next step in this process will be disseminating its findings, hopefully resulting in further backing and refinement of the causes and additional feedback from the wider community, which this small convening could not fully represent.

Read the full report

The value of Wikipedia in the age of generative AI

Tuesday, 19 November 2024 10:47 UTC

It may seem like a philosophical question, but right now it is a very practical one, given recent advances in generative artificial intelligence and large language models (LLMs). Because of the widespread use of generative AI technology, designed to predict and imitate human responses, it is now possible to create, almost effortlessly, text that looks like it came from Wikipedia.

My answer to that question is simple: no, it would not be the same.

The process of freely creating knowledge, sharing it, and refining it over time, in public and with the help of hundreds of thousands of volunteers, is what has defined Wikipedia and the many other Wikimedia projects for 20 years. Wikipedia contains trustworthy, reliably sourced knowledge precisely because that content is created, debated, and curated by people. It is also grounded in an open, non-commercial model, which means that Wikipedia is free to access and share, and always will be. And in an internet flooded with machine-generated content, this means that Wikipedia is even more valuable.

In the past six months, dozens of LLMs have been released to the public, trained on vast data sets and able to read, summarize, and generate text. Wikipedia is one of the largest open bases of information on the internet, with versions in more than 300 languages. To date, every LLM is trained on Wikipedia content, and it is almost always the largest source of training data in those LLMs’ data sets.

An obvious thing to do with one of these new systems is to try to generate Wikipedia articles. Of course, people have already tried. And, as I’m sure many readers have seen firsthand, these attempts reveal many challenges in using LLMs to produce what Wikipedians call knowledge: trustworthy, reliably sourced text and images in encyclopedic format. Some of these limitations include the following:

  • Currently, LLM outputs are not fact-checked, and there are already many well-known cases of people using generative AI to try to do their work. There are countless low-stakes situations in which the outputs can be useful without posing any risk, such as prompts for thank-you notes, plans for a fun vacation, or an outline to get an essay started. In other situations, however, the results are not as good, as in the case where an LLM fabricated court cases and the lawyer who used those results in court ended up being fined. In another situation, a doctor demonstrated that a generative AI system gave inadequate diagnoses when analyzing the symptoms of patients seen in the emergency room. Over time, I believe these systems will get much better and become more reliable in a variety of contexts. One interesting possibility is that demand for better sources will improve access to research and books online. But it will take time to get there, and probably significant pressure from regulators and the public, for there to be improvements that benefit everyone.
  • LLMs cannot draw on information that was not used in their training to respond to prompts. This means that all the world’s books that are not available in full online, research content that predates the internet, and information in languages other than English are not part of what a typical LLM “knows”. As a result, the data sets used to train LLMs today can amplify existing inequities and biases in many areas, such as hiring, medicine, and criminal sentencing. Perhaps this will change one day, but we are a long way from being able to freely access and train LLMs on all the different kinds of information that people in all languages currently use to create content for Wikipedia. And even then, more work will be needed to mitigate bias.
  • Finally, it has been shown that LLMs trained on the outputs of LLMs perform demonstrably worse, and even forget things they once “knew”, a condition called “model collapse”. This means that for LLMs to produce good results and keep improving, they will need a steady supply of original, human-written content, which makes Wikipedia and other sources of human-generated content even more valuable. It also means that generative AI companies around the world need to figure out how to keep the sources of original human content, the most important element of our information ecosystem, sustainable and growing over time.

These are just some of the problems that need to be solved as internet users explore how LLMs can be used. We believe that internet users will place increasing value on reliable sources of information that have been validated by people. Wikipedia’s policies and our more than a decade of experience using machine learning to support human volunteers offer valuable lessons about that future.

Principles for the use of generative AI

Machine-generated content and machine learning tools are nothing new on Wikipedia and the other Wikimedia projects. At the Wikimedia Foundation, we have developed machine learning and AI tools based on the same principles that have made Wikipedia such a useful resource for so many people: centering human content moderation and human governance. We continue to experiment with new ways to meet people’s knowledge needs responsibly, including with generative AI platforms, with the aim of putting human contribution and reciprocity front and center. Wikipedia editors have control over all machine-generated content (they edit, improve, and audit any work done by AI) and create policies and structures to govern the machine learning tools used to generate content for Wikipedia.

These principles can be a good starting point for the use of current and emerging LLMs. To begin with, LLMs should consider how their models support people in three main ways:

  1. Sustainability. Generative AI technology has the potential to negatively affect human motivation to create content. To preserve and encourage more people to contribute their knowledge to the commons, LLMs should seek to increase and support human participation in cultivating and creating knowledge. They should never impede or replace human creation of knowledge. This can be achieved by always keeping humans in the loop and giving proper credit to their contributions. Continuing to support humans in sharing their knowledge is not only aligned with the strategic mission of the Wikimedia movement, it will also be necessary to keep expanding our overall information ecosystem, which is what creates the up-to-date training data that LLMs depend on.
  2. Equity. At their best, LLMs can broaden access to information and offer innovative ways to deliver information to those seeking knowledge. To do so, these platforms need to build in checks and balances that do not reproduce information biases, widen knowledge gaps, perpetuate the erasure of traditionally excluded histories and perspectives, or contribute to human rights harms. LLMs should also consider how to identify, address, and correct biases in training data that can produce inaccurate and deeply unfair results.
  3. Transparency. LLMs and their interfaces should allow humans to understand the source of model outputs, and to verify and correct them. Greater transparency about how outputs are generated can help us understand and then mitigate harmful systemic biases. By allowing users of these systems to assess the causes and consequences of the biases that may be present in training data or outputs, creators and users will be able to contribute to a better understanding and judicious application of these tools.

Visão para um futuro confiável

A contribuição humana é parte essencial da internet. As pessoas são o motor que impulsionou o crescimento e a expansão da web, criando um espaço incrível para o aprendizado, os negócios e a conexão com outras pessoas.

A IA generativa pode substituir a Wikipédia? Ela pode tentar, mas essa é uma substituição que ninguém realmente deseja. Não há nada de inevitável nas novas tecnologias. Em vez disso, cabe a todos nós escolher o que é mais importante. Podemos priorizar a compreensão humana e sua contribuição com o conhecimento no mundo – de forma sustentável, equitativa e transparente – como um dos principais objetivos dos sistemas de IA generativa, e não como algo secundário. Isso ajudaria a mitigar o aumento da desinformação e das alucinações dos LLMs; garantiria que a criatividade humana fosse reconhecida pelo conhecimento criado; e, o mais importante, assegurará que os LLMs e as pessoas possam continuar a contar com um ecossistema de informações atualizado, em evolução e confiável a longo prazo.

Selena Deckelmann é Diretora de Produtos e Tecnologia na Wikimedia Foundation.

The post O valor da Wikipédia na era da IA generativa appeared first on Wikimedia Foundation.

David-James Gonzales is an Assistant Professor of History at Brigham Young University and the host of New Books in Latino Studies. He is a historian of migration, urbanization, and social movements in the U.S., and specializes in Latina/o/x politics and social movements. 

I began teaching with the Wikipedia assignment in the spring of 2018. At the time, I sought an alternative to the standard term paper that had been, and likely remains, the staple of most college history courses. My motivation was to find an assignment that students would enjoy completing and that I would enjoy grading. Over my previous six years of university teaching, I developed a dread for grading term papers as it became apparent that most students either did not have the time or did not see the point in writing a well-researched argumentative paper. Moreover, I noticed that many of my students were developing bad habits in their rush to complete term papers, including committing to an argument before establishing a research question, cherry-picking sources that confirmed unfounded assumptions, and ignoring counterevidence. I desired an assignment that would reinforce the teaching of historical methodology and leverage the accessibility of the internet, allowing students to reach a broader audience, which I hoped would motivate them to take greater pride in their work.

David-James Gonzales. Image courtesy David-James Gonzales, all rights reserved.

After speaking with colleagues and searching the internet for ideas, I stumbled upon the Wiki Education website and found the Wikipedia assignment. Despite my lack of experience editing or authoring Wikipedia pages, I was drawn to the assignment because it facilitates experiential learning by requiring students to apply the knowledge acquired through course readings, lectures, and research to a public-facing project. In my US history survey course, for example, I use the Wikipedia assignment instead of a final paper to evaluate students’ ability to do the work of a historian by choosing a topic, developing a research question, selecting and evaluating sources, and writing a historical narrative. 

I also use the assignment to help students build social and professional skills applicable beyond the classroom. To promote peer collaboration in larger classes, I have students work in pairs. Admittedly, most groan when they hear this is a group project; however, by the end of the semester, they overwhelmingly express appreciation for their partner and the flexibility the assignment provides to capitalize on each person’s strengths. For example, those interested in computer programming and coding tend to enjoy learning about wikitext and the formatting aspects of the assignment. For others, conducting research, locating images, videos, and sound clips, or writing the text of the article is preferred. While I require them to work in pairs, students decide how to manage their workload by deciding who does what and evaluating each other’s performance at the end of the term.   

To facilitate student-teacher mentoring, I require students to meet with me throughout the semester to approve their topics and receive feedback on sources and drafts. These interactions help break down the reluctance and intimidation students feel towards interacting with authority figures and often lead to future opportunities to advise them about their degree progress, university resources, and career opportunities. To teach information and media literacy, I have students turn in an annotated bibliography halfway through the term. Although not a required part of the Wikipedia assignment, I find that it reinforces the dashboard’s trainings on evaluating sources according to the credibility of the author and publication. It also teaches students to pay as much attention, if not more, to the sources used in a publication as to the text itself.

I have used the Wikipedia assignment in thirteen courses over the past six years and have been thrilled by the results. Overall, my students have published 180 new articles, edited an additional 492 articles, and added 8,500 references to Wikipedia! Incredibly, their work has received over 13 million views as of spring 2024. But the best part is that my students admit they enjoy the assignment. 

Here are a few examples of what students appreciate about the Wikipedia assignment: 

“The Wikipedia project we had over the course of the semester was very effective in getting us all to participate in the learning process. It helped us to be more involved in research and in learning how to be historians.”

“I loved the Wikipedia project we worked on throughout the semester. We got to pick our own topic and I appreciated what it taught me about doing accurate historical research.”

“I loved the Wikipedia Assignment in this class and using our research skills to be able to put something useful out onto the internet.”

“The incorporation of making a Wikipedia article was the best way to actually be part of making and recording history.”

As reflected in the comments above, students relish the “hands-on” opportunity provided by the Wikipedia assignment to apply what they learn through a medium that allows them to create something that makes a public contribution beyond the classroom. And this is the primary reason why I continue to teach with Wikipedia; it encourages students to become more informed knowledge producers rather than passive consumers of information.


Interested in incorporating a Wikipedia assignment into your courses? Visit teach.wikiedu.org to learn more about the free resources, digital tools, and staff support that Wiki Education offers to postsecondary instructors in the United States and Canada. 

Wikipedia:Wikipedia Signpost/2024-11-18/Traffic report

Monday, 18 November 2024 00:00 UTC
Traffic report

Well, let us share with you our knowledge, about the electoral college

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, Vestrian24Bio, and CAWylie (October 27 to November 2); and Igordebraga, Soulbust, Vestrian24Bio, and Rajan51 (November 3 to 9).

Oh, sweet mystery of life at last I've found you! (October 27 to November 2)

Rank | Article | Views | Notes/about
1 | Teri Garr | 1,355,055 | This American actress, known for her comedic roles in film and television, such as Young Frankenstein, Tootsie, and playing the mother of Phoebe Buffay on Friends, died at the age of 79 last Tuesday after years fighting multiple sclerosis.
2 | 2024 Ballon d'Or | 1,273,764 | European champion Rodri was chosen by France Football as the best player of the season. Debates soon started over whether Vinícius Júnior, who was also a European champion, would have been a more deserving winner.
3 | Rodney Alcala | 1,258,084 | Netflix brought attention to this reprehensible man who killed and assaulted at least 8 women (some of them minors), was sentenced to death, and died of natural causes after decades in prison. What led to Alcala's story being told in a movie, Woman of the Hour, is that in the middle of his killing spree he appeared on a matchmaking TV show and won a date, though the woman declined to go out with him and thus escaped a grisly fate.
4 | 2024 United States presidential election | 1,234,532 | At least it's over? I'll be catching up on sleep now. Next week's Report will have a lot to discuss on this.
5 | Tony Hinchcliffe | 1,121,021 | The 2024 Trump rally at Madison Square Garden (which was compared by the opposition's potential VP to the 1939 Nazi rally at Madison Square Garden, proving Godwin's law is alive and well) had a set by this comedian, to which the reaction wasn't pretty; Hinchcliffe's description of Puerto Rico as a "floating island of garbage" in particular drew much criticism.
6 | Rúben Amorim | 1,110,284 | Manchester United hired this Portuguese coach, who has just managed Sporting CP to a national title.
7 | Liam Payne | 1,069,395 | Two weeks after the shocking death of this musician, who fell off a hotel balcony at just 31, readers want to learn if the Argentinian police have discovered more on what happened that night.
8 | Diwali | 1,053,976 | The Hindu festival of lights, symbolising the spiritual victory of Dharma over Adharma, light over darkness, good over evil, and knowledge over ignorance, annually celebrated on Kartik Amavasya as per the Hindu lunisolar calendar, which usually falls from the second half of October to the first half of November.
9 | Deaths in 2024 | 1,005,464 | "From that fateful day when stinking bits of slime first crawled from the sea and shouted to the cold stars, 'I am man!', our greatest dread has always been the knowledge of our mortality."
10 | Freddie Freeman | 988,883 | As the Los Angeles Dodgers won their eighth MLB title, the World Series Most Valuable Player Award went to this first baseman, who had home runs in the first four games, including a walk-off grand slam in the first. Adding the 2021 finals that Freeman won with the Atlanta Braves, he had home runs in six consecutive World Series games.

For this could be the biggest sky, and I could have the faintest idea (November 3 to 9)

Rank | Article | Views | Notes/about
1 | 2024 United States presidential election | 9,045,895 | U.S. election between Democrat Harris (#5) and Republican Trump (#3), who won both the Electoral College and the popular vote.
2 | 2020 United States presidential election | 6,934,170 | Previous U.S. election, between then-incumbent Trump (#3) and successful Democratic challenger Joe Biden.
3 | Donald Trump | 5,268,623 | Republican elected as the 47th U.S. President, after emerging victorious in #1 against #5. He became the second President to win non-consecutive elections, after Grover Cleveland (1884 and 1892).
4 | 2016 United States presidential election | 3,477,149 | The erelast election, in which Trump (#3) defeated Democratic candidate Hillary Clinton.
5 | Kamala Harris | 3,378,730 | Lost the 2024 U.S. presidential election (#1). Lots can be said about the defeat.
6 | Susie Wiles | 2,428,992 | After leading #3 to two successful elections, this political consultant will become the first female White House Chief of Staff.
7 | JD Vance | 2,243,627 | Recently elected Vice President, i.e. #2 to this week's #3.
8 | Quincy Jones | 1,747,761 | One of the greatest music producers of all time, whose work included the best-selling album ever and the Austin Powers theme, and who also had a hand in television by helping make shows like The Fresh Prince of Bel-Air and Mad TV, died on November 3 at the age of 91. Former Presidents Clinton and Obama, as well as President Biden and VP Harris, all paid their tributes.
9 | Project 2025 | 1,736,612 | To sum up the general reaction to this conservative plan for reforms, let's quote someone who didn't live to see #2: "I'm Afraid of Americans / I'm afraid of the world / I'm afraid I can't help it..."
10 | 2024 United States elections | 1,692,891 | In addition to the presidential election (#1), the U.S. also saw elections in the Senate and House of Representatives, as well as gubernatorial and legislative elections.

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
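The mobile-share rule described above can be sketched as a small filter. This is an illustration only, not the Top 25 Report's actual tooling: the function name, the input fields, the sample numbers, and the exact cutoffs (6% and 94%, picked from the ranges quoted above) are all assumptions, and the report's compilers also apply their own judgment.

```python
def likely_automated(mobile_views: int, total_views: int,
                     low: float = 0.06, high: float = 0.94) -> bool:
    """Flag an article whose mobile share of views is suspiciously low or high.

    The cutoffs are assumptions taken from the "5-6% or less" and
    "94-95% or more" ranges quoted in the exclusion note.
    """
    if total_views <= 0:
        return False  # no traffic, nothing to judge
    share = mobile_views / total_views
    return share <= low or share >= high


# Made-up numbers, for illustration only.
candidates = [
    {"title": "Article A", "mobile": 600_000, "total": 1_000_000},
    {"title": "Article B", "mobile": 20_000, "total": 1_000_000},
    {"title": "Article C", "mobile": 990_000, "total": 1_000_000},
]
kept = [c["title"] for c in candidates
        if not likely_automated(c["mobile"], c["total"])]
print(kept)
```

Only "Article A" survives here, since the other two sit at 2% and 99% mobile share.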

Most edited articles

For the October 11 – November 11 period, per this database report.

Title | Revisions | Notes
Deaths in 2024 | 2084 | Among the obituary's inclusions in the period, along with the three listed above, were Baba Siddique, Mitzi Gaynor, Paul Di'Anno and Tony Todd.
2024 United States presidential election | 1675 | We are citizens of this land / And we're here to lend a hand / We come together and we vote / Because we're all in the same boat...
Timeline of the Israel–Hamas war (27 September 2024 – present) | 1600 | The pain experienced in the Gaza Strip doesn't seem to end, and has extended to the West Bank and Lebanon.
2024 Maharashtra Legislative Assembly election | 1332 | A few months after choosing their federal representatives, India voted on their state assemblies. Maharashtra, the country's second most populous province (which houses their biggest city Mumbai), mostly went for the Bharatiya Janata Party that already rules the country.
Chromakopia | 1242 | One week after single "Noid", Tyler, the Creator released his eighth album to critical acclaim, and it quickly became the most successful rap album of the year (its first day on Spotify alone is one of the 20 biggest).
Tropical Storm Trami (2024) | 1170 | The Philippines were ravaged by this cyclone (that caused lesser damage once it reached Vietnam and Thailand), with 178 deaths, 23 people reported missing, 151 others injured, and US$374 million in damages.
2024 World Series | 1108 | Major League Baseball came down to the biggest cities of the United States, and the New York Yankees' win in Game 4 only delayed the title for the Los Angeles Dodgers. As mentioned above, the MVP was Freddie Freeman, and the Japanese designated hitter nicknamed "Shotime" justified the Dodgers paying him a record contract of $700 million over 10 years by helping them to a World Series right in his first season with the team.
2024 Pacific typhoon season | 928 | Tropical cyclones form between June and November, so lots of storms to cover. The strongest were Milton and Helene in the Atlantic, and Yagi and Krathon in the Pacific.
2024 Atlantic hurricane season | 905 | (note shared with the row above)
Israel–Hamas war | 887 | Ever since Israel went to war with Hamas, their other enemies Hezbollah took the opportunity for attacks of their own. Israel eventually decided to extend its war on Palestine to Lebanon, with exploding pagers, an air strike on the Hezbollah headquarters and ultimately a ground invasion. The international community just can't wait for the ceasefires.
Timeline of the Israel–Hezbollah conflict (17 September 2024 – present) | 883 | (note shared with the row above)
Liam Payne | 811 | The One Direction member went to Buenos Aires to solve O visa problems that would prevent him from going to his girlfriend's home in Miami, and while there watch a concert by former bandmate Niall Horan. Two weeks later he fell to his death from his hotel room. Lots of edits were made with updates on the investigation, and apparently he fainted on the balcony after a night of drugs.
Donald Trump | 773 | And can you hear the sound of hysteria? / The subliminal mind Trump America...
2024 Jharkhand Legislative Assembly election | 770 | Another of India's State Assembly elections, namely for Jharkhand. The BJP were tied for the most seats with the Jharkhand Mukti Morcha.
Bigg Boss (Hindi TV series) season 18 | 769 | One of the Indian versions of Big Brother.

Wikipedia:Wikipedia Signpost/2024-11-18/Recent research

Monday, 18 November 2024 00:00 UTC
Recent research

SPINACH: AI help for asking Wikidata "challenging real-world questions"


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"SPINACH": LLM-based tool to translate "challenging real-world questions" into Wikidata SPARQL queries

SPINACH's logo or custom emoji (from the paper's title, which we regret not being able to reproduce faithfully here)

A paper[1] presented at last week's EMNLP conference reports on a promising new AI-based tool (available at https://spinach.genie.stanford.edu/ ) to retrieve information from Wikidata using natural language questions. It can successfully answer complicated questions like the following:

"What are the musical instruments played by people who are affiliated with the University of Washington School of Music and have been educated at the University of Washington, and how many people play each instrument?"

The authors note that "Wikidata is one of the largest publicly available knowledge bases [and] currently contains 15 billion facts", and claim that it is "of significant value to many scientific communities". However, they observe that "Effective access to Wikidata data can be challenging, requiring use of the SPARQL query language."

This motivates the use of large language models to convert natural language questions into SPARQL queries, which could obviously be of great value to non-technical users. The paper is far from being the first such attempt; see also below for a more narrowly tailored effort. In fact, some of its authors (including Monica S. Lam and members of her group at Stanford) had already built such a system – "WikiSP" – themselves last year, obtained by fine-tuning an LLM; see our review: "Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata". (Readers of this column may also recall coverage of Wikipedia-related publications out of Lam's group, see "STORM: AI agents role-play as 'Wikipedia editors' and 'experts' to create Wikipedia-like articles" and "WikiChat, 'the first few-shot LLM-based chatbot that almost never hallucinates'" – a paper that received the Wikimedia Foundation's "Research Award of the Year".)
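To give a sense of what such a translation has to produce, here is a rough sketch of a query for the sample question above, built as a Python string. It is not taken from the paper. P1416 (affiliation), P69 (educated at) and P1303 (instrument) are real Wikidata properties, but choosing them for this question is an assumption, and the two item IDs are left as parameters because they would first have to be looked up; the values in the usage line are placeholders, not real QIDs.

```python
def instrument_count_query(school_qid: str, university_qid: str) -> str:
    """Build a SPARQL query counting people per instrument.

    school_qid / university_qid are Wikidata item IDs supplied by the
    caller; this sketch neither looks them up nor runs the query.
    """
    return f"""
SELECT ?instrument ?instrumentLabel (COUNT(DISTINCT ?person) AS ?people)
WHERE {{
  ?person wdt:P1416 wd:{school_qid} ;      # affiliation
          wdt:P69   wd:{university_qid} ;  # educated at
          wdt:P1303 ?instrument .          # instrument
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?instrument ?instrumentLabel
ORDER BY DESC(?people)
""".strip()


# Placeholder IDs only; the real QIDs would come from a Wikidata search.
print(instrument_count_query("Q_SCHOOL", "Q_UNIVERSITY"))
```

Even this short query depends on knowing three property IDs and two item IDs, which is the kind of lookup work the tools described below are meant to handle.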

The SPINACH dataset

More generally, this kind of task is called "Knowledge Base Question Answering" (KBQA). The authors observe that many benchmarks have been published for it over the last decade, and that "recently, the KBQA community has shifted toward using Wikidata as the underlying knowledge base for KBQA datasets". However, they criticize those existing benchmarks as "either contain[ing] only simple questions [...] or synthetically generated complex logical forms that are not representative enough of real-world queries". To remedy this, they

introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from forum discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. Much more complex than existing datasets, SPINACH calls for strong KBQA systems that do not rely on training data to learn the KB schema, but can dynamically explore large and often incomplete schemas and reason about them.

In more detail, the researchers scraped the "Request a Query" forum's archive from 2016 up to May 2024, obtaining 2780 discussions that had resulted in a valid SPARQL query, which were then filtered by various criteria and sampled to a subset of 920 conversations spanning many domains for consideration. Those were then further winnowed down with a focus on end-users rather than Wikipedia and Wikidata contributors interested in obscure optimizations or formatting. The remaining conversations were manually annotated with a self-contained, decontextualized natural language question that accurately captures the meaning of the user-written SPARQL. These steps include disambiguation of terms in the question as originally asked in the forum (For example, instead of asking "where a movie takes place", we distinguish between the "narrative location" and the "filming location"; thus avoiding an example that had confused the authors' own WikiSP system). This might be regarded as attaching training wheels, i.e. artificially making the task a little bit easier. However, another step goes in the other direction, by "refrain[ing] from directly using [Wikidata's] entity and property names, instead using a more natural way to express the meaning". For instance, instead of asking "what is the point of time of the goal?", a more natural question with the same level of accuracy like "when does the goal take place?" should be used.

The SPINACH agent

The paper's second contribution is an LLM-based system, also called "SPINACH", that on the authors' own dataset "outperforms all baselines, including the best GPT-4-based KBQA agent by a large margin", and also "achiev[es] a new state of the art on several existing KBQA benchmarks", although it narrowly remains behind the aforementioned WikiSP model on the WikiWebQuestions dataset (both also out of Lam's lab).

"unlike prior work, we design SPINACH with the primary goal of mimicking a human expert writing a SPARQL query. An expert starts by writing simple queries and looking up Wikidata entity or property pages when needed, all to understand the structure of the knowledge graph and what connections exist. This is especially important for Wikidata due to its anomalous structure (Shenoy et al., 2022). An expert then might add new SPARQL clauses to build towards the final SPARQL, checking their work along the way by executing intermediate queries and eyeballing the results."

This agent is given several tools to use, namely

  • searching Wikidata for the QID for a string (like a human user would using the search box on the Wikidata site). This addresses an issue that thwarts many naive attempts to use e.g. ChatGPT directly for generating SPARQL queries, which the aforementioned WikiSP paper already pointed out last year: "While zero-shot LLMs [e.g. ChatGPT] can generate SPARQL queries for the easiest and most common questions, they do not know all the PIDs and QIDs [property and item IDs in Wikidata]."
  • retrieving the Wikidata entry for a QID (i.e. all the information on its Wikidata page)
  • retrieving a few examples demonstrating the use of the specified property in Wikidata
  • running a SPARQL query on the Wikidata Query Service

The authors note that "Importantly, the results of the execution of each action are put in a human-readable format to make it easier for the LLM to process. To limit the amount of information that the agent has to process, we limit the output of search results to at most 8 entities and 4 properties, and limit large results of SPARQL queries to the first and last 5 rows." That LLMs and humans have similar problems reading through copious Wikidata query results is a somewhat intriguing observation, considering that Wikidata was conceived as a machine-readable knowledge repository. (In an apparent effort to address the low usage of Wikidata in today's AI systems, Wikimedia Deutschland recently announced "a project to simplify access to the open data in Wikidata for AI applications" by "transformation of Wikidata’s data into semantic vectors.")
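The output limits quoted above (at most 8 entities and 4 properties, first and last 5 rows) are simple to picture in code. The following is a minimal sketch, not the authors' implementation: the function names and the "rows omitted" marker are assumptions made for illustration.

```python
def truncate_rows(rows: list, keep: int = 5) -> list:
    """Keep the first and last `keep` rows of a large query result.

    A marker row (an assumption of this sketch) is inserted where rows
    were dropped, so the reader can tell the result was shortened.
    """
    if len(rows) <= 2 * keep:
        return list(rows)
    omitted = len(rows) - 2 * keep
    return rows[:keep] + [f"... ({omitted} rows omitted) ..."] + rows[-keep:]


def cap_search_results(entities: list, properties: list,
                       max_entities: int = 8, max_properties: int = 4):
    """Limit search output to at most 8 entities and 4 properties."""
    return entities[:max_entities], properties[:max_properties]


rows = [f"row {i}" for i in range(1, 21)]
for line in truncate_rows(rows):
    print(line)
```

With 20 input rows, this prints rows 1 to 5, one marker line, and rows 16 to 20.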

The SPINACH system uses the popular ReAct (Reasoning and Acting) framework for LLM agents,[supp 1] where the model alternates between reasoning about its task (e.g. "It seems like there is an issue with the QID I used for the University of Washington. I should search for the correct QID") and acting (e.g. using its search tool: search_wikidata("University of Washington")).

The generation of these thought + action pairs in each turn is driven by an agent policy prompt

that only includes high-level instructions such as "start by constructing very simple queries and gradually build towards the complete query" and "confirm all your assumptions about the structure of Wikidata before proceeding" [...]. The decision of selecting the action at each time step is left to the LLM.

Successfully answering a question with a correct SPARQL query can require numerous turns. The researchers limit these by providing the agents with a budget of 15 actions to take, and an extra 15 actions to spend on [...] "rollbacks" of such actions. Even so, "Since SPINACH agent makes multiple LLM calls for each question, its latency and cost are higher compared to simpler systems. [...] This seems to be the price for a more accurate KBQA system."
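The thought, action and observation cycle and the action budget described above can be sketched as a short loop. This is a toy illustration under stated assumptions, not the SPINACH code: the LLM is replaced by a scripted stand-in policy, the tools are stubs, the name execute_sparql is hypothetical (only search_wikidata appears in the example above), and the separate rollback budget is left out.

```python
from typing import Callable


def run_agent(policy: Callable[[list], tuple], tools: dict,
              max_actions: int = 15) -> list:
    """Alternate thought -> action -> observation until done or out of budget.

    `policy` stands in for the LLM: given the history so far, it returns
    (thought, tool_name, argument). A tool_name of "stop" ends the run.
    """
    history = []
    for _ in range(max_actions):
        thought, tool_name, argument = policy(history)
        if tool_name == "stop":
            history.append((thought, "stop", argument, None))
            break
        observation = tools[tool_name](argument)
        history.append((thought, tool_name, argument, observation))
    return history


# Stub tools and a scripted policy, for illustration only.
tools = {
    "search_wikidata": lambda text: f"(search results for {text!r})",
    "execute_sparql": lambda query: "(query results)",
}

script = [
    ("I should search for the correct QID.", "search_wikidata", "University of Washington"),
    ("Now I can try a simple query.", "execute_sparql", "SELECT ..."),
    ("The results look right.", "stop", "SELECT ..."),
]


def scripted_policy(history: list) -> tuple:
    return script[len(history)]


for step in run_agent(scripted_policy, tools):
    print(step)
```

Because every turn in the real system is an LLM call, a cap like max_actions is also what bounds the latency and cost mentioned in the quote.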

Still, for the time being, an instance is available for free at https://spinach.genie.stanford.edu/ , and also on-wiki as a bot (operated by one of the authors, a – now former – Wikimedia Foundation employee), which has already answered about 30 user queries since its introduction some months ago.

Example from the paper: "The sequence of 13 actions that the SPINACH agent takes to answer a sample question from the SPINACH validation set. Here, the agent goes through several distinct phases, only with the high-level instruction [prompt]. Note that every step includes a thought, action and observation, but some are omitted here for brevity."

Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph"

From the abstract:[2]

"we evaluate several strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. In particular, we propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph towards a larger dataset of semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even for datasets where these pairs are scarce."

From the paper:

"Recently, the benchmark dataset so-called [sic] KQA Pro was released [...]. It is a large-scale dataset for complex question answering over a dense subset of the Wikidata KB. [...] Although Wikidata is not a domain specific KB, it contains relevant life science data."
"We augment an existing catalog of representative questions over a given knowledge graph and fine-tune OpenLlama in two steps: We first fine-tune the base model using the KQA Pro dataset over Wikidata. Next, we further fine-tune the resulting model using the extended set of questions and queries over the target knowledge graph. Finally, we obtain a system for Question Answering over Knowledge Graphs (KGQA) which translates natural language user questions into their corresponding SPARQL queries over the target KG."

A small number of "culprits" cause over 10 million "Disjointness Violations in Wikidata"

This preprint identifies 51 pairs of classes on Wikidata that should be disjoint (e.g. "natural object" vs. "artificial object") but aren't, with over 10 million violations, caused by a small number of "culprits". From the abstract:[3]

"Disjointness checks are among the most important constraint checks in a knowledge base and can be used to help detect and correct incorrect statements and internal contradictions. [...] Because of both its size and construction, Wikidata contains many incorrect statements and internal contradictions. We analyze the current modeling of disjointness on Wikidata, identify patterns that cause these disjointness violations and categorize them. We use SPARQL queries to identify each 'culprit' causing a disjointness violation and lay out formulas to identify and fix conflicting information. We finally discuss how disjointness information could be better modeled and expanded in Wikidata in the future."
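As a rough illustration of what a disjointness check involves, the sketch below follows subclass links upward and reports items that end up under both classes of a disjoint pair. It is written in plain Python over invented toy data rather than as SPARQL over Wikidata, so it is not the preprint's method; the function names and the example items are assumptions.

```python
def superclasses(cls: str, subclass_of: dict) -> set:
    """Return cls plus every class reachable via subclass-of links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, []))
    return seen


def disjointness_violations(instance_of: dict, subclass_of: dict,
                            disjoint_pairs: list) -> list:
    """List (item, class_a, class_b) where the item falls under both classes."""
    violations = []
    for item, classes in sorted(instance_of.items()):
        all_classes = set()
        for cls in classes:
            all_classes |= superclasses(cls, subclass_of)
        for a, b in disjoint_pairs:
            if a in all_classes and b in all_classes:
                violations.append((item, a, b))
    return violations


# Toy data, invented for illustration.
subclass_of = {
    "river": ["natural object"],
    "canal": ["artificial object"],
    "canalized river": ["river", "canal"],  # the "culprit" link
}
instance_of = {"Example Creek": ["canalized river"], "Example River": ["river"]}
pairs = [("natural object", "artificial object")]
print(disjointness_violations(instance_of, subclass_of, pairs))
```

A single class that sits under both sides, like "canalized river" here, makes every one of its instances a violation, which is how a small number of culprits can account for a very large number of violations.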


"Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review"

From the abstract:[4]

"We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality."

References

  1. Liu, Shicheng; Semnani, Sina; Triedman, Harold; Xu, Jialiang; Zhao, Isaac Dan; Lam, Monica (November 2024). "SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions". In Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen (eds.). Findings of the Association for Computational Linguistics: EMNLP 2024. Findings 2024. Miami, Florida, USA: Association for Computational Linguistics. pp. 15977–16001.
  2. Rangel, Julio C.; de Farias, Tarcisio Mendes; Sima, Ana Claudia; Kobayashi, Norio (2024-02-07), SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph, arXiv, doi:10.48550/arXiv.2402.04627 (accepted submission at SWAT4HCLS 2024: The 15th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences)
  3. Doğan, Ege Atacan; Patel-Schneider, Peter F. (2024-10-17), Disjointness Violations in Wikidata, arXiv, doi:10.48550/arXiv.2410.13707
  4. Moás, Pedro Miguel; Lopes, Carla Teixeira (2023-09-22). "Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review". ACM Computing Surveys. doi:10.1145/3625286. ISSN 0360-0300.
Supplementary references and notes:
  1. Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan (2023-03-09), ReAct: Synergizing Reasoning and Acting in Language Models, doi:10.48550/arXiv.2210.03629


News from the WMF

Wikimedia Foundation and Wikimedia Endowment audit reports: FY 2023–2024

Elena Lappen is the Wikimedia Foundation's Movement Communications Manager; some content in this post was previously published on Diff.

Highlights from the fiscal year 2023–2024 Wikimedia Foundation and Wikimedia Endowment audit reports

Every year, the Wikimedia Foundation shares our audited financial statements along with an explanation of what the numbers mean. Our goal is to make our finances understandable, so that community members, donors, readers and more have clear insight into how we use our funds to further Wikimedia's mission.

This post explains the audit reports for both the Wikimedia Foundation and the Wikimedia Endowment for fiscal year 2023–2024, providing key highlights and additional information for those who want to dive deeper.

What is an audit report?

An audit report presents details on the financial balances and financial activities of any organization, as required by US accounting standards. It is audited by a third party (in the Foundation's and Endowment's case, KPMG) in order to validate accuracy. The Foundation has received clean audits for the past 19 years. Each annual audit is an opportunity to evaluate the Foundation's activities and credibility as a responsible steward of donor funds.

The financial information found in the audit report is also then used to build an organization's Form 990, which is the form required by the United States government for organizations to maintain their nonprofit status. The Form 990 is released closer to the end of the current fiscal year.

Key takeaways from the Foundation's fiscal year 2023-2024 audit report

The Foundation's 2023-2024 Annual Plan laid out a number of financial goals for the fiscal year. Below are key takeaways from the audit report related to those goals:

  • Clean audit opinion: The external auditors, KPMG, issued their opinion that the Wikimedia Foundation's financial statements for FY 2023–2024 are presented accurately, marking the 19th consecutive year of clean audits since the Foundation's first audit in 2006.
  • Expense growth slowing in line with target: In anticipation of slower revenue growth, our 2023–2024 Annual Plan aimed to slow budget growth to around 5% after significant growth in the prior five years averaging 16%. We were able to reach that goal: during the fiscal year, expenses grew at 5.5% ($9.4M), from $169.1M to $178.5M. This came in at only slightly over our target of $177M. Growth in expenses was driven primarily by increases in movement funding (detailed below) and increases in personnel cost due mostly to cost of living adjustments. The Foundation is working to continue this trend of stabilizing growth in the current fiscal year. As outlined in the annual plan for fiscal year 2024–2025, the budget is expected to be $188.7M, which is 6% year-on-year growth.
→ During the year, we prioritized spending on a number of Infrastructure-related projects, Infrastructure being the largest area of the Foundation's work. Projects included a revamp of the Community Wishlist, new features for events and campaigns, improvements in moderation tools (e.g., EditCheck, Automoderator, and Community Configuration), and a new data center in Brazil.
→ Also during the year, we decided not to renew our lease of our San Francisco office and to instead move to a small administrative space. This move was aimed at both reducing expenses and responding to an increasingly global workforce, where the vast majority of employees (82%) are based outside the San Francisco Bay Area. This move will result in a rent cost savings of over 80% per month.
  • More budget shifted toward movement support: The Annual Plan aimed to increase the percentage of the budget that goes directly to supporting the mission. This means working to minimize both fundraising and administrative costs and increase support for things like platform maintenance, grants to communities, feature development and more. This year's percentage was 77.5%, up from 76% in the prior fiscal year. In real terms, this means that $9.8M more went to direct movement support in the 2023-2024 fiscal year than the prior fiscal year. While this percentage was just shy of our goal of 77.9%, it is well within the range of best practice for nonprofits, which recommends that at least 65% be devoted to programmatic work.
→ We made progress in communicating more effectively with communities, which collectively speak hundreds of languages. A new system for providing translations of core Foundation documentation enabled us to complete more than 650 requests for translations in a year. This has increased the number of languages supported in written translations from six to thirty-four. As an added benefit, the translations are provided by members of the Wikimedia volunteer community, whose experience and knowledge of the movement result in much higher-quality translations.
  • Growth sustained in community grants: In spite of the Foundation's overall growth slowing to 5%, we increased community grants by $2.2M, or 9.9% from the previous fiscal year. Our Annual Plans have repeatedly prioritized growing community funding at a significantly higher rate than the overall budget–a goal we have continued to prioritize in the 2024-2025 Annual Plan.
→ We support our grantees by working closely with them to form strategic partnerships to close content gaps. For example, we supported community gender gap campaigns in biographies and women's health during Women's History Month. This included running the Wikipedia Needs More Women campaign (14.5M unique people reached) and coordinating the global landing page and calendar for the Celebrate Women campaign.
  • Exploring diversified revenue streams for the movement: In order to ensure the movement's future financial sustainability, the Foundation has aimed to diversify our revenue streams over time. For several years, we have been anticipating a trend where fundraising revenue through banners would no longer represent the majority of our donations. During fiscal year 2023–2024, the Foundation's total revenue was $185.4M, of which $174.7M came from donations. This total number represents not only banner fundraising, but also increased percentages in email and major gift donations. Diversified donation income was complemented by increased investment income, income from the Wikimedia Endowment's cost-sharing agreement, and increased income from Wikimedia Enterprise. Investment income was $5.1M up from $3M in the prior year, primarily due to increased interest income from higher interest rates during the year. The new cost sharing agreement with the Wikimedia Endowment generated $2.1M in revenue to offset costs incurred by the Foundation to support the Endowment (Note: This is in addition to the $2.6 million the Foundation received from the Endowment to support technical innovation projects), and Wikimedia Enterprise brought in gross revenue of $3.4M, up slightly from $3.2M in FY 2022–2023. While diversification fell slightly short of our Annual Plan goals, we believe we are still on track over the medium-term: Enterprise contracts have since increased $400K year over year in monthly revenue so far in FY 2024–2025, and we anticipate more income to be generated from Enterprise in subsequent fiscal years.
→ More about Enterprise's financials and the work to diversify revenue streams is available in the Enterprise financial report. More information about the Endowment is detailed below.
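The growth figures quoted in the takeaways above can be rechecked with a few lines of arithmetic. The following is only a minimal sketch using the rounded figures from this post; the variable and function names are illustrative, and it assumes the 6% figure for 2024–2025 is measured against actual 2023–2024 expenses rather than against the $177M target.

```python
# Figures in millions of USD, as quoted (rounded) in the takeaways above.
expenses_prior = 169.1    # FY 2022-2023 expenses
expenses_current = 178.5  # FY 2023-2024 expenses
budget_next = 188.7       # FY 2024-2025 planned budget


def growth_pct(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


expense_increase = expenses_current - expenses_prior
expense_growth = growth_pct(expenses_prior, expenses_current)
# Assumption: next year's growth is computed against actual FY 2023-2024 expenses.
budget_growth = growth_pct(expenses_current, budget_next)

print(f"Expense increase: ${expense_increase:.1f}M")
print(f"Expense growth:   {expense_growth:.1f}%")
print(f"Budget growth:    {budget_growth:.1f}%")
```

With these rounded inputs the expense growth works out to a little under 5.6%, close to the reported 5.5% (the small difference presumably comes from rounding in the published figures), and the planned budget growth to about 5.7%, which rounds to the stated 6%.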

You can read the full audit report on the Foundation's website, review the frequently asked questions on Meta-Wiki, or ask any additional questions on the FAQ talk page.

Key takeaways from the Wikimedia Endowment's fiscal year 2023–2024 audit report

The Wikimedia Endowment has completed its audit report for fiscal year (FY) 2023–2024, which covers the nine months from 30 September 2023, when the Endowment began operations as a standalone 501(c)(3) organization, through the end of the fiscal year on 30 June 2024. This was the Endowment's first independent audit report. The Endowment is a permanent fund that generates income for the Wikimedia projects in perpetuity with the aim of protecting Wikimedia projects far into the future. The work was overseen by the Endowment's Audit Committee, led by Chair Kevin Bonebrake. Here are a few key takeaways:

  • Clean audit opinion: The external auditors, KPMG, issued their opinion that the Wikimedia Endowment's financial statements for fiscal year 2023–2024 are presented fairly and in accordance with U.S. GAAP.
  • Revenue from Tides transfer, donations, and investment income: The Endowment's total revenue was $132.0M for fiscal year 2023–2024. However, the vast majority of this revenue came from the transfer of $116.2M of the Endowment fund from the Tides Foundation. Funds for the Endowment were held by the Tides Foundation from 2016–2023. In 2023, the Endowment became its own standalone 501(c)(3). At that point, all of the Endowment funds held by Tides were transitioned over to the new entity in the form of a one-time transfer. The Endowment received $13.4M in new donations during FY 2023-2024 and had $2.4M in investment income.
  • Funding to support Wikimedia projects: The Endowment provided $2.9M in funding in FY 2023–2024 to support technical innovation on the Wikimedia projects: $1.5M for MediaWiki upgrades, $600,000 for Abstract Wikipedia, $500,000 for efforts aimed at reaching new audiences, and $278,375 for Kiwix. More information about this round of Endowment funding can be found here.
  • Strong financial position: As of June 30, 2024, the Endowment's net assets were $144.3 million, made up primarily of cash of $20.1M and investments of $123.4M. These assets have generated $19.7M in returns on investment during FY 2023–2024, of which $6.1M has been used to fund technological innovation of the Wikimedia projects over the past two fiscal years.

You can read the full audit report, review the frequently asked questions on Meta-Wiki, or ask any additional questions on the FAQ talk page.

About the Wikimedia Endowment

Launched in 2016, the Wikimedia Endowment is a nonprofit charitable organization providing a permanent safekeeping fund to support the operations and activities of the Wikimedia projects in perpetuity. It aims to create a solid financial foundation for the future of the Wikimedia projects. As of June 30, 2024, the Wikimedia Endowment was valued at $144.3 million USD. The Wikimedia Endowment is a U.S.-based 501(c)(3) charity (Tax ID: 87-3024488). To learn more, please visit www.wikimediaendowment.org.

Wikipedia:Wikipedia Signpost/2024-11-18/News and notes

Monday, 18 November 2024 00:00 UTC
News and notes

Open letter to WMF about court case breaks one thousand signatures, big arb case declined, U4C begins accepting cases

Arbitration declined in case with much private evidence

The opening statement in a new arbitration case request, titled "Covert canvassing and proxying in the Israel-Arab conflict topic area" read:

There is ongoing coordination of off-wiki editors for the purpose of promoting a pro-Palestinian POV, utilizing a discord group, as well as an EEML-style mailing list (Private Evidence A).
A significant participant in the discord group, as well as the founder of the mailing list (Private Evidence B), is a community banned editor (Private Evidence C), who since being banned has engaged in the harassment and outing of Wikipedia editors (Private Evidence D). This individual has substantial reach (Private Evidence E), and their list appears to have been joined by a substantial number of editors, although I am only confident of the identify of three.
The Discord group was previously public, but has now transitioned to a private form in order to better hide their activities (Private Evidence F). It is not compliant with policy, being used to organize non-ECP editors to make edits within the topic area, some of whom have now become extended-confirmed through these violations. In addition, it is used by the community-banned editor to make edit requests, edit requests that are acted upon (Private Evidence G).

Community members discussed the request at length, with many voicing concern about the public posting of such wide-reaching allegations. Some commenters downplayed or accepted the alleged off-wiki coordination, and some did not. Comments included:

Editor 1: another illustration that there are ugly undercurrents about conflicts involving the editing of articles on the Palestinian-Israeli conflict.
Editor 2: goalpost-moving ARBECR [extended confirmed restriction] enforcement creep... expanding ... into literally doxxing editors
Editor 3: public aspersions based on secret denunciations
Arb 1: Decline this publicity stunt
Arb 2: [The filer] shouldn't have just dumped a pile of private evidence in public. But I also don't see how we get out of dealing with the merits of this issue

At our deadline, five out of 10 active arbitrators had voted to decline the public case, which effectively kills the request according to current procedures. However, at approximately the same time as the consensus to decline this case emerged, arbs opened new motions regarding Palestine-Israel articles, "a case to examine the interaction of specific editors in the WP:PIA topic area ... Evidence from the related private matter, as alluded to in the Covert canvassing and proxying in the Israel-Arab conflict topic area case request, will be examined prior to the start of the case, and resolved separately."

– B

A petition in the form of an open letter addressed to the Wikimedia Foundation has been created regarding the ongoing lawsuit in India (see also In the media in this issue). Its signatories are profoundly concerned at the suggestion that the Foundation is considering disclosing identifying private information about volunteer editors to the Delhi High Court.

The most signed petition in Wikimedia history before this was the 2020 Community open letter on renaming, which successfully asked the Wikimedia Foundation to refrain from renaming itself to "Wikipedia". That one reached 1015 signatures after running for months. This petition has crossed 1015 signatures in 10 days, making it the strongest community consensus statement yet.

Separately, a site blackout was proposed, then closed with 2:1 opposition: Wikipedia:Requests for comment/2024 Wikipedia blackout. Some of the voters may have been persuaded by personal comments from Wikipedia's co-founder Jimbo Wales, who is privy to board discussions on the case and said, "I am personally not worried and think that a protest is unwarranted." – B, Br, Q

U4C is accepting cases

The U4C is now accepting cases. See the relevant meta page for more information.

CheckUser and COI VRT appointments

Appointments to the Conflict-of-interest volunteer response team (COI VRT) and CheckUser privilege changes were announced by the Arbitration Committee. Spicy was added as a CheckUser. The COI VRT includes, in addition to CheckUsers and Oversighters, the following administrators: 331dot, Bilby, Extraordinary Writ, Robertsky.

Two administrator recalls, one RRFA

Wikipedia:Administrator recall/Graham87 and Wikipedia:Administrator recall/Fastily were closed as successful. Re-request for adminship (RRFA) remains an option for all recalled administrators, with lower thresholds than a regular RfA. As of our deadline, Graham87's RRFA is active. – B

Brief notes

  • Reminder to apply for Affcom and Ombuds Comm / Case Review committee. Applications for the Affiliations Committee close on November 18, and applications for the Ombuds commission and the Case Review Committee close on December 2. See meta:Wikimedia Foundation/Legal/Committee appointments for details.
  • New administrators: The Signpost welcomes the English Wikipedia's newest administrators, Voorts and Worm That Turned. Voorts said he had been planning an RfA before the election dates were announced, running the first traditional RfA after the October AELECT trial.
  • Arbitration committee election: Questions may be asked of the candidates at Wikipedia:Arbitration Committee Elections December 2024/Questions. Voting will open for eligible community members at 00:00 19 November. Up to nine vacancies will be filled according to the election results.
  • Articles for Improvement: The Article for Improvement is Diurnality (beginning 25 November). Please be bold in helping improve this article![1]

Footnotes

  1. ^ There was no AfI for the week of 17 November and The Signpost has been unable to determine why.

Wikipedia:Wikipedia Signpost/2024-11-18/In the media

Monday, 18 November 2024 00:00 UTC
In the media

Summons issued for Wikipedia editors by Indian court, "Gaza genocide" RfC close in news, old admin Gwern now big AI guy, and a "spectrum of reluctance" over Australian place names

Asian News International case against Wikimedia and Wikipedia editors

Background: Asian News International vs. Wikimedia Foundation blanked by court order, Litigation involving the Wikimedia Foundation, prior Signpost coverage

Summons issued for Wikipedia editors in ANI case

Commentary and facts involving the case were published by Bar and Bench, India Legal Live (ENC Network), The Hindu, and Hindustan Times. At least one source said that according to a summons issued by Delhi High Court, WMF had released or will release email addresses of three editors, "Defendants 2–4".

According to MediaNama, one of the defendants signed the on-wiki open letter protesting the case (see related Signpost coverage). – B

Should Wikipedia be treated like a publisher?

Aditi Agrawal covers the ANI case for Hindustan Times. The question of Wikipedia's publisher-like status is also addressed in India Today's Fiiber channel on MSN, "Why has the Indian government issued a notice to Wikipedia, explained in 5 points". – B

Bias complaint: the phantom menace / MIB is MIA

As we went to press on our last issue, abplive reported that "According to ANI, the government has written to Wikipedia highlighting a number of complaints of bias and inaccuracy. In the letter, the Centre pointed out that a small group of people have editorial control over the website." The "Centre" refers to the central Indian government or specifically the Indian Ministry of Information and Broadcasting (MIB).

The existence of this letter, or the timing of its issue, has itself been called into question. At The Signpost, we could not find a solid report to base a story on.

Some media just said there was "a notice" sent, another said unnamed government sources had spoken to one media outlet, and none we could find provided any real details (example, example). Since then, TechCrunch is also reporting that no complaint has been found by their staff, either. – B

RfC closure noted

This closure of a more than month-long Request for comments (RfC) at List of genocides was noted in several press sources ...

The RfC confirming the page title follows a Requested move talkpage discussion which initially set the title earlier this year – see previous Signpost coverage. – B

Luckey Gaetz Wikipedia

There's a bizarre style of biography that commonly appears off-Wiki in the less-than-reliable press with headlines like John Doe Wiki. This week "GhanaCelebrities" provided the best example I've seen "Ginger Luckey Gaetz Wiki, Age, Career, Husband". The article is so well-written – it doesn't seem to have been authored with either artificial intelligence or natural stupidity – that if provided with references it would take at least a week to delete if it were posted on-Wiki. Luckey Gaetz's main claims to fame – if not notability – are that she has a rich brother and is married to the former congressman and currently nominated U.S. Attorney General Matt Gaetz. Mrs. Gaetz, according to the article, is a KPMG manager who has taken some MBA courses through Harvard's online program and in person at UC Berkeley. Mr. Gaetz's notability includes accusations of drug use and paying for sex with minors.

A completely separate linking of Gaetz with Wikipedia was published as a trivia question in Above the Law. Kathryn Rubino asked "What law school did (Matt) Gaetz attend?" Despite a wealth of official sources that she could have linked to document the answer, she linked to Wikipedia. She told The Signpost that she did so "because Wikipedia is the easiest way to encapsulate multiple facts about a source with a single link. In this instance I wanted a reference that Matt Gaetz went to William & Mary Law as well as the other notable legal figures that went to the law school but never held the position of U.S. Attorney General." – S

Gwern interview: How a longtime Wikipedian became an influential voice in AI — and still remains anonymous

Dwarkesh Patel (a US podcaster who TIME magazine recently described as one of the 100 most influential people in AI) published an interview titled "Gwern Branwen - How an Anonymous Researcher Predicted AI's Trajectory". According to Patel, Gwern has "deeply influenced the people building AGI," and "If you've read his blog, you know he's one of the most interesting polymathic thinkers alive."

User:Gwern is also a longtime Wikipedian with almost 100k edits on English Wikipedia. While the interview mostly focused on AI and Gwern's life as an independent writer, it also discussed the pivotal role that editing Wikipedia had played for him:

Dwarkesh Patel

What is it that you are trying to maximize in your life?

Gwern

I maximize rabbit holes. I love more than anything else, falling into a new rabbit hole. That's what I really look forward to. Like this sudden new idea or area that I had no idea about, where I can suddenly fall into a rabbit hole for a while.
[...]

Dwarkesh Patel

What were you doing with all these rabbit holes before you started blogging? Was there a place where you would compile them?

Gwern

Before I started blogging, I was editing Wikipedia.
That was really gwern.net before gwern.net. Everything I do now with my site, I would have done on English Wikipedia. If you go and read some of the articles I am still very proud of—like the Wikipedia article on Fujiwara no Teika—and you would think pretty quickly to yourself, “Ah yes, Gwern wrote this, didn't he?”

Dwarkesh Patel

Is it fair to say that the training that required to make gwern.net happened on Wikipedia?

Gwern

Yeah. I think so. I have learned far more from editing Wikipedia than I learned from any of my school or college training. Everything I learned about writing I learned by editing Wikipedia. [...] For me it was beneficial to combine rabbit-holing with Wikipedia, because Wikipedia would generally not have many good articles on the thing that I was rabbit-holing on.

It was a very natural progression from the relatively passive experience of rabbit-holing—where you just read everything you can about a topic—to compiling that and synthesizing it on Wikipedia. You go from piecemeal, a little bit here and there, to writing full articles. Once you are able to write good full Wikipedia articles and summarize all your work, now you can go off on your own and pursue entirely different kinds of writing now that you have learned to complete things and get them across the finish line.

However, echoing concerns Gwern had already detailed in a 2009 essay titled In Defense of Inclusionism, he cautioned that

It would be difficult to do that with the current English Wikipedia. It's objectively just a much larger Wikipedia than it was back in like 2004. But not only are there far more articles filled in at this point, the editing community is also much more hostile to content contribution, particularly very detailed, obsessive, rabbit hole-y kind of research projects. They would just delete it or tell you that this is not for original research or that you're not using approved sources.

He also recalled other ways in which Wikipedia was different in its earlier years:

Gwern

I got started on Wikipedia in late middle school or possibly early high school.
It was kind of funny. I started skipping lunch in the cafeteria and just going to the computer lab in the library and alternating between Neopets and Wikipedia. I had Neopets in one tab and my Wikipedia watch lists in the other.

Dwarkesh Patel

Were there other kids in middle school or high school who were into this kind of stuff?

Gwern

No, I think I was the only editor there, except for the occasional jerks who would vandalize Wikipedia. I would know that because I would check the IP to see what edits were coming from the school library IP addresses. Kids being kids thought they would be jerks and vandalize Wikipedia.

For a while it was kind of trendy. Early on, Wikipedia was breaking through to mass awareness and controversy. It’s like the way LLMs are now. A teacher might say, “My student keeps reading Wikipedia and relying on it. How can it be trusted?”

"Gwern Branwen" is a pseudonym. Of interest to Wikipedians who are conscientious about keeping their real name separated from their public editing activity (see also coverage of a current open letter in this issue's News and notes), the interview also discusses benefits of maintaining anonymity. While it was conducted in person, responses were re-recorded by a different person, and for the customary video of the interview, an AI-generated avatar was created as a stand-in.

In other parts of the interview that might likewise resonate with Wikipedians who devote large amounts of unpaid work to their hobby, Patel asked various probing questions about Gwern's personal finances, again starting from his Wikipedia volunteering:

Dwarkesh Patel

When you were an editor on Wikipedia, was that your full-time occupation?

Gwern

It would eat as much time as I let it. I could easily spend 8 hours a day reviewing edits and improving articles while I was rabbit-holing. But otherwise I would just neglect it and only review the most suspicious diffs on articles that I was particularly interested in on my watchlist. I might only spend like 20 minutes a day. It was sort of like going through morning email.

and later

Dwarkesh Patel

How do you sustain yourself while writing full time?

Gwern

Patreon and savings. I have a Patreon which does around $900-$1000/month, and then I cover the rest with my savings. [...] So I try to spend as little as possible to make it last.
I should probably advertise the Patreon more, but I'm too proud to shill it harder.
[...]

I live in the middle of nowhere. I don't travel much, or eat out, or have health insurance, or anything like that. [...] I live like a grad student, but with better ramen. I don't mind it much since I spend all my time reading anyway.

The interview then took a rather consequential turn:

Dwarkesh Patel

It seems like you’ve enjoyed this recent trip to San Francisco [home of several AI labs mentioned earlier in the interview, like OpenAI and Anthropic]? What would it take to get you to move here?

Gwern

Yeah, it is mostly just money stopping me at this point. I probably should bite the bullet and move anyway. But I'm a miser at heart and I hate thinking of how many months of writing runway I'd have to give up for each month in San Francisco.

If someone wanted to give me, I don’t know, $50–100K/year to move to SF and continue writing full-time like I do now, I'd take it in a heartbeat.

Patel then encouraged him to share contact information for potential donors, and two days after the interview's release noted that donors had indeed been found and that Gwern would be moving to San Francisco.

– H

In brief

Exploding whale coverage in The Signpost



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next week's edition in the Newsroom or leave a tip on the suggestions page.


Tech News issue #47, 2024 (November 18, 2024)

Monday, 18 November 2024 00:00 UTC

Tech News: 2024-47

As far as the English Wikipedia is concerned, there is neither a red nor a blue link for the 2024 awardees of the Brewster medal. Its information ends in 2021. The German Wikipedia is up to date. There are no articles for Renée A. Duckworth or Juan C. Reboreda on either Wikipedia; the German one has two red links.

When you maintain information like this, there are three options: you can include an awardee as plain text, as a red link, or as a blue link, and whether a link turns out red or blue is a matter of luck. This is complicated because a link may have homonyms. With a red link you will only learn of a homonym issue once an article is created; with a blue link you may know immediately.

The Wikimedia Foundation solved a similar problem a long time ago for another type of link, the "interwiki link".  The solution is Wikidata. It works because there is only one identifier for every topic and every article needs a link to a Wikidata item to have a more global relevance.

Thanks to the ongoing development of Wikidata, there is the Wikibase. We should do a similar job for the red and blue links. It will do away with the false-friend problem in Wikipedia. It will improve quality for each Wikipedia and it will improve the quality of Wikidata. Any data-related updates that are not strictly local will remain at Wikidata because that helps us in the sharing of the sum of all knowledge.

When a new link is to be added in any of the 333+ Wikipedias, it starts with disambiguation: is the subject already known in any of the other Wikipedias? If not, a new Wikidata item is created, which extends the options in any future disambiguation. If it is, information and references are available from the start, and consequently a Scholia, a Reasonator or any other generated view of the information may become available, depending on the policies of a Wikipedia.

Implementing such a Wikibase is not really problematic because all the blue links still refer through the local Wikipedia article to Wikidata. The red links are the trickier bit. They are opened up once they are linked to a Wikidata item.

With such a Wikibase in place, we can start doing the smart things. The Brewster medal, Q612041, could have a red or blue link to each of its awardees. When it does not, the article is to be reported for maintenance.
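As a rough illustration of that maintenance check, here is a minimal Python sketch. It assumes each link in an article has already been resolved to a Wikidata item, as proposed above. The `Link` type, the function names, and the item identifiers are hypothetical placeholders, not real Wikidata IDs or an existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A wiki link resolved to a Wikidata item instead of a title string."""
    item_id: str             # identifier of the Wikidata item the link points at
    has_local_article: bool  # True -> blue link, False -> red link


def link_colour(link: Link) -> str:
    """A link is blue when the local Wikipedia has an article, red otherwise."""
    return "blue" if link.has_local_article else "red"


def missing_awardees(awardees: set[str], article_links: list[Link]) -> set[str]:
    """Awardees known to Wikidata that the article links to neither in red nor blue."""
    linked = {link.item_id for link in article_links}
    return awardees - linked


# Hypothetical data: placeholder identifiers stand in for real awardee items.
awardees = {"Q-EXAMPLE-1", "Q-EXAMPLE-2", "Q-EXAMPLE-3"}
article_links = [
    Link("Q-EXAMPLE-1", has_local_article=True),   # blue link
    Link("Q-EXAMPLE-2", has_local_article=False),  # red link
]

for link in article_links:
    print(link.item_id, link_colour(link))

to_report = missing_awardees(awardees, article_links)
if to_report:
    print("Report article for maintenance; missing:", sorted(to_report))
```

Because links are compared by item identifier rather than by title, two people with the same name can never be confused, which is the homonym problem the proposal aims to remove.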

Cool?

     GerardM

weeklyOSM 747

Sunday, 17 November 2024 11:02 UTC

07/11/2024-13/11/2024


Collage of some results Hackweekend Berlin November 2024 [1] | © tordans, Hartmut, Christian, Wolfram | Map data © OpenStreetMap contributors

Mapping

  • Requests for comments have been made on these proposals:
    • amenity=travellers_lounge for mapping public seating areas in transport facilities, such as airport lounges or railway station waiting areas.
    • virtual_tour=* to tag virtual 3D tours of places such as museums, hotels, or shops.
    • rental:powerbank=yes, to map stations where users can rent portable power banks to charge mobile devices on the go.
    • addr:milestone=* to allow the tagging of street addresses that use the distance from a reference point as part of the address.
    • languages:official=* and languages:preferred=*, to enable the specification of languages for name rendering, for example the targeted display of street names in different languages or scripts in map applications.
  • The proposal to delete busway=* for bus lanes was accepted with 21 votes in favour, 0 against, and 0 abstentions.

Mapping campaigns

  • Julien Minet reported on the status of address completion in Wallonia, Belgium, on OpenStreetMap, noting a current coverage of 65.9% compared to the ICAR database, although progress has been slowed by recent adjustments to official address sources. A map shows completion rates per commune, highlighting areas with significant updates, and challenges due to pre-assigned addresses in undeveloped areas.
  • If you ever visit Mexico and want to try the delicious esquites/elotes which are sold mostly on street stalls, there’s a collaborative map, so you will know where to find them.
  • The 50,000th challenge on MapRoulette, created just this week, marked a significant milestone in the platform’s journey to enabling collaborative map improvement around the world.

Community

  • OpenStreetMap Belgium, an independent NGO since 2023, continued to work with key supporters such as TomTom and the Belgian National Crisis Centre, organising events including the first European State of the Map in a decade, and providing updated resources such as free Belgian map tiles twice a year.
  • Pieter Vander Vennet explained how OpenStreetMap users can verify their accounts on Mastodon by linking profiles between the two platforms (we reported earlier).

Events

  • [1] Lars Lingner summarised the OpenStreetMap community’s Hackweekend in Berlin, Germany, where over 20 participants worked on creative projects and technical challenges while engaging in an open exchange on social and cartographic issues.
  • OpenStreetMap Belgium is hosting a mapathon in Bruges on Friday 29 November to support the Lili app, which helps visually impaired users navigate safely by mapping essential infrastructure such as tactile paving and audio-enabled traffic signals in Bruges.
  • On Monday 18 November, during the annual ‘Geography Awareness Week’, HeiGIT, in partnership with Doctors Without Borders, the German Red Cross, and the University of Würzburg, is hosting a mapathon in Heidelberg to produce essential map data for humanitarian aid. The event welcomes participants of all experience levels.
  • Mapping USA 2025, a virtual OpenStreetMap conference, is taking place on 24 and 25 January and will feature two days of talks, workshops, and community-driven discussions to engage mappers and advocates from across the US.
  • Geomob’s recent London, England, event to celebrate OpenStreetMap’s 20th birthday, on 18 September, is now available as a video, with slides and audio, thanks to volunteer Andrew Braye.
  • Calling all creative minds: #SotM2025 needs a logo. Submission is via email. Deadline: Saturday 30 November at 23:59 UTC. More details can be found on the Wiki page.

OSM research

  • A new study presented a dataset of classified building footprints for the US derived from OpenStreetMap data, distinguishing between residential and non-residential buildings. The classification, performed using an unsupervised method based on OSM tags and ancillary geospatial data, has been validated with high accuracy across different regions of the US, indicating its usefulness for urban planning, emergency response, and population studies.

Maps

  • 2hu4u has detailed a straightforward way of creating beautiful time-lapse videos of your mapping progress from historical OSM data, all from the comfort of QGIS and without the arduous process of downloading planet files, generating tiles or running a server. In the finished video you can watch the amazing transition, from a nearly blank canvas to a comprehensively mapped Australian city, over 16 years.
  • The open-source DeFlock project shows the global locations of automated licence plate readers, with over 5,600 identified worldwide, to raise awareness and help people avoid surveillance; it uses OpenStreetMap to document camera directions and create warning signs.
  • Andy Townsend described the development of a ‘rural pedestrian’ vector map for England and Wales, focusing on offline usability. This new vector schema, created with Tilemaker and MapLibre, simplifies the original raster schema by reducing data layers and enhancing feature styling, such as UK/IE road shields and handling of previously missed features such as derelict canal bridges. The vector format reduces map size significantly, making it suitable for offline use.
  • Vector tiles are now available on OSMF hardware (we reported earlier). The usage policy is not yet final, but you can use them now, according to Paul Norman. The map style is accessible via the new domain vector.openstreetmap.org and the tiles are available through an MVT address (https://vector.openstreetmap.org/shortbread_v1/{z}/{x}/{y}.mvt). There is a demonstration of the tiles rendered as a map.

OSM in action

  • pl6025 has created a map of commercial POIs in Loire (France) on uMap, which can be reproduced with Overpass queries.
  • MetroDreamin’ is an interactive platform where users can design and share custom public transit maps, creating their ideal transit systems while connecting with a community of fellow map enthusiasts.

Open Data

  • Daylight Map Distribution v1.58 is the final release of this open geodata project (we reported earlier), concluding its efforts to provide curated and enhanced OSM data. More details about its closure are available in the official announcement.
  • Frederik Ramm, of Geofabrik, discussed access to historical OpenStreetMap data, explaining the tools and datasets available to researchers, the limitations due to OSM’s growth and historical changes, and offering assistance in extracting specific data.

Software

  • Daniel Schep introduced ‘Ultra v3’ (formerly Overpass Ultra), an enhanced mapping tool that now functions as a MapLibre GL JS IDE. New features include query providers for multiple GIS file formats, auto-sorting of map style directives, bundled icon sprite sheets, fallback glyphs, and an HTML control for adding map titles and custom controls, extending Ultra’s utility for geospatial queries and map customisation.
  • The latest update of Jake Coppinger’s Australian Cycleway Stats project improved the efficiency of data processing by adding features such as exclusive cycle lane detection for safer routes on low-speed roads, parallel processing for Australian and international data, and hard-coded Overpass API endpoints for different regions, improving the reliability and speed of data collection for cycleway infrastructure across Australia.
  • The Every Door project, funded by NLNet’s NGI0 Commons Fund, focuses on building the best OpenStreetMap mobile editor for point of interest and address capture, with planned features such as vector tiles and customisation to improve mapping and interoperability.
  • PinPoi is an app for managing and navigating to Points of Interest (POI) by importing files in various formats (e.g. GPX, GeoJSON, CSV) directly to mobile devices. It supports location-based POI searches and displays results in a list or on a map, integrating with users’ preferred navigation applications.
  • TripGeo is offering Map Snake, an interactive map-based game where players navigate a snake on a map to explore different locations, combining geography with classic game elements.
  • The VeloPlanner project is an interactive map focusing on European cycle routes and points of interest such as campsites, shelters, and historic sites, using OpenStreetMap data. It’s currently a map viewer, but planned updates include a route planner and detailed surface and infrastructure data. The platform uses a robust tech stack including MapLibre, Planetiler, osm2pgsql, and Elixir, hosted on Heroku and processed by a dedicated server.

Programming

  • zabop shared a streamlined workflow for editing OpenStreetMap tags, combining the Overpass API, Python, and MapProxy to efficiently identify, visualise, and edit features, emphasising simplicity and fun in the process.
  • ‘Overture to OSM’ is a Python package designed to translate map data from the Overture schema into OpenStreetMap compatible tags, supporting layers such as places, buildings, and addresses, while ensuring OSM compliance.
  • Gregory Peony shared a markdown-based validation feedback template for OpenStreetMap task managers, designed to streamline responses by including standard feedback, reasons for validation results, tips for revisiting tasks, and links to resources. This template aims to support efficient communication, guide contributors in accessing relevant data, and encourage constructive feedback through organised, reusable comments.
  • HOTOSM’s tech updates for November 2024 highlighted ongoing projects, including the alpha launch of Drone Tasking Manager with OpenDroneMap integration, testing of FastAPI for the Tasking Manager, and development of fAIr 2.0 with YOLOv8 model for building detection, as well as improvements to uMap authentication.

OSM in the media

  • For years, two families in Tannhausen suffered from unwanted through traffic from hikers, cyclists and motorists who used their garden as a shortcut. The reason was a map error on OpenStreetMap incorrectly identifying the private path as a public path. After the family discovered this, Florian Fränzl familiarised himself with the OpenStreetMap system and corrected the access to ‘private’. In addition, they have put up official ‘no passing through’ signs, which now ensure peace and quiet and protection of their privacy.

Other “geo” things

  • Berlin (Germany) has introduced a digital overview of all public car parks, which provides real-time information on availability and parking conditions and is intended to make it easier for citizens to find a parking space.
  • Lund University’s (Sweden) ‘mGPS’ system can identify locations with high precision using unique bacterial samples and offers new applications in medicine, epidemiology, and forensics.
  • In Ukraine, military GNSS spoofing to defend against Russian drones is causing problems for the civilian population, as smartphones switch to the wrong time zones and navigation services display inaccurate location data, leading to confusion and delays.
  • Grab has developed its own hyper-local mapping system across Southeast Asia, using input from drivers equipped with special cameras to overcome the challenges of narrow, unmapped roads and improve navigation accuracy, differentiating itself from Google Maps with its regional focus and real-time updates.
  • Google’s Open Buildings Dataset, now enhanced with AI-powered temporal updates, provides detailed building footprint data across Africa and Southeast Asia, supporting applications in urban planning, disaster response, and environmental research.
  • Last month, the US states of Oklahoma and Texas exchanged 0.54 hectares of territory within a reservoir so that a pipeline could resume sending drinking water to a water works without illegally transporting zebra mussels, an invasive species, across the state border.
  • South Korea has accused North Korea of using GNSS jamming signals to interfere with South Korean ships and aircraft, which represents a considerable security risk. The jamming is part of military actions that affect satellite navigation and has already disrupted civilian infrastructure and transport operations.
  • Transit App has introduced an offline feature to track the location of underground trains using motion detection and vibration patterns, allowing users to predict their location, update ETAs, and receive stop reminders without GPS or internet, while maintaining complete privacy.
  • Phoebe Yu explained, in an amusing video, how the problem of Indian addresses in Google Maps was solved. Spoiler: they include places of interest and shops in the route guidance to provide orientation points.

Upcoming Events

Where | What | When
Град Зрењанин | Okupljanje u Zrenjaninu | 2024-11-17
Hannover | OSM-Stammtisch Hannover | 2024-11-18
Grenoble | Atelier du groupe local de Grenoble | 2024-11-18
- | Internationale GeoWoche – Online Mapathon von DRK, HeiGIT, MSF Deutschland & Österreich | 2024-11-18
England | OSM UK Online Chat | 2024-11-18
- | Workshop: OSM tagging standards for informal settlements | 2024-11-19
- | Missing Maps London: (Online) Mid-Month Mapathon [eng] | 2024-11-19
Lyon | Réunion du groupe local de Lyon | 2024-11-19
Bonn | 182. OSM-Stammtisch Bonn | 2024-11-19
City of Edinburgh | OSM Edinburgh Social Meet-up | 2024-11-19
- | [Online] Map-py Wednesday | 2024-11-20
Karlsruhe | Stammtisch Karlsruhe | 2024-11-20
València | XI Jornadas Anuales de Wikimedia España | 2024-11-22 – 2024-11-24
Gent | Bewakingscamera’s op de kaart (wandeling) | 2024-11-22
Bangalore East | OSM Bengaluru Mapping Party | 2024-11-23
Lyon | Campus du Libre 2024 – Lyon – France | 2024-11-23
Gent | Bewakingscamera’s op de kaart (wandeling) | 2024-11-23
명동 | 국경없는의사회 2024 글로벌 지오위크 매파톤 | 2024-11-23
Saint-Étienne | Rencontre Saint-Étienne et sud Loire | 2024-11-25
San Jose | South Bay Map Night | 2024-11-27
Berlin | OSM-Verkehrswende #64 | 2024-11-26
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen (online) | 2024-11-27
Lübeck | 148. OSM-Stammtisch Lübeck und Umgebung | 2024-11-28
Sint-Michiels | LiLi-app mapathon | 2024-11-29
Olomouc | SotM CZ+SK 2024 | 2024-11-29
ঢাকা | State of the Map Asia | 2024-11-29 – 2024-11-30

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Elizabete, LuxuryCoop, PierZen, Raquel Dezidério Souto, Strubbl, TheSwavu, YoViajo, barefootstache, derFred, mcliquid, tordans.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Eventually consistent plain text accounting

Wednesday, 13 November 2024 02:16 UTC
Spending for October, generated by piping hledger → R

Over the past six months, I’ve tracked my money with hledger—a plain text double-entry accounting system written in Haskell. It’s been surprisingly painless.

My previous attempts to pick up real accounting tools floundered. Hosted tools are privacy nightmares, and my stint with GnuCash didn’t last.

But after stumbling on Dmitry Astapov’s “Full-fledged hledger” wiki [1], it clicked: eventually consistent accounting. Instead of modeling your money all at once, take it one hacking session at a time.

It should be easy to work towards eventual consistency. […] I should be able to [add financial records] bit by little bit, leaving things half-done, and picking them up later with little (mental) effort.

– Dmitry Astapov, Full-Fledged Hledger

Principles of my system

I’ve cobbled together a system based on these principles:

  • Avoid manual entry – Avoid typing in each transaction. Instead, rely on CSVs from the bank.
  • CSVs as truth – CSVs are the only things that matter. Everything else can be blown away and rebuilt anytime.
  • Embrace version control – Keep everything under version control in Git for easy comparison and safe experimentation.

Learn hledger in five minutes

hledger concepts are heady, but its use is simple. I divide the core concepts into two categories:

  • Stuff hledger cares about:
    • Transactions – how hledger moves money between accounts.
    • Journal files – files full of transactions
  • Stuff I care about:
    • Rules files – how I set up accounts, import CSVs, and move money between accounts.
    • Reports – help me see where my money is going and if I messed up my rules.

Transactions move money between accounts:

2024-01-01 Payday
    income:work      $-100.00
    assets:checking   $100.00

This transaction shows that on Jan 1, 2024, money moved from income:work into assets:checking—Payday.

The sum of each transaction should be $0. Money comes from somewhere, and the same amount goes somewhere else—double-entry accounting. This is powerful technology—it makes mistakes impossible to ignore.
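
To make the balancing rule concrete, here is a minimal Python sketch of the check. It is only an illustration, not how hledger is implemented: the list-of-pairs representation and the `is_balanced` helper are my own assumptions.

```python
from decimal import Decimal

# Hypothetical in-memory form of the Payday transaction above:
# (account, amount) pairs, with amounts kept as exact decimals.
payday = [
    ("income:work", Decimal("-100.00")),
    ("assets:checking", Decimal("100.00")),
]


def is_balanced(postings):
    """Return True when the posting amounts sum to exactly zero."""
    return sum(amount for _, amount in postings) == 0


print(is_balanced(payday))  # True
```

Drop either posting and the check fails, which is the "impossible to ignore" part.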

Journal files are text files containing one or more transactions:

2024-01-01 Payday
    income:work              $-100.00
    assets:checking           $100.00
2024-01-02 QUANSHENG UVK5
    assets:checking          $-29.34
    expenses:fun:radio        $29.34

Rules files transform CSVs into journal files via regex matching.

Here’s a CSV from my bank:

Transaction Date,Description,Category,Type,Amount,Memo
09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00,
09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31,
09/03/2024,Amazon.com*XXXXXXXXY,Shopping,Sale,-35.56,
09/03/2024,OBSIDIAN.MD,Shopping,Sale,-10.00,
09/02/2024,Amazon web services,Personal,Sale,-17.89,

And here’s a checking.rules to transform that CSV into a journal file so I can use it with hledger:

# checking.rules
# --------------
# Map CSV fields → hledger fields[0]
fields date,description,category,type,amount,memo,_
# `account1`: the account for the whole CSV.[1]
account1    assets:checking
account2    expenses:unknown
skip 1

date-format %m/%d/%Y
currency $

if %type Payment
    account2 income:unknown
if %category Food & Drink
    account2 expenses:food:dining

# [0]: <https://hledger.org/hledger.html#field-names>
# [1]: <https://hledger.org/hledger.html#account-field>

With these two files (checking.rules and 2024-09_checking.csv), I can make the CSV into a journal:

$ > 2024-09_checking.journal \
    hledger print \
    --rules-file checking.rules \
    -f 2024-09_checking.csv
$ head 2024-09_checking.journal
2024-09-01 DEPOSIT Paycheck
    assets:checking        $1000.00
    income:unknown        $-1000.00

2024-09-02 Amazon web services
    assets:checking          $-17.89
    expenses:unknown          $17.89
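
The matching idea behind the rules file can be sketched in Python. This is a rough illustration only, not hledger's actual implementation: the `RULES` table and `categorize` helper are hypothetical, and the sketch assumes case-insensitive matching with later matching rules overriding earlier ones.

```python
import csv
import io
import re

# Three rows from the bank CSV above.
CSV_TEXT = """Transaction Date,Description,Category,Type,Amount,Memo
09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00,
09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31,
09/02/2024,Amazon web services,Personal,Sale,-17.89,
"""

# Hypothetical (field, pattern, account2) triples standing in for the `if` blocks.
RULES = [
    ("Type", r"Payment", "income:unknown"),
    ("Category", r"Food & Drink", "expenses:food:dining"),
]


def categorize(row, default="expenses:unknown"):
    """Return the account2 for a CSV row; the last matching rule wins."""
    account = default
    for field, pattern, target in RULES:
        if re.search(pattern, row[field], re.IGNORECASE):
            account = target
    return account


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in rows:
    print(row["Description"], "->", categorize(row))
```

Rows that match no rule fall through to the default account, which is what makes expenses:unknown a useful to-do list.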

Reports are interesting ways to view transactions between accounts.

There are registers, balance sheets, and income statements:

$ hledger incomestatement \
    --depth=2 \
    --file=2024-09_checking.journal

Revenues:
               $1000.00 income:unknown
-----------------------
               $1000.00


Expenses:
                 $42.31 expenses:food
                 $63.45 expenses:unknown
-----------------------
                $105.76
-----------------------
Net:            $894.24

At the beginning of September, I spent $105.76 and made $1000, leaving me with $894.24.
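
The arithmetic in that report is easy to double-check by hand. A small Python sketch, using the figures from the income statement above (the dictionaries are just my own way of holding them):

```python
from decimal import Decimal

# Figures copied from the income statement above.
revenues = {"income:unknown": Decimal("1000.00")}
expenses = {
    "expenses:food": Decimal("42.31"),
    "expenses:unknown": Decimal("63.45"),
}

total_expenses = sum(expenses.values())
net = sum(revenues.values()) - total_expenses
print(total_expenses, net)  # 105.76 894.24
```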

But a good chunk is going to the default expense account, expenses:unknown. I can use hledger aregister to see what those transactions are:

$ hledger areg expenses:unknown \
    --file=2024-09_checking.journal \
    -O csv | \
  csvcut -c description,change | \
  csvlook
| description              | change |
| ------------------------ | ------ |
| OBSIDIAN.MD              |  10.00 |
| Amazon web services      |  17.89 |
| Amazon.com*XXXXXXXXY     |  35.56 |

Then, I can add some more rules to my checking.rules:

if OBSIDIAN.MD
    account2 expenses:personal:subscriptions
if Amazon web services
    account2 expenses:personal:web:hosting
if Amazon.com
    account2 expenses:personal:shopping:amazon

Now, I can reprocess my data to get a better picture of my spending:

$ > 2024-09_checking.journal \
    hledger print \
    --rules-file checking.rules \
    -f 2024-09_checking.csv
$ hledger bal expenses \
    --depth=3 \
    --percent \
    -f 2024-09_checking.journal
              40.0 %  expenses:food:dining
              33.6 %  expenses:personal:shopping
               9.5 %  expenses:personal:subscriptions
              16.9 %  expenses:personal:web
--------------------
             100.0 %

For the Amazon.com purchase, I lumped it into the expenses:personal:shopping account. But I could dig deeper—download my order history from Amazon and categorize that spending.

This is the power of working bit-by-bit—the data guides you to the next, deeper rabbit hole.

Goals and non-goals

Why am I doing this? For years, I maintained a monthly spreadsheet of account balances. I had a balance sheet. But I still had questions.

Spending over six months, generated by piping hledger → gnuplot

Before diving into accounting software, these were my goals:

  • Granular understanding of my spending – The big one. This is where my monthly spreadsheet fell short. I knew I had money in the bank—I kept my monthly balance sheet. I budgeted up-front the % of my income I was saving. But I had no idea where my other money was going.
  • Data privacy – I’m unwilling to hand the keys to my accounts to YNAB or Mint.
  • Increased value over time – The more time I put in, the more value I want to get out—this is what you get from professional tools built for nerds. While I wished for low-effort setup, I wanted the tool to be able to grow to more uses over time.

Non-goals—these are the parts I never cared about:

  • Investment tracking – For now, I left this out of scope. Between monthly balances in my spreadsheet and online investing tools’ ability to drill down, I was fine. [2]
  • Taxes – Folks smarter than me help me understand my yearly taxes. [3]
  • Shared system – I may want to share reports from this system, but no one will have to work in it except me.
  • Cash – Cash transactions are unimportant to me. I withdraw money from the ATM sometimes. It evaporates.

hledger can track all these things. My setup is flexible enough to support them someday. But that’s unimportant to me right now.

Monthly maintenance

I spend about an hour a month checking in on my money, which frees me to spend time making fancy charts, an activity I perversely enjoy.

Income vs. Expense, generated by piping hledger → gnuplot

Here’s my setup:

$ tree ~/Documents/ledger
.
├── export
│   ├── 2024-balance-sheet.txt
│   └── 2024-income-statement.txt
├── import
│   ├── in
│   │   ├── amazon
│   │   │   └── order-history.csv
│   │   ├── credit
│   │   │   ├── 2024-01-01_2024-02-01.csv
│   │   │   ├── ...
│   │   │   └── 2024-10-01_2024-11-01.csv
│   │   └── debit
│   │       ├── 2024-01-01_2024-02-01.csv
│   │       ├── ...
│   │       └── 2024-10-01_2024-11-01.csv
│   └── journal
│       ├── amazon
│       │   └── order-history.journal
│       ├── credit
│       │   ├── 2024-01-01_2024-02-01.journal
│       │   ├── ...
│       │   └── 2024-10-01_2024-11-01.journal
│       └── debit
│           ├── 2024-01-01_2024-02-01.journal
│           ├── ...
│           └── 2024-10-01_2024-11-01.journal
├── rules
│   ├── amazon
│   │   └── journal.rules
│   ├── credit
│   │   └── journal.rules
│   ├── debit
│   │   └── journal.rules
│   └── common.rules
├── 2024.journal
├── Makefile
└── README

Process:

  1. Import – download a CSV for the month from each account and plop it into import/in/<account>/<dates>.csv
  2. Make – run make
  3. Squint – Look at git diff; if it looks good, git add . && git commit -m "💸"; otherwise, review hledger areg to see details.

The Makefile generates everything under import/journal and export:

  • journal files from my CSVs using their corresponding rules.
  • reports in the export folder

I include all the journal files in the 2024.journal with the line: include ./import/journal/*/*.journal

Here’s the Makefile:

SHELL := /bin/bash
# Every CSV under import/in/<account>/ maps to a journal under import/journal/<account>/
RAW_CSV = $(wildcard import/in/**/*.csv)
JOURNALS = $(foreach file,$(RAW_CSV),$(subst /in/,/journal/,$(patsubst %.csv,%.journal,$(file))))

# Recipe lines below must be indented with a tab, not spaces
.PHONY: all
all: $(JOURNALS)
	hledger is -f 2024.journal > export/2024-income-statement.txt
	hledger bs -f 2024.journal > export/2024-balance-sheet.txt

.PHONY: clean
clean:
	rm -rf import/journal/**/*.journal

# Build one journal from its CSV, using the rules file named after the CSV's directory
import/journal/%.journal: import/in/%.csv
	@echo "Processing csv $< to $@"
	@echo "---"
	@mkdir -p $(shell dirname $@)
	@hledger print --rules-file rules/$(shell basename $$(dirname $<))/journal.rules -f "$<" > "$@"
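
The path juggling in that Makefile is the least readable part, so here is the same mapping as a Python sketch. It is only meant to show which files pair up; `journal_for` and `rules_for` are hypothetical helper names, and the sketch assumes the import/in/<account>/<dates>.csv layout shown earlier.

```python
from pathlib import PurePosixPath


def journal_for(csv_path):
    """Mirror the Makefile's /in/ -> /journal/ and .csv -> .journal rewrite."""
    p = PurePosixPath(csv_path)
    # p.parts looks like ('import', 'in', '<account>', '<dates>.csv')
    return PurePosixPath("import", "journal", *p.parts[2:]).with_suffix(".journal")


def rules_for(csv_path):
    """Pick the rules file named after the CSV's parent directory."""
    account = PurePosixPath(csv_path).parent.name
    return PurePosixPath("rules", account, "journal.rules")


src = "import/in/credit/2024-01-01_2024-02-01.csv"
print(journal_for(src))  # import/journal/credit/2024-01-01_2024-02-01.journal
print(rules_for(src))    # rules/credit/journal.rules
```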

If I find anything amiss (e.g., if my balances are different than what the bank tells me), I look at hledger areg. I may tweak my rules or my CSVs and then I run make clean && make and try again.

Simple, plain text accounting made simple.

And if I ever want to dig deeper, hledger’s docs have more to teach. But for now, the balance of effort vs. reward is perfect.


  1. While reading a blog post from Jonathan Dowland.

  2. Note, this is covered by full-fledged hledger – Investments.

  3. Also covered in full-fledged hledger – Tax returns.

Wikipedia mentioned, in the text of an article, a fellow of the Royal Zoological Society of New South Wales. Unlike many other awards, the fellowship does not have its own article and there is no category for these fellows; it has a paragraph in the article about the fellows.

Wikidata did not know the award. 

The list of fellows on the RZS website uses a "last name, first name" format. There are too many fellows to convert by hand conveniently. As so many people are enamoured of ChatGPT, I gave it a spin. ChatGPT does NOT process websites for me, so I copy-pasted the list and asked it to swap the order of the surname and the first name.
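
The reordering step itself can be sketched in a few lines of Python. This is only an illustration of the transformation, not what ChatGPT did; `flip_name` is a hypothetical helper and the names are placeholders, not actual fellows from the RZS list.

```python
def flip_name(entry):
    """Turn "Surname, Given Names" into "Given Names Surname"."""
    surname, sep, given = entry.partition(",")
    if not sep:
        # No comma found: leave the entry as it is, minus stray whitespace.
        return entry.strip()
    return f"{given.strip()} {surname.strip()}"


# Placeholder names, not real entries from the list of fellows.
sample = ["Doe, Jane", "Smith, John A."]
print([flip_name(name) for name in sample])  # ['Jane Doe', 'John A. Smith']
```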

I asked it who had a Wikipedia article. It could not tell me, but it gave me a list of fellows who likely have a Wikipedia article. For many of them I added the award in Wikidata, and for some fellows I added a new Wikidata item. For many of them I linked publications, and this results in a nice Scholia for the award.

It would be really cool if there were a Wikimedia AI that could answer requests like: "for the people in this list change the order of the name and check if these Australian award winners have a Wikipedia article or a Wikidata item". Maybe start with a tool for editors and then open it up to the general public.

Given that Wikipedia is multilingual, what would be the effect if the data for the answers came from all Wikipedias AND Wikidata? Given that Wikifunctions is language agnostic, why not have functions that are a front end to such a Wikimedia AI?

Thanks,

       GerardM

Wiki Education welcomes Richard Gingras to Advisory Board

Tuesday, 12 November 2024 17:00 UTC

Wiki Education is pleased to announce the appointment of Richard Gingras, a long-time executive at Google focusing on news, to our Advisory Board. Gingras steps into his role with extensive experience in digital media, deep engagement in the evolution of internet policy relating to the open Internet and a free press, and a strong commitment to Wiki Education’s mission.

“There are few things more foundational than building a society’s communal knowledge,” said Gingras. “Wiki Education’s effort to evangelize and develop Wikipedia authorship are critical to achieving that objective.”

Richard Gingras. Image courtesy Richard Gingras, all rights reserved.

Throughout his 50-year career, Gingras has focused on the advance of news and information systems in an evolving digital society – from the evolution of search engines to enabling the next generation global news ecosystem. 

“Richard’s innovative spirit and deep knowledge about the web will bring invaluable insights to our work,” said Frank Schulenburg, Executive Director of Wiki Education. “His contributions will help us advance our goals and strengthen Wiki Education’s impact in the ever-evolving digital information landscape.”

For many years, Gingras served as the Global Vice President for News. In his current role, Gingras provides strategic guidance on how Google presents news to its users as well as advising Google’s efforts to enable a healthy, open ecosystem for quality journalism, including various programs to enable journalists and news providers to be effective and sustainable in our digital world. 

Gingras co-founded the Center for News, Technology, and Innovation, an independent global policy research center that seeks to encourage independent, sustainable media and foster informed public policy conversations to maintain a free press and an open internet. Gingras also served as a member of the Knight Commission on Trust, Media, and Democracy, and helped found the Trust Project.

His broad experience with digital ventures includes leading Salon.com, as well as positions at Apple, the @Home Network, and the Excite search engine. Gingras also serves on the boards of the First Amendment Coalition, the International Center for Journalists, the International Consortium of Investigative Journalists, the UC Berkeley School of Journalism and PRX, the public media podcast network.

 

Goodbye ASN

Tuesday, 12 November 2024 00:00 UTC

The shitpost ASN is (soon to be) no more

Tech News issue #46, 2024 (November 11, 2024)

Monday, 11 November 2024 00:00 UTC

Tech News: 2024-46

This Month in GLAM: October 2024

Sunday, 10 November 2024 15:21 UTC

weeklyOSM 746

Sunday, 10 November 2024 10:38 UTC

31/10/2024-06/11/2024


Participants at SotM Nigeria 2024 [1] | © Calvin Menttor

Mapping

  • Requests for comments have been made on these proposals:
    • virtual_tour=* to link to 3D virtual tours of locations such as museums, hotels, or stores.
    • rental:powerbank=yes for mapping stations that allow users to rent portable power banks for charging mobile devices on the go.
  • Voting is underway until Friday 15 November on the proposal to deprecate busway=* for bus lanes.
  • The proposal to add the tag sac_scale=strolling, which indicates a wide and smooth path, was accepted with 28 votes in favour, five against, and two abstentions.

Community

  • [1] Calvin Menttor blogged, in his OSM user diary, about his participation in the State of the Map Nigeria 2024. Kingsley Amankwe also wrote about his experience at the event, and both were very pleased with the opportunity to participate.
  • Alexey Zakharenkov discussed an algorithm designed to detect missing stations on public transport routes in OpenStreetMap, overcoming problems such as accidental station deletions by comparing pairs of route variant ‘twins’ for discrepancies using the Wagner-Fischer algorithm, which detects mismatches similar to text-editing errors.
  • jjKlaus asked, on the OSM Community forum, how to handle the ongoing border disputes between Guyana and Suriname and between French Guiana and Suriname in South America.
  • SColchester shared insights from the 10th annual Missing Maps meeting in Nairobi, Kenya, where members discussed the project’s growth, challenges with their website, strategies for using OpenStreetMap data in humanitarian work, and plans to relaunch a data quality internship to promote community engagement and skill-building.
  • Some Amateur described the use of OpenStreetMap data during a firefighting operation, highlighting the critical role that detailed land-use mapping (such as ponds and wetlands) plays in real-world navigation, especially for emergency responders and outdoor enthusiasts who rely on offline, terrain-based guidance.
  • Christopher Beddow explored the evolution, cultural significance, and changing nature of place names (toponyms) throughout history, from colonial renaming and local vernacular names to digital innovations that may erase or preserve traditional toponyms. He highlighted examples such as Aotearoa in New Zealand, Romansh names in Switzerland, and the mixing of indigenous and colonial names in North America, noting how past naming conventions reflect local culture and geography.

OpenStreetMap Foundation

  • The first meeting of the new OSMF board will be on Thursday 14 November at 12:00 UTC in the video room. The meeting agenda is on the OSMF wiki. The topics to be covered are:
    • Voting for Chairperson
    • Voting for Treasurer
    • Voting for Secretary
    • Voting for non-officer roles
    • Administrative
    • Any other business
    • Guest comments or questions.
  • The Local Chapters and Communities Working Group of the OpenStreetMap Foundation is looking for volunteers to help refine its membership model to better accommodate informal user groups and thematic communities by researching membership practices, coordinating discussions, and drafting recommendations.

Events

  • The Geospatial devroom is returning to FOSDEM 2025 and is seeking proposals for talks on topics such as FOSSGIS software, geospatial visualisation, geospatial AI, IoT, and OpenStreetMap. The devroom, scheduled for Saturday 1 February in Brussels, is looking for 25-minute presentations and 10-minute lightning talks. Submissions are due by Sunday 1 December and the selected speakers will be notified by Sunday 15 December.
  • The results of the scientific seminar in support of Rio Grande do Sul (Brazil), organised by IVIDES.org, are now available for free download (in Portuguese). The presentation and video files can be found on Wikimedia Commons. There is also a mapping project on the HOT Tasking Manager, where you can collaborate.
  • The recordings and photography from the State of the Map Nigeria 2024 have been published. You can watch the videos and see the amazing photos of this event, which was held in the City of Ile-Ife.
  • The community poll for FOSSGIS 2025 is now open and can be taken until Sunday 17 November. The results will be included in the programme committee’s evaluation as a community opinion poll.
  • Campus du libre will be held at the Université Jean Moulin Lyon 3, campus de la Manufacture des Tabacs (Campus of the Tobacco Factory), on Saturday 23 November.
  • The OSM Kerala Community Meetup 2024 is now the State of the Map Kerala 2024, the first regional OpenStreetMap conference to be held in India. The event will be on 16–17 November in Wayanad, Kerala and will focus on open mapping for disaster resilience, GIS, and humanitarian needs. Sponsored by partners including Meta and FOSS United, the event invites mapping enthusiasts to participate in talks, workshops, and networking activities. The sessions will cover disaster preparedness, community resilience, and open data initiatives, among other topics.
  • State of the Map’s official Mastodon account tooted about several upcoming local, regional, and national State of the Map conferences, as well as other map-related events around the world. They suggested that organisers planning a local State of the Map event apply for a SotM Quick Licence from the OpenStreetMap Foundation.

OSM research

  • Students at Malmö University (Sweden) have investigated the effectiveness of using OpenStreetMap data versus traditional cadastral data for predicting public violent crime in three Swedish cities of different sizes and found that while cadastral data generally produces more accurate predictions in smaller cities, OSM data produces comparable results in larger cities, suggesting the potential utility of OSM for spatial crime analysis.

Humanitarian OSM

  • The Spanish OpenStreetMap community has organised a mapping response to the tragic flooding that occurred on 29 October in the Valencia Province.
  • The Indonesian OpenStreetMap community has organised a mapping response to the Mount Lewotobi Laki-Laki eruption in East Flores, East Nusa Tenggara, Indonesia, which occurred early on Monday 4 November at 02:48 local time, resulting in at least nine fatalities. Many victims were killed by large rocks from the eruption that broke through house roofs. The eruption also triggered intense fires in homes and a school building, causing extensive damage.

Maps

  • Christoph Hormann discussed updates to his AC Style map symbols, including redesigned waste and recycling symbols, new symbols for towers, masts, cranes, benches, and oil wells, with an emphasis on readability, functionality, and cultural relevance for global map users.

OSM in action

  • The Louvre is using an OpenStreetMap-based map on its website to visually display the locations of its loaned artworks around France and internationally, improving public access to its collection through geographic mapping and cultural outreach.
  • Altilunium has developed Cmaxz, an OpenStreetMap-based weather radar map viewer.
  • PickYourPlace announced the launch of its real estate platform designed to provide comprehensive neighbourhood insights, such as safety, property values and accessibility data, to help users make informed housing decisions, initially focusing on Calgary, Canada, with plans for expansion.
  • OldMapsOnline is offering a vast, interactive archive of over 500,000 historical maps, allowing users to overlay these maps with modern OpenStreetMap data for comparison. The platform features a timeline slider to explore specific time periods and is valuable for historians, genealogists, and urban planners interested in studying geographic change and urban development over time.
  • Sylvain Machefert tooted that the Bordeaux-Nantes Intercités trains use dynamic displays with an OpenStreetMap-based map and Wikimedia Commons photos, although he noted the lack of Creative Commons licence attribution for the images.

Open Data

  • OpenSuperMaps has extended its data coverage to New Zealand, allowing users to download enhanced map files with detailed address and geographic data for use in mapping applications such as OsmAnd.

Licenses

  • KevinOs reflected on his effort to align local open data licences with Canada’s Open Government License, by comparing regional variations, noting minor differences in wording and attribution requirements tailored to specific legal frameworks. He highlighted inconsistencies in documentation across Alberta cities and advocated for clearer attribution guidelines and wiki updates by OpenStreetMap’s legal working group to ensure compliance and transparency.

Software

  • The Prototype Fund, a key resource for supporting OpenStreetMap and other open source projects, confirmed its continuation until at least 2029, with one funding round per year. New updates include increased team funding of up to GBP 78,800 (USD 102,000), eligibility for EU-based team members, and a shift in funding priorities towards data security and software component projects and away from civic tech.
  • Jaller94’s tool provides downloadable GPX files that identify missing Stolpersteine (Holocaust memorial stones, which we have covered previously) in the neighbourhoods of Berlin, Germany. Users can load these files into OpenStreetMap applications such as OsmAnd to help complete local data on memorial locations.
  • Routing24 is a free client-side route optimisation tool designed for small businesses and solo drivers. It enables efficient multi-stop planning with support for multiple vehicles and business rules such as delivery windows, and is built on C++/WASM and OpenStreetMap data. It does not currently support mobile devices.
  • OpenAI has announced that ChatGPT now includes Mapbox for map views, enhancing ChatGPT’s ability to provide geographic context and navigation details directly in conversations. The Mapbox integration allows users to access detailed, interactive maps and supports more intuitive location-based responses for ChatGPT’s new search feature.

Programming

  • For the Smart Mobility Lab Hackathon in Kigali, the Trufi Association has challenged students to create OSM and AI-driven solutions to improve transportation. Their creativity will shape urban mobility in Rwanda.
  • Siarhei Ivanou has published the source code for his service City Indexes, which calculates housing attractiveness indexes for cities in Belarus and Russia, as well as some European cities.

Releases

  • Peter announced the release of GraphHopper Routing Engine version 10.0. This release fixed a long-standing issue with u-turn routing, added several improvements to turn restriction implementations, and included an enhanced version of GraphHopper Maps with text search functionality.
  • OpenStreetBrowser’s maxspeed category now shows speed limit signs from zoom level 16, with icons that adjust directionally to reflect the actual orientation of the traffic sign.
  • The Vespucci 20.2.0.1 beta release has introduced enhancements including support for HEIC/HEIF images on Android 10+, multi-select in tag-only editing mode, conflict resolution for partial uploads, and auto-selection of Overpass query results, along with various UI, rendering, and code improvements.
  • Martin Raifer announced several updates to overpass turbo, including an auto-styler wizard and additional MapCSS features.

Did you know …

  • … that osm2streets can convert OpenStreetMap data into detailed, lane-specific street networks, making it easier for developers to render and analyse complex road structures?
  • … that Navigator is a popular free multi-platform satnav app powered by Mapfactor, based on maps from the OpenStreetMap project, which can be used in your own applications?

OSM in the media

  • Anne-Karoline Distel corrected a speed limit discrepancy on Kilkenny’s Bennettsbridge Road (Ireland) using OpenStreetMap, Mapillary, and field data collection. She updated the map to reflect accurate speed signs and thereby promoted OpenStreetMap as a reliable tool for local navigation updates.

Other “geo” things

  • The Garmin Tread 2 navigator integrates OpenStreetMap data for turn-by-turn navigation on unpaved roads, plus US Forest Service maps and trails for off-road adventures. In combination with high-resolution satellite imagery, public land boundaries, and extensive topographic information for North America, it provides versatile guidance for rugged outdoor exploration, including snowmobiling.
  • Google is rolling out generative AI-powered updates to Google Maps, Google Earth, and Waze using its Gemini model, enhancing features such as conversational search, contextual navigation, hazard reporting, and AI-based recommendations. The new features will be initially available to selected testers.
  • Esri’s 3D base maps, augmented with data from Overture Maps and OpenStreetMap, are providing detailed, globally accessible geographic representations that support urban planning, environmental management, and infrastructure development. These 3D maps incorporate highly detailed building data and are continually updated with open-source and community contributions, enabling Esri to provide immersive visualisations to users worldwide, including targeted support for underrepresented regions such as Africa.

Upcoming Events

Where | What | When
Hobart | FOSS4G SotM Oceania 2024 | 2024-11-05 – 2024-11-08
Gent | OpenStreetMap + pizza + velopark – meetup | 2024-11-08
Salzburg | Maker Faire Salzburg 2024 | 2024-11-09
København | OSMmapperCPH | 2024-11-10
Zürich | 169. OSM-Stammtisch Zürich | 2024-11-11
中正區 | OpenStreetMap x Wikidata Taipei #70 | 2024-11-11
Budapest | OSM Találkozó 2024-11-12 | 2024-11-12
Hamburg | Hamburger Mappertreffen | 2024-11-12
San Jose | South Bay Map Night | 2024-11-13
Vitoria-Gasteiz | XV edición de las Jornadas Ibéricas de Infraestructuras de Datos Espaciales (JIIDE) | 2024-11-13 – 2024-11-15
Salt Lake City | OSM Utah Monthly Map Night | 2024-11-14
München | Münchner OSM-Treffen | 2024-11-13
Lorain County | OpenStreetMap Midwest Meetup | 2024-11-14
(online) | OSM US Pre-Geoweek Mappy Hour | 2024-11-14
Berlin | 197. Berlin-Brandenburg OpenStreetMap Stammtisch | 2024-11-14
Bochum | Bochumer OSM Treffen | 2024-11-14
Saint-Claude | Conférence Apéro OpenStreetMap | 2024-11-15
Vythiri | State of the Map Kerala 2024 | 2024-11-15 – 2024-11-16
Saint-Claude | Cartographie collaborative pour tous | 2024-11-16
Град Зрењанин | Okupljanje u Zrenjaninu | 2024-11-17
Hannover | OSM-Stammtisch Hannover | 2024-11-18
(online) | Internationale GeoWoche – Online Mapathon von DRK, HeiGIT, MSF Deutschland & Österreich | 2024-11-18
(online) | Missing Maps London: (Online) Mid-Month Mapathon [eng] | 2024-11-19
Lyon | Réunion du groupe local de Lyon | 2024-11-19
Bonn | 182. OSM-Stammtisch Bonn | 2024-11-19
(online) | [Online] Map-py Wednesday | 2024-11-20
Karlsruhe | Stammtisch Karlsruhe | 2024-11-20
València | XI Jornadas Anuales de Wikimedia España | 2024-11-22 – 2024-11-24
Gent | Bewakingscamera’s op de kaart (wandeling) | 2024-11-22
Lyon | Campus du Libre 2024 – Lyon – France | 2024-11-23
Gent | Bewakingscamera’s op de kaart (wandeling) | 2024-11-23

Note:
If you would like to see your event here, please add it to the OSM calendar. Only data that is there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, Raquel Dezidério Souto, Strubbl, Ted Johnson, TheSwavu, TrickyFoxy, barefootstache, derFred, mcliquid.
We welcome link suggestions for the next issue via this form and look forward to your contributions.