#dataset

We imported the data from the Black Basta ransomware group leak into AIL, and there are many interesting aspects.

  • The federation network of Matrix servers (see the screenshot) used to communicate among the affiliates/group(s).

  • Activities in the chat room, especially the daily activity view in AIL. Guessing the location and timezone of groups or affiliates is an endless source of information.

  • They rely on many open-source and SaaS tools, including Google Docs or Zoom.

  • Many interesting correlations with cryptocurrencies, IP addresses, CVE numbers, and chat username relationships (who talks to whom and when).
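The daily activity view and the "who talks to whom" relationships mentioned above can be illustrated with a minimal Python sketch. This is not how AIL implements it; the record layout (timestamp, sender, recipient) and the sample values are assumptions made up for illustration and are not taken from the leak.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample records: (ISO 8601 timestamp, sender, recipient).
# Made-up values for illustration only, not entries from the leak.
MESSAGES = [
    ("2024-01-10T06:15:00+00:00", "user_a", "user_b"),
    ("2024-01-10T06:40:00+00:00", "user_b", "user_a"),
    ("2024-01-10T07:05:00+00:00", "user_a", "user_c"),
    ("2024-01-11T06:55:00+00:00", "user_a", "user_b"),
    ("2024-01-11T14:20:00+00:00", "user_c", "user_a"),
]


def hourly_activity(messages):
    """Count messages per UTC hour of day (0-23)."""
    hours = Counter()
    for ts, _sender, _recipient in messages:
        hour = datetime.fromisoformat(ts).astimezone(timezone.utc).hour
        hours[hour] += 1
    return hours


def talk_pairs(messages):
    """Count how often each (sender, recipient) pair appears."""
    pairs = Counter()
    for _ts, sender, recipient in messages:
        pairs[(sender, recipient)] += 1
    return pairs


activity = hourly_activity(MESSAGES)
pairs = talk_pairs(MESSAGES)

# Print a crude text histogram of activity by hour.
for hour in sorted(activity):
    print(f"{hour:02d}:00 UTC  {'#' * activity[hour]}")
```

A cluster of activity in a narrow band of UTC hours is only a weak hint about working hours and timezone; it should be read alongside other evidence.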

If you are using the AIL project and want to import the leak dataset, @terrtia wrote an importer: github.com/ail-project/ail-fee

#BlackBasta #blackbastleaks #threatintel #osint #threatintelligence #opensource #dataset

@ail_project

Maybe some interesting input for @fr0gger for his existing analysis.

I see that this dataset can be used to enhance some of our open-source tools.

github.com/ail-project/ail-fra

OK ended up using OSF as I have quite a few things on tonight's checklist to get on with and tbh I don't think I really understand enough to understand any potential advice!

I have set up a data repository here. Would be lovely if someone, anyone out there could check that both the Excel file and the CSV can be viewed/downloaded!

osf.io/bjfhs/

OSF: Exploring the Black Atlantic. This spreadsheet contains data points used in the Future Legacies 'Black Atlantic' storymap. The storymap aims to educate readers about the concept of the Black Atlantic through items from different GLAM collections and Digital Humanities projects. Hosted on the Open Science Framework.

This dude (shijith.com/) has amazing #dataviz and #dataset projects using public data from India. github.com/shijithpk/wikipedia is a script that collects the most abused Wikipedia pages. Here's one that looked at the panchayats in Kerala that have heated up the most in the last five years: github.com/shijithpk/hottest-p (he shares code and does good writeups on the blog). Good stuff! #DataVizIndia

Elevation-Derived Hydrography [EDH] - The USGS’s Rich New Hydrological Features Dataset
--
doi.org/10.2489/jswc.2024.0314 <-- shared paper
--
pubs.usgs.gov/publication/tm11 <-- USGS EDH Representation, Extraction, Attribution, and Delineation Rules reference publication
--
usgs.gov/3d-hydrography-progra <-- shared link to the USGS 3DHP page
--
[in my role, I have the pleasure of working with the valuable EDH process(es) and the data it produces on a daily basis]
#GIS #spatial #mapping #water #hydrology #hydrography #3dep #edh #3dhp #elevationderivedhydrography #opendata #elevation #dem #dtm #interpretation #waterfeatures #usecase #waterresources #floodmodeling #alignment #model #modeling #dataset #naturalresources #costs #benefits #economics #businessuse #publicdata #spatialanalysis #USA #USGS
@USGS

The "Korean Journal of Digital Humanities" (under the aegis of #KADH) has just published a new issue! accesson.kr/kjdh/v.1/2/2024

Wide range of topics, including a short story #dataset, an analysis of #Fitzgerald, #newspaper, #networks, #morphology...

... and, intriguingly and curiously, an interview with myself.

Have you heard about Common Corpus?

It's the largest open and permissively licensed text #dataset, comprising over 2 trillion (!) tokens.

CC is a diverse dataset that features books, newspaper and scientific articles, government and legal docs, code, and other assets.

👇
huggingface.co/datasets/PleIAs

The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:

➡️ There's more data collected for #Catalan (ca) than for #English (en) - testament to the independence and language reclamation efforts in Catalunya. Language and cultural transmission are deeply intertwined.

➡️ Some of the newer #languages to Common Voice, like #Ligurian / #Genoese (lij), have contributions from mostly older speakers, which is unusual in comparison to the rest of the dataset. This may reflect the population that currently speaks those languages - as many regional languages in Italy are in rapid decline.

➡️ Some languages, such as Eastern Mari / Meadow Mari (mhr) - a #Uralic language spoken in the Mari-El Republic within Russia - have samples from predominantly female-identifying speakers, again in contrast to the rest of the dataset. Other languages here include #Cantonese (yue), #Georgian (ka), and #Kalenjin (kln).

➡️ A key part of the preparation of the Common Voice dataset is the validation of utterances to ensure they match their written transcription - which requires at least two validations by separate speakers. Some newer languages to Common Voice, such as Erzya (myv) and Moksha (mdf), both Uralic languages, have nearly 100% validation.
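To make the "at least two validations by separate speakers" rule concrete, here is a minimal Python sketch of how a validation rate could be computed. It is not Common Voice's actual pipeline: the clip IDs, validator IDs, and the function names `is_validated` and `validation_rate` are hypothetical, and the only rule modelled is the one stated above.

```python
def is_validated(validator_ids, required=2):
    """A clip counts as validated once `required` distinct people validated it.

    Assumption: repeat validations by the same person count only once.
    """
    return len(set(validator_ids)) >= required


def validation_rate(clips):
    """Share of clips that meet the validation rule (0.0 for no clips)."""
    if not clips:
        return 0.0
    validated = sum(is_validated(ids) for ids in clips.values())
    return validated / len(clips)


# Hypothetical example data: clip ID -> IDs of people who validated it.
CLIPS = {
    "clip_001": ["v1", "v2"],        # two separate validators
    "clip_002": ["v1"],              # only one validation so far
    "clip_003": ["v2", "v2"],        # same validator twice, counts once
    "clip_004": ["v1", "v3", "v4"],  # more than enough
}

rate = validation_rate(CLIPS)
print(f"validated: {rate:.0%}")
```

Under this rule, a language with few clips but a very engaged community of validators can reach a near-100% rate more easily than a large language with a long backlog.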

What are your interpretations of the dataset?

observablehq.com/@kathyreid/mo

How a stubborn #computerscientist accidentally launched the #deeplearning boom
"You’ve taken this idea way too far," a mentor told Prof. Fei-Fei Li, who was creating a new image #dataset that would be far larger than any that had come before: 14 million images, each labeled with one of nearly 22,000 categories. Then in 2012, a team from the Univ of Toronto trained a #neural network, dubbed #AlexNet, on #ImageNet, achieving unprecedented performance in image recognition.
arstechnica.com/ai/2024/11/how #AI

Ars Technica · How a stubborn computer scientist accidentally launched the deep learning boom · By Timothy B. Lee

Hey #DataScience people!

I am about to start my first “Introduction to Data Science” course at #University, and our professor asked us to team up and think about a project that we want to do.

However, since I don’t know anything about the topic yet, I would really appreciate any tips for entry-level data science projects that I could do with #OpenData #DataSets in #Python!

We will probably be using #pandas. Since you’re here, any additional learning resources or general suggestions are very welcome, too!

Thanks ❤️👾

(Not sure how useful it is, but this is the course link: ois2.tlu.ee/tluois/subject/ULP)
