Portugues#2150
Conversation
No processo de incluir ox parox e proparoxitonas
Helper file ox-paros-proparox that classifies the words based on the stress
Adding derive default to the language enum
It invokes different dictionaries depending on what language you pass to it
Refactored a bit of the main function to add global arguments for language and dialect. Not yet decided how to deal with dialects for other languages than English
Added both masculine and feminine genders. Also added a neutral gender so that expansions to other languages is made easier.
Now the replace can have a condition (for example, checks if it's a noun or a verb before replacing)
Now Dialect is a trait that can be used to implement other Dialects. The old Dialect struct is now called EnglishDialect, and has the same functionalities as before. The only caveat is that now one must include also the traits to be able to use the functions
Put the new properties in the wrong place. It was the }. Just a misplaced }.
Documentation had the wrong use of the Dialog trait
Now DictWordMetadata takes a DialectFlagsEnum instead of a EnglishDialect. There are some adaptations that needed to be made, for example creating the Dialect trait, and adding to this trait a function that gets most used dialects from document using a provided language (LanguageFamily, which is a variant of the enum that doesn't have the dialect contained inside it). This function should not be used for any of the specific DialectFlags implementations, only for the DialectFlagsEnum. On the counterpart, the variant of the function that doesn't receive any languages should only be used by the specific implementations, which should be progressively more private, being replaced in general by the enum. By the way, the original idea was using in the DictWordMetadata struct a Box<dyn DialogFlag>, but in the end the trait is not dyn compatible, hence this workaround
Adapting what needs to be adapted from Language to LanguageFamily
Basically EnglishDialect::American => Language::English(EnglishDialect::American) everywhere that uses LintGroup::new_curated
Thanks for the answer! Don't worry at all.
This is really great to hear, I hope the project gets a lot of traction when we release the structure for other languages. Whenever I have something a bit more solid I'll get in contact with the more active contributors to do some planning. |
Fiz funcionar o teste. Trabalhando no conjugador de verbos agora
Most test pass, failed on the Dialect stuff. Probably the BitOrAssign.
|
In case you are not aware, there is now also #3402 as an attempt at adding German support. |
|
Thanks for bringing my attention to it! I took a look at the code, and it's not terrible but it's terribly disorganized. There are some okay things there, but I'll once again ask if it is possible for you or @elijah-potter to check the code I wrote, so that we can merge it and other efforts for other languages are more structured. What I saw there, and I don't want to come off as mean, but it was completely unsustainable, and some structural choices assumed clearly that German would be the only other language to be integrated into Harper. Of course my code is also not perfect, but at least for most of the languages using the Latin alphabet it is enough and expandable (of course, that's as far as I can tell. I only speak Portuguese, English, interm. German and bit of Spanish and Italian). The other initiatives of expanding harper are typically very specific and local, and I imagine a whole lot of effort (or tokens) is being wasted this way. I am currently in the middle of writing my bachelor thesis, so I can't devote that much attention to this project (regardless of how much I like it and need it), but if my fork is greenlit structure wise I can spend one or two afternoons ironing out whatever bugs are left (there is one right now annoying the fuck out of me) and merging the last few commits so that it can be successfully merged into harper master. Thanks again, |
Unfortunately I just don't know too many parts of the codebase it touches to have confidence reviewing it. I might try to have a look anyway later today.
That's the kind of thing I foresee in any attempt to tackle this. My attempt to avoid it when I tried a Spanish one ages ago stalled because I tried to take everything into account and I got bogged down, especially in all the parts of the codebase where I didn't know how too many things worked.
Yes we started to get a little influx of AI code. Of much more wildly varying quality than I expected. Some worse than I thought was possible, and some better than what I've managed so far.
Understood! Thanks for the detailed reply! |
|
I'm aware of PR #2150. The Portuguese PR takes a different approach that makes breaking changes to core types like `Dialect` (changing it from an enum to a trait) and modifies `FstDictionary::curated()` to require a `LanguageFamily` parameter. This breaks compilation of the CLI and requires refactoring throughout the codebase. Our approach incorporates some of the Portuguese PR's functionality but maintains backward compatibility by adding new functionality without modifying existing types, using a modular `language/` directory structure. It compiles, works today with German, and demonstrates extensibility to Portuguese and other languages without breaking existing code. |
|
My friend, @Dronakurl, the code you wrote added German basically as another dialect. Even your answer to this thread is AI generated. You made no separate annotations file, no separation of language from what I can see (besides the language directory, but I'm talking code wise), and you didn't even mention the Language enum, which makes me think you didn't really look at my code. My changes are breaking changes, that you got right. Yours, however, are not appropriate for merging. It is too simplistic an approach for a problem that is not as simple. Thankfully, as stated somewhere else, very few other crates use harper directly, instead opting for harper-ls, which has all it's functionality preserved with both our codes (or at least I think so, I didn't look at this part in your code). Finally, "our approach"? Who's we? Only you contributed to the repository, are you referencing claude here? Or is it referencing you? I apologize for any aggression, but I don't really like when people even answer messages with AI. |
|
I apologize, I was carried away and was overly enthusiastic about AI (I read the message before posting it, but you are right about calling me out). You are right about the Dialect approach. It works (at least for German, I am using it right now within helix), but it is ugly. I will try to come back with an approach which follows the Language enum approach. |
|
@Dronakurl no, I apologize. I was a bit too aggressive. Nevertheless, I saw some of your code and some parts of it would be good additions to my own code, especially the weir rules for German. As @hippietrail suggested, I think we could join efforts, if you would like to. I tried making my code so that extensions to new languages could be done, I don't know how easily though. If you'd like to take a look and try your hand at it, I would love to get some pull requests to it. This way we could both move forwards instead of having competing implementations. If you have any doubts about how some things work in my part of the code, be sure to send me a message and I'll answer as quickly as possible. Looking forward to working with you! |
I wouldn't worry about breaking changes for something this big. We do breaking changes for major version bumps, as happened recently. I can appreciate both extending the meaning of
I actually also tend to use "us"/"our" when I've used an agent to an extent where I can't personally vouch for 100% of the code. So from that angle it's a bit more honest.
We might have to think about some way of indicating when comments are from AIs. I try to engage them and direct them so far but do so in a different way than I would with a human. |
|
@dantee-e : The first thing I did, was to port your architecture onto the current master branch (version 2.2.1). There were some conflicts, so I could not just rebase it easily. Thus, I smushed it all together in one commit on top of the master branch. Then I added the German language support on top of it, together with a modular setup, so that it is extensible to support more languages. I have not reviewed it yet, but just that you know that I am trying to make it work. https://github.com/Dronakurl/harper/tree/language-support-new |
|
@Dronakurl Nice! I just remembered it now, if possible, try working on top of the merge-main branch. I was stupid and forgot to merge it back into the portuguese branch, so its quite a bit ahead of the others |
Description
Adding support to the portuguese language to Harper. It is not even close to being operational.
For now, added only superficial alterations:
Demo
Nothing yet
How Has This Been Tested?
No tests yet
Checklist
If any maintainer sees this, I apologize for not keeping the commit history clean using the conventional commits practices. If there's a way to correct this besides squashing them all into one commit when merging, let me know. I make many commits just as a tiny checkpoint, so I don't usually have the patience to make it nice.