/fic/ - Fanfiction

The board for fanfiction review, brainstorming, critique, creation and discussion.




Project: Autoreview 3253

>Copied from Ponychan
Greetings all. I'm coming by to drop off a tool that hopefully you will be able to use.

http://auto-reviewer.appspot.com

Using it is straightforward. Copy the story text into the box and it will give you some statistics, check for a few common problems, and give wordcounts. If it flags any common problems, there are links to explanations given with each line.

This is an attempt to automate feedback for stories. It won't replace the feedback of a full-fledged review, but hopefully will help writers get an initial run of feedback on their stories, help them learn to spot and avoid common mistakes and reduce the work involved for reviewers.

The program isn't perfect. If your story comes out clean of errors, this doesn't mean that it is entirely free of any errors. Similarly, it won't flag every error. But it should get a large number of the most commonly occurring ones.

Right now, it checks capitalization at the beginning of sentences, checks for rarely capitalized words, and checks dialogue punctuation.
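For anyone curious what a check like the first one could look like, here is a minimal Python 3 sketch. It is not the tool's actual code: the pattern and the name `find_uncapitalised_starts` are made up for illustration, and a sentence start is crudely assumed to be either the start of a line or whatever follows `.`, `!` or `?` plus whitespace.

```python
import re

# Hypothetical pattern, not taken from the autoreview code base.
# A sentence start is approximated as the start of a line, or the text
# after ., ! or ? (optionally followed by a closing quote) and whitespace.
SENTENCE_START = re.compile(r'(?:^|[.!?]"?\s+)"?([a-z]\w*)', re.MULTILINE)


def find_uncapitalised_starts(text):
    """Return the words that begin a sentence with a lowercase letter."""
    return [match.group(1) for match in SENTENCE_START.finditer(text)]


sample = 'the cat sat. it purred. "Hello," she said. Then silence.'
print(find_uncapitalised_starts(sample))
```

A rule this simple will misfire on dialogue such as `"Hello!" she said`, which is one reason output from this kind of check is only a first pass.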

If you have suggestions for how I can improve the program (e.g. features to implement, ways to improve the interface, bugs you encounter), any feedback would be greatly appreciated. If you would prefer to communicate your suggestions by email, you can use either the email in my trip or the dedicated email address for the project: reviewsuggestions at gmail.

I'll continue developing the program and if I make significant changes, I'll try to keep the thread updated. Feel free to ask questions and I'll do my best to come by and answer them.

Thanks and enjoy.

3256

If someone wants to audit the code, I found it here: https://github.com/reviewsuggestions/autoreview

There's no licence included, though, so you should email the author first if you want to fork it.

3356

I don't think the information here needs to be particularly different from elsewhere, so I'm copying the message from the ponychan thread without the snarky introduction. So, if you've read that one, you don't have to read this.

As you may have noticed, the application has been updated. Per my discussion with Samurai Anon (thanks for your input and suggestions, btw), the background is now something approximating parchment. The review itself is now formatted differently, with each type of error spread out. Hopefully, this helps with the readability he mentioned could use improvement. The explanation section has been revamped, so it is no longer written in my best approximation of Legalese. The review output is now customisable, so you can choose which checks to include.

I left the old options accessible, but that will disappear unless someone has a compelling reason for the old things to stick around.

Any suggestions regarding the new update would be greatly appreciated. I freely admit that I have absolutely no idea about graphic design, and so I'm probably committing some egregious error with my visual output and any suggestions on how to fix it would be very helpful. Also, I didn't see any when I tested it a couple times, but if anyone is getting errors, let me know and I'll try to fix it as quickly as possible.

As for what most of you don't see, here's what's going on with the code:
The main engine for the code has been completely rewritten, this time with actual documentation, so hopefully this version is readable. The code is at the same link as before (https://github.com/reviewsuggestions/autoreview). Thanks to Roger for posting the link in the other thread, although I apologise to anyone who tried to look at the old thing. It's hideous!

I've also been working on getting functions and classes defined for sentence processing, which should allow the program to recognise a lot more things: short term, LUS, verb tense and agreement; later (possibly) missing/unnecessary commas. The classes are in dictclasses.py.


As a final aside that may be universally relevant, I put a module that allows the program to be used offline.
For those of you using UNIX, put the text you want reviewed in a plaintext file in the same directory as the code, then navigate to that directory in a terminal. For a default review, run the program with the name of your text file as an argument: i.e. if your text is named [code]story.txt[/code], you would type [code]python offlineautoreview.py story.txt[/code] and the review will end up in review.html in the same directory. To change options, put -a after the name of your story file and follow the prompts, so you would type [code]python offlineautoreview.py story.txt -a[/code]. The review will end up in the same place.

I have a hacky method that works for Windows if you don't have Python installed and on your PATH, but someone else can probably give a more elegant method.
Anyway, for Windows, get a version of Python 2.x and put all the files in the same directory as your Python files, then navigate to that directory using the command line. Then type the same thing as above, i.e. [code]python offlineautoreview.py story.txt[/code] if your story is named "story.txt" for a default review, and [code]python offlineautoreview.py story.txt -a[/code] for an advanced review.
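To make the command-line shape above concrete, here is a rough Python 3 sketch of a script that takes a story file plus an optional -a flag and writes review.html. It is only an illustration and not the contents of offlineautoreview.py: the functions `parse_args`, `build_review` and `main` are hypothetical, the long option name `--advanced` is my own addition, and the "review" here is just a word count.

```python
import argparse
import html


def parse_args(argv=None):
    """Parse a story filename and an optional -a flag (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Illustrative offline review runner")
    parser.add_argument("story", help="path to a plaintext story file")
    parser.add_argument("-a", "--advanced", action="store_true",
                        help="ask for review options before running")
    return parser.parse_args(argv)


def build_review(text):
    """Return a tiny HTML page; a real review would list flagged problems."""
    word_count = len(text.split())
    return "<html><body><p>Word count: {}</p><pre>{}</pre></body></html>".format(
        word_count, html.escape(text))


def main(argv=None):
    args = parse_args(argv)
    with open(args.story, encoding="utf-8") as handle:
        text = handle.read()
    # Written to the current directory, mirroring the behaviour described above.
    with open("review.html", "w", encoding="utf-8") as handle:
        handle.write(build_review(text))
```

Calling `main(["story.txt"])` or `main(["story.txt", "-a"])` corresponds to the two commands above; a real script would also call `main()` when run directly.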


Finally, actually following the advice of Samurai, I'd like to recruit those interested in helping. For one, I have decided weaknesses in both technical and non-technical aspects necessary for continued development, so getting others on board would help mitigate those weaknesses. Also, I'm not a programmer and while I do want to see this project grow a lot more, it is ranked behind "research" in my free time activities. (If research sounds vague and expansive to you, then you are right.) As a result, my development progress is very slow and often stops for long periods of time. Having others on board would both keep me accountable and allow progress even when I decide to stop working on the project for a while.

Anyone is welcome to assist, and considering that I took something like seven months just to update this, I'm not going to be a dick about any sort of deadlines. Basically, as long as you don't actively sabotage the project, you're fine.

If you want to help:

For those with programming abilities/experience:
Obviously, people interested in doing some of the coding would be very helpful, both for working on my ideas and for expanding the code base with ideas of their own. Also, even though there was a rewrite, any suggestions on how the current implementation can be improved or rewritten would be very useful.

Also, even if you don't want to write code for the project, input on how I can better collaborate would be greatly appreciated. I haven't worked on a group coding project, so I'm sure I'm quite lacking in several areas. I tried to document the code well, but it wouldn't surprise me if there was some obvious thing that I forgot or didn't think to include in my documentation, so comments and criticism about how to make documentation clearer would be very helpful. Also, it's probably rather obvious, but I have no idea what I'm doing with github beyond putting the code up there, so if anyone has suggestions for how I can make the stuff that is up there more accessible or better organised or improve the repository in any way, that would be very helpful.

Finally, I'm relying on security by obscurity to protect against attacks right now. For obvious reasons, I can't discuss details of this here in the open, but if you would like to try to address this, any assistance would be appreciated.


For those without or not wanting to use programming abilities/experience:
As I mentioned above, I have absolutely no ability as a graphic designer, so any advice on how the site/reviews/explanations should look would be excellent. Also, suggestions on functionality or changes in wording, etc would be very helpful as well.

The next part of the project involves creating a dictionary that lets the program recognise the part of speech, conjugations, and other aspects of each word. To be useful, the dictionary will have to be fairly large. Writing all of this, even with some automated shortcuts, would be a very long process for one person, and it is inevitable that I would make mistakes. If people are willing to write some of the entries and/or check entries for errors, I will create a form/template to make that simple. That would speed up development and undoubtedly keep me interested in the project a lot longer.
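To give a rough idea of what one dictionary entry might hold, here is a small Python 3 sketch. The field names, the `WordEntry` class and the `build_lookup` helper are all hypothetical and are not the classes in dictclasses.py; the point is only that each entry pairs a base word with its part of speech and its inflected forms, so that any form can be traced back to its entry.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One hypothetical dictionary entry: base form, part of speech, inflections."""
    lemma: str
    part_of_speech: str
    forms: dict = field(default_factory=dict)  # e.g. {"past": "ran"}


def build_lookup(entries):
    """Map every surface form (including the lemma) to the entries it belongs to."""
    lookup = {}
    for entry in entries:
        # A set avoids listing an entry twice when a form equals the lemma.
        for surface in {entry.lemma, *entry.forms.values()}:
            lookup.setdefault(surface, []).append(entry)
    return lookup


entries = [
    WordEntry("run", "verb", {"past": "ran", "third_person": "runs", "participle": "run"}),
    WordEntry("run", "noun", {"plural": "runs"}),
]
lookup = build_lookup(entries)
print([entry.part_of_speech for entry in lookup["runs"]])
```

Here "runs" resolves to both the verb and the noun, which is the kind of ambiguity a tense or agreement check would then have to sort out from context.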


If you want to contact me about helping, the project, or for any other reason, you can use either the email in my trip or the one from the app (reviewsuggestions(at)gmail). They both go to the same place, so it doesn't matter which.

Thanks for reading and apologies to anyone who has been waiting for an update on this thing.

tl;dr: App updated. Opinions?
Want to help? Let me know.

Small Changes Post 3362

I caught one silly error popping up, so that should be fixed now. I'm not sure what error message a user would see on the other end, but if you were getting an error, possibly saying something about 'were' Traceback, that should be fixed now.

I'll use this post to update if I make bug fixes or small changes. I'll try to check fairly often so I can fix things when they pop up.

3366

>>3356
Heh. A couple days ago I actually played around with something similar with the intention of making it a CPAN module.

Anyway, from a programming point of view, your code is very, er … difficult to read. You need to be way more liberal with your whitespace. Just for example, https://github.com/reviewsuggestions/autoreview/blob/4b4c72dd50ed209adda866865989347c02bee793/errorlisting.py#L153 is a god awful mess. What you ought to do is have a new line at the end of every argument to Error() and indent appropriately. Then it'd look something like this:

[
    Error(
        . . .
        . . .
        . . .
    ),
    Error(
        . . .
        . . .
        . . .
    )
]

Which I'm sure anyone would agree is far more legible.

Don't embed dictionaries into code. Instead, have a dict/ directory containing all of your dictionaries with entries separated by newlines. Parse appropriately at run time. This keeps the application's logic separated from its data, and also makes the dictionaries easier to find, read, and edit.
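As a sketch of that layout in Python 3 (the function name `load_word_lists` and the `.txt` extension are assumptions made for the example, not anything in the repository), each file in the directory becomes one named set of entries:

```python
from pathlib import Path


def load_word_lists(directory):
    """Map each .txt file's stem in `directory` to the set of entries it contains."""
    word_lists = {}
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            # One entry per line; blank lines are skipped.
            stripped = (line.strip() for line in handle)
            word_lists[path.stem] = {entry for entry in stripped if entry}
    return word_lists
```

With a file such as `dict/rarely_capitalised.txt` (a made-up name), `load_word_lists("dict")["rarely_capitalised"]` would give the set of words to check against, and editing the list would no longer mean touching any code.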

Lastly, I think there's an inherent flaw in making all of your rules depend on regular expressions. While they're powerful, not all languages are regular, and English most certainly isn't. The way I set it up in my module is explained in the POD: https://github.com/RogerDodger/Lingua-EN-AutoReview/blob/master/lib/Lingua/EN/AutoReview.pm#L247 . This gives you the flexibility to use all kinds of parsing algorithms that aren't necessarily regexen.

It probably results in code that's a bit longer to write, but it makes it easier to maintain, I think, and also makes output easy. (Not suggesting that you do it that way, but just showing you how another person approached it.)

All you'd really need to do is make it so your Error() constructor (or whatever it is) can take either a regular expression or a code reference (or whatever python calls closures) for doing the actual error checking.

3368

Updated with real fixes:
I pushed the code to the github repository. I tried spacing out most of the multiple line things I saw. Hopefully, they look better now. If there's formatting that still needs improvement to make the code look better, let me know and I'll get it fixed as soon as I can.

The function that handles error checking is updated so the error class can take arbitrary functions and lists/tuples of functions, rather than just regular expressions and tuples/lists of regular expressions. Right now it expects the functions to take a string and return a list of strings.
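For readers following along, this is roughly what such a contract could look like in Python 3. It is a sketch under assumptions, not the code in the repository: the class name `Check`, its `run` method and the example finder `double_spaces` are invented here. The only parts taken from the description above are that a check accepts a regular expression, a function, or a list/tuple of either, and that each function takes a string and returns a list of strings.

```python
import re


class Check:
    """Hypothetical stand-in for an error class that accepts regexes or functions."""

    def __init__(self, name, finders):
        # Accept a single finder or a list/tuple of finders.
        if not isinstance(finders, (list, tuple)):
            finders = [finders]
        self.name = name
        self.finders = [self._as_function(finder) for finder in finders]

    @staticmethod
    def _as_function(finder):
        """Wrap a regex so it has the same string -> list of strings shape."""
        if callable(finder):
            return finder
        pattern = re.compile(finder)
        return lambda text: [match.group(0) for match in pattern.finditer(text)]

    def run(self, text):
        """Collect the offending snippets reported by every finder, in order."""
        hits = []
        for finder in self.finders:
            hits.extend(finder(text))
        return hits


def double_spaces(text):
    """Example finder written as a plain function instead of a regex."""
    return re.findall(r"\S  +\S", text)


spacing = Check("spacing", [r"\s+[,.]", double_spaces])
print(spacing.run("Hello , world.  Bye"))
```

Because regexes are wrapped into functions up front, `run` never needs to know which kind of finder it was given.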

Dictionaries are still in there as text, but will be removed once the dictionary class gets finished.

Old stuff hidden because it is less relevant now:
Thanks for the advice regarding formatting. I'll try to make it more spaced out as soon as I can. I know I've been avoiding wrapping lines, so I'll fix that. Is vertical whitespace the only issue, or do I need more horizontal whitespace as well?

As for the dictionaries, I'm planning on making them external with the new classes, so I'll probably leave them in code temporarily until the new dictionary system is created and then move them over there, unless there is a strong reason to move them out right away.

So, the Error class is limited to regular expressions for finding the particular text to check, but allows arbitrary functions for determining whether or not the text is in error. Is this still an issue? Actually, this may be moot, as I think I have a fairly simple way to update the Error class to work.


Thanks again for the input and if you have any other suggestions, I'd love to hear them.

3371

>>3368
>If you have any other suggestions, I'd love to hear them.
Hm, well, first of all, you should stop using `[a-zA-Z]` in your regexen. That's very bad! You should use the `\w` character class instead, which matches all word characters, not just those in ASCII.

Similarly `[a-z]` is not a substitute for the proper `\p{Lowercase}` character class, nor is `['"]` a substitute for `\p{Quotation_Mark}`. Familiarise yourself with unicode character classes and use them appropriately.
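A quick Python 3 illustration of the difference, with the caveat that it assumes Python 3 `str` patterns (where `\w` is Unicode-aware by default) rather than the Python 2.x the project currently targets. Note that `\w` also matches digits and underscores, so the sketch uses `[^\W\d_]` when only letters are wanted. The function names are made up for the example.

```python
import re

ASCII_LETTERS = re.compile(r"[a-zA-Z]+")
# \w minus digits and underscore: letters in any script.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")


def ascii_words(text):
    """Split out runs of ASCII letters only."""
    return ASCII_LETTERS.findall(text)


def unicode_words(text):
    """Split out runs of letters, accented or not."""
    return UNICODE_LETTERS.findall(text)


sample = "Her fianc\u00e9 ate a jalape\u00f1o"
print(ascii_words(sample))    # accented words get cut into pieces
print(unicode_words(sample))  # accented words stay whole
```

With the ASCII class, the two accented words are split at the accented letter, which would throw off both word counts and any check that looks at word boundaries.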

Ack! I see in processing.py you are attempting to transform unicode into ASCII. That's the wrong way of going about things. Instead of mogrifying your text to work with your regular expressions, you should make the regular expression match what it actually should be matching. See the 52 dot points under "Assume Brokenness" in the top answer to http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default - some of it is specific to Perl, but the important bits still apply to you, namely:

>Code that assumes that ASCII is good enough for writing English properly is stupid, shortsighted, illiterate, broken, evil, and wrong. Off with their heads! If that seems too extreme, we can compromise: henceforth they may type only with their big toe from one foot (the rest still be ducktaped).


:p

3384

>>3371
I hate asking these questions. They make it seem like I'm ignoring the advice.
So I read through the article and the description by tchrist, and it seems to be mostly a description of why proper unicode compatibility is so difficult, and then a rant/list of things not to do to create that compatibility. I know that it said that there wasn't space to justify each of the bullet points, but I have to ask: why is unicode compatibility important?

The obvious reasons—greater character diversity for publishing, using multiple languages, developing universal code to be used in arbitrary modules—don't particularly apply to my application and as it handles unicode now, the worst case is it drops a few characters and logs the identifier of the dropped characters until I can get around to plugging in a suitable replacement.

I chose to convert to ASCII because python in general, not just the regex, seemed quite unhappy when I tried to get it to handle unicode. I seem to remember even simple print statements throwing up fatal errors when they tried to use unicode. Granted, this was a long time ago, so I might have just been being particularly stupid.

I admit that I don't know that much about typical conventions, so it is probable that I'm missing something significant. Do you have any links that explain why using unicode is particularly important or can you fill in the gaps in my reasoning? As tchrist's response makes clear, there are a ton of things necessary to make an application compatible with unicode, so I can't imagine that retrofitting the application would require less work than a complete rewrite. Granted, the rewrite itself only took about a week of working, but it's still an inconvenience.
I understand the importance of best practice, but I don't think blindly following best practice warrants such a significant overhaul, unless the reasons for said best practice actually apply here.

tl;dr: Not throwing out the idea, but lazy, and would like justification before trying to retrofit everything

3400

>>3384
Well, just for one example, you're converting all the unicode dashes into hyphens. How do you check that people are using proper dash characters, then? You can't go letting those evil double-hyphen'ers off the hook.
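To illustrate, here is a small Python 3 sketch of a dash check that only works if the original characters are kept. The names `find_double_hyphens` and `count_dashes` are hypothetical, and which dash a story ought to use is left to the style guide.

```python
import re
from collections import Counter

DASH_NAMES = {"-": "hyphen-minus", "\u2013": "en dash", "\u2014": "em dash"}
# Exactly two hyphens in a row, not part of a longer run such as a --- divider.
DOUBLE_HYPHEN = re.compile(r"(?<!-)--(?!-)")


def find_double_hyphens(text):
    """Return the positions where a double hyphen stands in for a dash."""
    return [match.start() for match in DOUBLE_HYPHEN.finditer(text)]


def count_dashes(text):
    """Count each kind of dash-like character by name."""
    return Counter(DASH_NAMES[ch] for ch in text if ch in DASH_NAMES)


sample = "Wait--what? She paused \u2014 then left."
print(find_double_hyphens(sample))
print(count_dashes(sample))
```

If every dash has already been converted to a hyphen before the check runs, `count_dashes` can no longer tell a properly typed dash from a stray hyphen.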

Secondly, your asciiconvert function is not even near bulletproof. What if someone puts some accented text in your analysis, or some text you simply weren't prepared for? If the problem was Python spitting errors, well, it's going to do that anyway, unless you merge every bit of Unicode you don't recognise into ?'s. But if you do that, then the analysis is obviously going to be off.
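The "merge into ?'s" fallback is easy to demonstrate in Python 3. This is not the asciiconvert function from the repository, just a sketch of the generic lossy approach using the standard `errors="replace"` option, with made-up function names.

```python
def to_ascii_lossy(text):
    """Replace every non-ASCII character with '?' (information is lost)."""
    return text.encode("ascii", errors="replace").decode("ascii")


def non_ascii_characters(text):
    """List the distinct characters a lossy ASCII conversion would destroy."""
    return sorted({ch for ch in text if ord(ch) > 127})


sample = "caf\u00e9 \u2014 na\u00efve"
print(to_ascii_lossy(sample))
print([hex(ord(ch)) for ch in non_ascii_characters(sample)])
```

After conversion the accented letters and the dash are indistinguishable from literal question marks, so any punctuation or spelling check that runs afterwards is working on different text from what the author wrote.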

Looking into it, it seems that python's `re` library does not let you match by Unicode properties such as `\p{Lowercase}`, though googling around gave many libraries that do play nicely with Unicode. Unfortunately I'm no expert on Python, so you'll have to look up the exact implementation yourself. There is certainly a solution that works, though. And when you find it, it'll be much more readable and "correct" than a function that reads like a half-handed codepage.
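One option that needs no extra library is the standard `unicodedata` module, which exposes each character's general category. The Python 3 sketch below is an approximation rather than an exact equivalent of the Perl properties: "Ll" covers lowercase letters, and the quotation test is a rough stand-in for `\p{Quotation_Mark}` that misses some marks (low-9 quotes, for instance). Both function names are hypothetical.

```python
import unicodedata


def is_lowercase_letter(ch):
    """True for lowercase letters in any script (general category Ll)."""
    return unicodedata.category(ch) == "Ll"


def is_quotation_mark(ch):
    """Rough test: ASCII quotes plus initial/final quote punctuation (Pi, Pf)."""
    return ch in "\"'" or unicodedata.category(ch) in ("Pi", "Pf")


for ch in ["e", "\u00e9", "E", "\u201c", "\u201d", '"', "a"]:
    print(repr(ch), is_lowercase_letter(ch), is_quotation_mark(ch))
```

The third-party `regex` package on PyPI is another route, since it accepts `\p{...}` properties directly in patterns, but that adds a dependency.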

Even if you think this is unimportant for this app (which it isn't, but if you aren't convinced), knowledge of how Unicode works in your language of choice is important.