Can anyone help?
This is an AI-generated summary of what I've been doing in ChatGPT and of what's in my drive. Is the AI just hyping me up, or is any of this actually worth something? I've been learning Python and doing about 80% of this from my phone. I can't tell if it's hype, BS, or something real, so please be honest. I don't care if it's bad: teach me why, or just tell me where I can go to learn the right way.
Direct assessment: yes, what you are doing is good, but the impressive part is not “I built a finished pharmacy app.” The accurate claim is narrower:
You are building a phone-first Python/Tkinter pharmacy training shell with a real audit discipline: baseline control, evidence labels, one-patch-at-a-time rules, automated contract tests, stress loops, and manual Android/Pydroid phone testing. The current experimental branch is explicitly not promoted, has no active pharmacy logic, and still requires phone testing before promotion. That honesty is part of why the workflow is better than normal beginner “vibe coding.”
Here is the Reddit-style version:
Title: I started learning Python recently and ended up building a phone-first Tkinter app with a stress-testing workflow. Is this actually a good direction, or am I overrating it?
I’m trying to evaluate this realistically, not turn it into a fake “I built the next big thing” post.
I started from basically hobby-level Python/Tkinter work. The earlier project was a gacha/RPG-style app where I was mostly learning by breaking things, testing on my phone, and finding UI/runtime bugs that did not show up cleanly from just reading the code. That project taught me that the app “working once” is not the same thing as the app being stable.
The workflow evolved into something more disciplined:
- phone-first testing on Android/Pydroid instead of assuming desktop behavior is enough
- one patch at a time
- one active baseline
- known bugs carried forward instead of forgotten
- no stability claim unless there is evidence
- no promotion until both automated tests and phone behavior make sense
- AI can generate/audit code, but I still have to test, reject, reproduce, and decide what actually counts as working
The current project is a MODPY Pharmacy Shell, not a finished pharmacy app. It is a Python/Tkinter shell aimed at phone use. Right now it is intentionally conservative: no database, no live clinical/law content, no medical calculators, and no drug logic active. The point is to build the structure, navigation, safety labels, manifests, and test harness before adding risky content.
The current experimental package is:
"modpy_pharmacy_shell_exp_v0_1_2.zip"
It is an experimental branch, not a promoted release. Automated tests pass, but phone testing is still required before promotion.
The latest run passed:
python3 TEST_THIS.py
Ran 30 tests OK
stress_contracts passed: loops=100000 seed=20260520 deep=False
compileall passed
unittest passed
stress_contracts passed
tools/run_stress_matrix.py --seeds 1 --loops 1000 --timeout 20 passed
tools/run_max_attack.py --seeds 1 --loops 1000 --timeout 20 passed
zip entries: 27
no __pycache__
no .pyc
no stress_artifacts packaged inside zip
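A minimal sketch of how the packaging checks at the end of that output could be automated with the standard-library `zipfile` module. This is an illustration under assumptions, not the project's actual tooling: `BANNED_FRAGMENTS`, `find_banned_entries`, and `audit_zip` are hypothetical names.

```python
import zipfile
from typing import Optional

# Hypothetical rule set: path fragments that should never ship in a package.
BANNED_FRAGMENTS = ("__pycache__", ".pyc", "stress_artifacts")


def find_banned_entries(names: list[str]) -> list[str]:
    """Return the entry names that contain any banned fragment."""
    return [n for n in names if any(frag in n for frag in BANNED_FRAGMENTS)]


def audit_zip(path: str, expected_count: Optional[int] = None) -> list[str]:
    """Collect packaging problems for the zip at `path`; empty list means clean."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    issues = [f"banned entry: {n}" for n in find_banned_entries(names)]
    if expected_count is not None and len(names) != expected_count:
        issues.append(f"expected {expected_count} entries, found {len(names)}")
    return issues
```

Calling `audit_zip("modpy_pharmacy_shell_exp_v0_1_2.zip", expected_count=27)` and requiring an empty list would turn the "zip entries" and "no __pycache__" lines into a repeatable check instead of something verified by eye.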
One part I think is actually meaningful is that I’m not just adding features. I’m adding contracts that stop bad changes from silently getting in.
Example from the stress harness:
REQUIRED_PAGES = {
    "home", "calculators", "quiz", "law", "workflow", "admin",
    "ideas", "writer_os", "publishing", "ai_audit", "sources",
    "stress", "settings", "safety"
}
REQUIRED_CARDS = {
    "Calculators", "Quiz Lab", "Law Reference", "Workflow Tools",
    "Admin", "Idea Lab", "Writer OS", "Publishing Lab", "AI Audit",
    "Source Registry", "Stress Lab", "Settings"
}
FORBIDDEN = [
    "sqlite3", "CREATE TABLE", "INSERT INTO", "amoxicillin",
    "warfarin", "metformin", "oxycodone", "hydrocodone",
    "dea schedule"
]
That means the current shell is tested to make sure required pages/cards exist, calculators stay disabled, and no clinical/drug/database content leaks into the shell before the app is ready for that.
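As a rough sketch of how constants like these can be turned into checks: set difference finds missing required items, and a case-insensitive substring scan finds forbidden text. The helper names (`missing_items`, `forbidden_hits`, `check_shell`) and the shortened constants are assumptions for illustration, not the code in the real harness.

```python
# Shortened, illustrative versions of the constants above.
REQUIRED_PAGES = {"home", "calculators", "safety"}
FORBIDDEN = ["sqlite3", "CREATE TABLE", "warfarin"]


def missing_items(required: set[str], present: set[str]) -> list[str]:
    """Return required items that are absent, sorted for stable output."""
    return sorted(required - present)


def forbidden_hits(source_text: str, forbidden: list[str]) -> list[str]:
    """Return forbidden terms found in the text, ignoring case."""
    lowered = source_text.lower()
    return [term for term in forbidden if term.lower() in lowered]


def check_shell(page_keys: set[str], source_text: str) -> list[str]:
    """Combine both checks into one list of human-readable issues."""
    issues = []
    missing = missing_items(REQUIRED_PAGES, page_keys)
    if missing:
        issues.append("missing pages: " + ", ".join(missing))
    for term in forbidden_hits(source_text, FORBIDDEN):
        issues.append("forbidden text: " + term)
    return issues
```

Plain substring matching is deliberately blunt. It can flag a harmless comment that happens to mention a forbidden term, which is acceptable for a shell that is supposed to contain none of that content at all.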
There is also a static validator in the app itself:
def validate_static_contracts() -> list[str]:
    issues: list[str] = []
    page_keys = set(PAGE_DATA)
    card_keys = [item["key"] for item in CARD_DATA]
    nav_keys = [key for key, _label in NAV_ITEMS]
    card_titles = [item["title"] for item in CARD_DATA]
    if len(card_keys) != len(set(card_keys)):
        issues.append("duplicate card keys")
    if len(card_titles) != len(set(card_titles)):
        issues.append("duplicate card titles")
    missing_card_targets = sorted(set(card_keys) - page_keys)
    if missing_card_targets:
        issues.append("card targets missing pages: " + ", ".join(missing_card_targets))
    missing_nav_targets = sorted(set(nav_keys) - page_keys)
    if missing_nav_targets:
        issues.append("nav targets missing pages: " + ", ".join(missing_nav_targets))
    if PAGE_DATA["calculators"].get("status") != "Disabled":
        issues.append("calculators page is not disabled")
    if "not a substitute for a pharmacist" not in PAGE_DATA["safety"].get("body", ""):
        issues.append("safety page disclaimer missing")
    return issues
And the “max attack” runner is basically a local audit chain:
py_compile.compile(str(p), doraise=True)
run([sys.executable, "-m", "unittest", "discover", "-s", "tests", "-v"])
run([sys.executable, "tools/run_warning_sweep.py"])
run([
    sys.executable,
    "tools/run_stress_matrix.py",
    "--seeds", str(args.seeds),
    "--loops", str(args.loops),
    "--timeout", str(args.timeout),
])
run([sys.executable, "tools/stress_contracts.py", "--loops", "100000"])
print("MAX ATTACK PASSED")
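The excerpt calls a `run` helper that is not shown. One plausible shape for it, as a sketch only: a thin wrapper around `subprocess.run` that fails loudly on a non-zero exit code or a hang. The `timeout` default and the `run_chain` function are assumptions added for illustration.

```python
import subprocess
import sys


def run(cmd: list[str], timeout: float = 60.0) -> None:
    """Run one audit step; raise if it fails or exceeds the timeout."""
    print("RUN:", " ".join(cmd))
    # check=True raises CalledProcessError on a non-zero exit code;
    # timeout raises TimeoutExpired if the step hangs.
    subprocess.run(cmd, check=True, timeout=timeout)


def run_chain(steps: list[list[str]]) -> bool:
    """Run steps in order and stop at the first failure."""
    for cmd in steps:
        try:
            run(cmd)
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
            print("STEP FAILED:", exc)
            return False
    return True


if __name__ == "__main__":
    ok = run_chain([[sys.executable, "-c", "print('step ok')"]])
    print("CHAIN PASSED" if ok else "CHAIN FAILED")
```

Stopping at the first failure keeps the final "passed" message honest: it can only print if every earlier step exited cleanly.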
A lot of the actual bug catching is still manual. I test on my phone because Tkinter behavior on Android/Pydroid can differ from desktop assumptions. Some issues are not really “logic bugs”; they are runtime/UI behavior bugs, like scroll behavior, layout density, button behavior, disappearing panels, navigation quirks, or whether the UI actually feels usable on a phone screen. That part cannot be fully replaced by automated tests.
What I personally solved or directed:
- I moved away from giant feature dumps toward one-patch-at-a-time work.
- I started treating phone behavior as a real source of truth instead of just trusting desktop tests.
- I separated “reference code” from the actual baseline so old files do not contaminate the current project.
- I pushed for manifests, version consistency, evidence levels, known-bug lists, and promotion gates.
- I caught UI/runtime problems through actual phone testing.
- I stopped promoting builds just because tests passed.
- I started using stress tests and replayable seed-style testing instead of just clicking around randomly.
- I kept clinical/pharmacy content locked out until the shell has proper source and validation structure.
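To show what "replayable seed-style testing" can mean in practice, here is a minimal sketch: a pseudo-random walk over a navigation graph driven by a private `random.Random(seed)`, so the same seed always replays the same path. The `NAV_GRAPH` data and the `stress_navigation` function are hypothetical and much simpler than a real stress harness.

```python
import random

# Hypothetical navigation graph: page key -> pages reachable from it.
NAV_GRAPH = {
    "home": ["calculators", "safety", "settings"],
    "calculators": ["home"],
    "safety": ["home"],
    "settings": ["home", "safety"],
}


def stress_navigation(seed: int, loops: int, start: str = "home") -> list[str]:
    """Walk the graph pseudo-randomly and return the visited path."""
    rng = random.Random(seed)  # private generator: the walk depends only on the seed
    page = start
    path = [page]
    for step in range(loops):
        targets = NAV_GRAPH[page]
        assert targets, f"dead end at {page!r} (seed={seed}, step={step})"
        page = rng.choice(targets)
        assert page in NAV_GRAPH, f"unknown page {page!r} (seed={seed}, step={step})"
        path.append(page)
    return path
```

Because the failure messages include the seed and step, a failing run can be reproduced by calling the function again with the same seed rather than trying to remember which buttons were tapped.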
The honest limitation: the app itself is not advanced, because it is not complex yet. It is still mostly a shell. The current stress tests mostly attack static contracts, navigation rules, metadata/version consistency, and contamination prevention. They do not prove that pharmacy calculations are correct, because pharmacy calculations are not active yet.
The part that might be above beginner level is the process:
- baseline discipline
- test harnesses
- stress loops
- manual device testing
- evidence labeling
- refusal to claim stability without proof
- documentation and manifest control
- keeping unsafe content disabled until validation exists
So my question is: for someone still early in Python, is this a good development path? Not “is this production-ready,” because it obviously is not. More specifically: is building a phone-tested shell with automated contracts, stress testing, and strict promotion rules a legitimately strong way to learn software development, or am I overbuilding process before I have enough actual app logic?
For screenshots, the best ones are the test output, validate_static_contracts(), and run_max_attack.py. Those are more defensible than screenshots of UI alone because they show the process: contracts, restrictions, repeatable checks, and explicit limits.